drin commented on issue #38837: URL: https://github.com/apache/arrow/issues/38837#issuecomment-2129531288
> The code at this link is not really related to cardinality estimation

Ah, I misunderstood then.

> In DuckDB, these statistics are created as a callback function that exists in the scanner

This is why I figured either approach to getting statistics (schema or API call) is viable, since the callback should be able to accommodate either.

> Could you explain more about a separate API call

I just meant a function call of the scanner API (or I guess something like ADBC, but I don't know that API at all). I interpreted weston's comment to mean that a function call to get statistics is comparable to a function call to get a record batch. But when I say it provides extra flexibility, I mean that it provides a standard way for an independent producer to specify logic to a consumer for how to get statistics. This could allow for things like storing statistics independently from the data stream itself.

I also feel like this doesn't preclude statistics metadata being packed into the schema (maybe in some application-specific way).
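
To make the "separate API call" idea a bit more concrete, here is a rough sketch in plain Python. Everything in it is hypothetical: `Scanner`, `ColumnStatistics`, `make_scanner`, and `estimate_cardinality` are names made up for illustration and are not part of Arrow, ADBC, or DuckDB. It only shows the shape of the idea: the producer hands the consumer one callable for record batches and a second, independent callable for statistics, so the statistics can come from somewhere other than the data stream.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List, Optional


@dataclass
class ColumnStatistics:
    """Hypothetical per-column statistics a consumer might ask for."""
    row_count: Optional[int] = None
    null_count: Optional[int] = None
    distinct_count: Optional[int] = None


# Hypothetical stand-in for a record batch: column name -> values.
Batch = Dict[str, List[Optional[int]]]


class Scanner:
    """Hypothetical scanner exposing data and statistics as separate calls."""

    def __init__(
        self,
        read_batches: Callable[[], Iterator[Batch]],
        read_statistics: Callable[[], Dict[str, ColumnStatistics]],
    ) -> None:
        # The producer decides how each call is implemented.
        self._read_batches = read_batches
        self._read_statistics = read_statistics

    def to_batches(self) -> Iterator[Batch]:
        return self._read_batches()

    def statistics(self) -> Dict[str, ColumnStatistics]:
        return self._read_statistics()


def make_scanner() -> Scanner:
    """Producer side: statistics live in a 'sidecar', not in the data stream."""
    data: List[Batch] = [{"x": [1, 2, None]}, {"x": [2, 5, 7]}]
    sidecar = {"x": ColumnStatistics(row_count=6, null_count=1, distinct_count=4)}
    return Scanner(lambda: iter(data), lambda: dict(sidecar))


def estimate_cardinality(scanner: Scanner, column: str) -> Optional[int]:
    """Consumer side: a cardinality callback that never touches the batches."""
    stats = scanner.statistics().get(column)
    return stats.row_count if stats is not None else None
```

A consumer could call `estimate_cardinality(make_scanner(), "x")` during planning and only call `to_batches()` later. A producer that prefers schema metadata could implement `read_statistics` by decoding that metadata, which is why the two approaches do not seem mutually exclusive.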
