Re: [I] [Format] Passing column statistics through Arrow C data interface [arrow]

via GitHub Fri, 24 May 2024 06:24:06 -0700


drin commented on issue #38837:
URL: https://github.com/apache/arrow/issues/38837#issuecomment-2129531288


   > The code at this link is not really related to cardinality estimation
   
   Ah, I misunderstood then.
   
   > In DuckDB, these statistics are created as a callback function that exists in the scanner
   
   This is why I figured either approach to getting statistics (schema or API 
call) is viable, since the callback should be able to accommodate either.
   
   > Could you explain more about a separate API call
   
   I just meant a function call of the scanner API (or I guess something like 
ADBC, but I don't know that API at all). I interpreted weston's comment to mean 
that a function call to get statistics is comparable to a function call to get 
a record batch.
   
   But when I say it provides extra flexibility, I mean that it provides a 
standard way for an independent producer to specify logic to a consumer for how 
to get statistics. This could allow for things like storing statistics 
independently from the data stream itself. I also feel like this doesn't 
preclude statistics metadata being packed into the schema (maybe in some 
application-specific way).
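   
   As a rough sketch of what I mean (not a proposal for the actual interface), 
a scanner could expose a statistics call next to the call that returns record 
batches, with the producer supplying the logic behind it. Every name here 
(`Scanner`, `get_statistics`, `stats_callback`, `sidecar`) is hypothetical and 
not part of Arrow, DuckDB, or ADBC, and plain dicts stand in for record batches 
and statistics:

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List, Optional

# Hypothetical stand-ins for illustration only; none of these names exist
# in Arrow, DuckDB, or ADBC.
Batch = Dict[str, List[int]]   # column name -> values
ColumnStats = Dict[str, int]   # statistic name -> value


@dataclass
class Scanner:
    batches: List[Batch]
    # Producer-supplied logic for fetching statistics. It can read from
    # anywhere, including a store that is separate from the data stream.
    stats_callback: Optional[Callable[[str], Optional[ColumnStats]]] = None

    def scan(self) -> Iterator[Batch]:
        # The "get a record batch" side of the API.
        yield from self.batches

    def get_statistics(self, column: str) -> Optional[ColumnStats]:
        # The "get statistics" side of the API: a separate function call.
        if self.stats_callback is None:
            return None
        return self.stats_callback(column)


# Statistics kept independently of the data stream (assumed sidecar store).
sidecar = {"x": {"min": 1, "max": 9, "null_count": 0}}

scanner = Scanner(
    batches=[{"x": [1, 5]}, {"x": [9]}],
    stats_callback=sidecar.get,  # returns None for unknown columns
)
```

   A consumer would call `scanner.get_statistics("x")` before or during the 
scan. A producer that prefers to pack statistics into schema metadata could 
implement the same callback by reading them from there instead.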
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

