Re: [I] [Format] Passing column statistics through Arrow C data interface [arrow]

via GitHub Fri, 24 May 2024 06:28:42 -0700


westonpace commented on issue #38837:
URL: https://github.com/apache/arrow/issues/38837#issuecomment-2129540858


   > https://github.com/apache/arrow/issues/38837#issuecomment-2123728784 uses 
ArrowSchema not ArrowArray/ArrowArrayStream. So it doesn't contain both data 
and statistics. I think the following flow is used. (ArrowSchema is used before 
we use ArrowArrayStream.)
   
   I see now. I had misunderstood and thought we were talking about 
`ArrowArray`.  I understand why you would want statistics in the `ArrowSchema` 
instead of `ArrowArray`.  We were already talking about two function calls 
(`GetTableSchema()` and `GetTableData()`).  If we don't use `ArrowSchema` then 
we need three function calls (`GetTableSchema()`, `GetTableStatistics()`, and 
`GetTableData()`).
   
   I think both approaches make sense but this is the tricky part:
   
   > How to encode each value (2.9, true and 29.9) to raw byte data? We can use 
only raw byte data for a value of ArrowSchema::metadata.
   
   There are ways to solve this but they all seem like a lot of work for 
alignment and maintenance.  I don't think the benefit (combining 
`GetTableSchema` and `GetTableStatistics`) is worth the development cost.
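
   To make the encoding concern concrete, here is a minimal sketch in C of 
packing a double and a boolean statistic into the binary layout that 
`ArrowSchema::metadata` uses (an int32 pair count, then for each pair an int32 
key length, the key bytes, an int32 value length, and the value bytes, all in 
native byte order). The key names, the helper functions, and the choice to 
store the double as 8 native bytes and the boolean as a single byte are 
assumptions made for illustration; nothing in this thread settles on them.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Append a 32-bit integer in native byte order. */
static char* put_i32(char* p, int32_t v) {
  memcpy(p, &v, sizeof v);
  return p + sizeof v;
}

/* Append one pair: int32 key length, key bytes, int32 value length, value bytes. */
static char* put_pair(char* p, const char* key, const void* val, int32_t val_len) {
  int32_t key_len = (int32_t)strlen(key);
  p = put_i32(p, key_len);
  memcpy(p, key, (size_t)key_len);
  p += key_len;
  p = put_i32(p, val_len);
  memcpy(p, val, (size_t)val_len);
  return p + val_len;
}

/* Build a metadata buffer with one double and one boolean statistic.
   Key names are hypothetical. The caller frees the returned buffer. */
char* build_stats_metadata(double min_value, int is_exact, size_t* out_len) {
  const char* k_min = "example:statistics:min";         /* hypothetical key */
  const char* k_exact = "example:statistics:min_exact"; /* hypothetical key */
  uint8_t exact_byte = is_exact ? 1 : 0;
  size_t len = 4 + (4 + strlen(k_min) + 4 + sizeof min_value) +
               (4 + strlen(k_exact) + 4 + sizeof exact_byte);
  char* buf = malloc(len);
  if (buf == NULL) return NULL;
  char* p = put_i32(buf, 2); /* number of key/value pairs */
  p = put_pair(p, k_min, &min_value, (int32_t)sizeof min_value);
  p = put_pair(p, k_exact, &exact_byte, (int32_t)sizeof exact_byte);
  assert((size_t)(p - buf) == len);
  *out_len = len;
  return buf;
}

/* Reader side: the bytes carry no type tag, so the consumer has to know
   out of band that the first value is an 8-byte native-endian double. */
double read_first_value_as_double(const char* metadata) {
  int32_t key_len, val_len;
  double out = 0.0;
  const char* p = metadata + sizeof(int32_t); /* skip the pair count */
  memcpy(&key_len, p, sizeof key_len);
  p += sizeof key_len + (size_t)key_len;      /* skip the key */
  memcpy(&val_len, p, sizeof val_len);
  p += sizeof val_len;
  if (val_len == (int32_t)sizeof out) memcpy(&out, p, sizeof out);
  return out;
}
```

   The reader function shows where the cost comes from: producer and consumer 
must agree separately on the type, width, and byte order of every value, 
because the metadata value itself is only a length-prefixed byte string.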


