westonpace commented on issue #38837: URL: https://github.com/apache/arrow/issues/38837#issuecomment-2129540858
> https://github.com/apache/arrow/issues/38837#issuecomment-2123728784 uses ArrowSchema not ArrowArray/ArrowArrayStream. So it doesn't contain both data and statistics. I think the following flow is used. (ArrowSchema is used before we use ArrowArrayStream.)

I see now, I had misunderstood and thought we were talking about `ArrowArray`. I understand why you would want statistics in the `ArrowSchema` instead of `ArrowArray`. We were already talking about two function calls (`GetTableSchema()` and `GetTableData()`). If we don't use `ArrowSchema` then we need three function calls (`GetTableSchema()`, `GetTableStatistics()`, and `GetTableData()`).

I think both approaches make sense, but this is the tricky part:

> How to encode each value (2.9, true and 29.9) to raw byte data? We can use only raw byte data for a value of ArrowSchema::metadata.

There are ways to solve this, but they all seem like a lot of work for alignment and maintenance. I don't think the benefit (combining `GetTableSchema()` and `GetTableStatistics()`) is worth the development cost.
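
To make the encoding question concrete, here is a minimal Python sketch of what packing such values into `ArrowSchema::metadata` could look like. The buffer layout (a native-endian int32 pair count, then an int32 length followed by the raw bytes for each key and each value) follows the Arrow C data interface. Everything else is an assumption for illustration: the key names are hypothetical, and the choice to store 2.9 and 29.9 as 8-byte doubles and `true` as a single byte is just one possible convention, not something proposed in this thread.

```python
import struct


def encode_metadata(pairs):
    """Pack (key, value) byte pairs in the ArrowSchema::metadata layout:
    int32 pair count, then int32 length + bytes for each key and value,
    all in native byte order."""
    out = [struct.pack("=i", len(pairs))]
    for key, value in pairs:
        out.append(struct.pack("=i", len(key)))
        out.append(key)
        out.append(struct.pack("=i", len(value)))
        out.append(value)
    return b"".join(out)


def decode_metadata(buf):
    """Inverse of encode_metadata: return a list of (key, value) byte pairs."""
    (count,) = struct.unpack_from("=i", buf, 0)
    offset = 4
    pairs = []
    for _ in range(count):
        (key_len,) = struct.unpack_from("=i", buf, offset)
        offset += 4
        key = buf[offset:offset + key_len]
        offset += key_len
        (value_len,) = struct.unpack_from("=i", buf, offset)
        offset += 4
        value = buf[offset:offset + value_len]
        offset += value_len
        pairs.append((key, value))
    return pairs


# Hypothetical keys; the value encodings (double, bool byte) are assumptions
# that producer and consumer would both have to agree on.
stats = [
    (b"example:stat:a", struct.pack("=d", 2.9)),
    (b"example:stat:b", struct.pack("=?", True)),
    (b"example:stat:c", struct.pack("=d", 29.9)),
]

blob = encode_metadata(stats)
decoded = decode_metadata(blob)
```

The sketch also shows where the maintenance cost comes from: the metadata buffer only carries opaque bytes, so the type and width of every statistic value would need its own agreed-upon convention outside the buffer itself.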
