Hi,

I would like to propose standardizing how to pass statistics through
the C data interface.
Motivation:

* We want to pass not only Apache Arrow data but also their statistics
  through the C data interface, for use in query planning.

Approach:

* Define a standardized schema for statistics.
* Represent statistics as an Apache Arrow array that uses the schema.
* Pass the statistics Apache Arrow array through the C data interface
  like a normal Apache Arrow array.

Note that we don't define a new interface for statistics. We just use
the existing C data interface. A statistics Apache Arrow array is
passed through a separate API call.

See also:

* The discussion of this:
  https://lists.apache.org/thread/z0jz2bnv61j7c6lbk7lympdrs49f69cx
* The PR of this proposal, which includes the statistics schema
  definition:
  https://github.com/apache/arrow/pull/43553
* The preview URL of the PR:
  http://crossbow.voltrondata.com/pr_docs/43553/format/CDataInterfaceStatistics.html

Note:

* I implemented this proposal only in C++. The implementation is
  already merged into apache/arrow. Do we need one more
  implementation, as we require for format specification changes?
  http://crossbow.voltrondata.com/pr_docs/43553/format/Changing.html#at-least-two-reference-implementations

The vote will be open for at least 72 hours.

[ ] +1 Accept this proposal
[ ] +0
[ ] -1 Do not accept this proposal because...

Thanks,
--
kou