cjc0013 opened a new pull request, #50071:
URL: https://github.com/apache/arrow/pull/50071

   ### Rationale for this change
   
   The statistics schema documentation describes statistics arrays for Arrow 
arrays and column indexes for nested fields, but the C++ API currently exposes 
only `RecordBatch::MakeStatisticsArray()`, and that method enumerates only 
top-level record batch columns.
   
   This leaves two related gaps:
   
   * callers cannot ask an `Array` to produce its statistics schema 
representation directly;
   * `RecordBatch::MakeStatisticsArray()` drops statistics attached to nested 
child arrays.
   
   ### What changes are included in this PR?
   
   * Add `Array::MakeStatisticsArray()`.
   * Share the statistics-array construction path between `Array` and 
`RecordBatch`.
   * Traverse nested `ArrayData::child_data` when enumerating record batch 
column statistics, using the same depth-first column index order described by 
the IPC record batch message rules.
   * Preserve existing record batch row-count behavior.
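
   As a rough illustration of the depth-first column index order mentioned 
above, here is a minimal standalone sketch. It does not use Arrow and is not 
the code in this PR: `Node`, `ColumnStatistics`, `Visit`, `CollectStatistics`, 
and `ExampleColumns` are hypothetical names, `Node::children` stands in for 
`ArrayData::child_data`, and a single `null_count` stands in for whatever 
statistics are attached to an array. It assumes a pre-order walk in which a 
parent receives its index before its children.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for arrow::ArrayData; only what the sketch needs.
struct Node {
  std::string name;
  std::optional<int64_t> null_count;  // stands in for attached statistics
  std::vector<Node> children;         // stands in for ArrayData::child_data
};

// One emitted statistics entry: the flattened column index plus the value.
struct ColumnStatistics {
  int32_t column;
  std::string name;
  int64_t null_count;
};

// Pre-order depth-first walk: the node takes the next index, then each child
// subtree is numbered in order. Nodes without statistics still consume an
// index, so indexes stay aligned with the flattened field order.
void Visit(const Node& node, int32_t* next_index,
           std::vector<ColumnStatistics>* out) {
  assert(next_index != nullptr && out != nullptr);
  const int32_t index = (*next_index)++;
  if (node.null_count.has_value()) {
    out->push_back({index, node.name, *node.null_count});
  }
  for (const Node& child : node.children) {
    Visit(child, next_index, out);
  }
}

// Enumerate statistics for all top-level columns and their nested children.
std::vector<ColumnStatistics> CollectStatistics(
    const std::vector<Node>& columns) {
  std::vector<ColumnStatistics> out;
  int32_t next_index = 0;
  for (const Node& column : columns) {
    Visit(column, &next_index, &out);
  }
  return out;
}

// Example layout: struct column "a" (no statistics) with children "b" and
// "c", followed by a flat column "d". Only "b" and "d" carry statistics.
std::vector<Node> ExampleColumns() {
  Node b{"b", 3, {}};
  Node c{"c", std::nullopt, {}};
  Node a{"a", std::nullopt, {b, c}};
  Node d{"d", 0, {}};
  return {a, d};
}
```

   With this layout the flattened indexes are `a` = 0, `b` = 1, `c` = 2, 
`d` = 3, so the sketch reports entries for columns 1 and 3. A walk limited to 
top-level columns would see only `a` and `d` and would miss the statistics on 
`b`, which is the gap described in the rationale.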
   
   ### Are these changes tested?
   
   Yes. Locally:
   
   * `ninja arrow-table-test -j2`
   * `./debug/arrow-table-test --gtest_filter="*MakeStatisticsArray*"`: 24 
tests passed
   * `./debug/arrow-table-test`: 181 tests passed
   
   ### Are there any user-facing changes?
   
   Yes. This adds the public C++ `Array::MakeStatisticsArray()` API and lets 
`RecordBatch::MakeStatisticsArray()` include nested child-array statistics when 
present. This is not a breaking API change.
   
   Closes #45804.
   Addresses part of #45474.
   Refs #45806.
   

