Antoine Pitrou created ARROW-18037:
--------------------------------------
Summary: [C++] Acero/dataset relies on ExecBatch::ToRecordBatch
truncating excess columns
Key: ARROW-18037
URL: https://issues.apache.org/jira/browse/ARROW-18037
Project: Apache Arrow
Issue Type: Bug
Components: C++
Reporter: Antoine Pitrou
As found while working on ARROW-18004: the dataset scanner and the Acero engine
rely on {{ExecBatch::ToRecordBatch}} returning successfully when the given
schema has fewer fields than the ExecBatch has columns.
This apparently allows the dataset-added columns ({{kAugmentedFields}} in
{{arrow/dataset/scanner.cc}}) to be dropped implicitly from a scan's final
result.
However, it seems wrong and brittle to do this implicitly at the
{{ExecBatch::ToRecordBatch}} level (hiding potential errors). Instead, it
should probably be done explicitly inside Acero/dataset.
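
To make the distinction concrete, here is a minimal, self-contained sketch. It does not use Arrow's actual classes: {{ToyRecordBatch}}, {{LenientToRecordBatch}}, {{StrictToRecordBatch}} and {{SelectColumns}} are hypothetical stand-ins invented for illustration, and columns are plain integer vectors. It only contrasts a conversion that silently truncates excess columns with one that rejects a mismatch and leaves the dropping of extra columns to an explicit step in the caller.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Toy stand-ins for illustration only; these are NOT Arrow's real types.
using Column = std::vector<int>;
using Columns = std::vector<Column>;
using Schema = std::vector<std::string>;

struct ToyRecordBatch {
  Schema schema;
  Columns columns;
};

// Lenient conversion: succeeds when there are more columns than schema
// fields and silently discards the excess (the behavior relied upon today).
std::optional<ToyRecordBatch> LenientToRecordBatch(const Schema& schema,
                                                   const Columns& values) {
  if (values.size() < schema.size()) return std::nullopt;
  const auto n = static_cast<std::ptrdiff_t>(schema.size());
  return ToyRecordBatch{schema, Columns(values.begin(), values.begin() + n)};
}

// Strict conversion: any mismatch between field count and column count is
// reported as a failure instead of being hidden.
std::optional<ToyRecordBatch> StrictToRecordBatch(const Schema& schema,
                                                  const Columns& values) {
  if (values.size() != schema.size()) return std::nullopt;
  return ToyRecordBatch{schema, values};
}

// Explicit projection, performed by the caller before converting: keep only
// the columns at the given indices. Fails on an out-of-range index.
std::optional<Columns> SelectColumns(const Columns& values,
                                     const std::vector<std::size_t>& keep) {
  Columns out;
  out.reserve(keep.size());
  for (std::size_t index : keep) {
    if (index >= values.size()) return std::nullopt;
    out.push_back(values[index]);
  }
  return out;
}
```

With three columns and a two-field schema, the lenient version succeeds and quietly loses the third column, while the strict version fails until the caller has explicitly selected the two columns it wants. Which columns to keep then becomes a visible decision in the scanner or engine rather than a side effect of the conversion.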