[
https://issues.apache.org/jira/browse/ARROW-18037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17617127#comment-17617127
]
Antoine Pitrou commented on ARROW-18037:
----------------------------------------
cc [~rtpsw] [~westonpace]
> [C++] Acero/dataset relies on ExecBatch::ToRecordBatch truncating excess
> columns
> --------------------------------------------------------------------------------
>
> Key: ARROW-18037
> URL: https://issues.apache.org/jira/browse/ARROW-18037
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++
> Reporter: Antoine Pitrou
> Priority: Major
>
> As found while working on ARROW-18004: the dataset scanner and the Acero
> engine rely on {{ExecBatch::ToRecordBatch}} returning successfully when the
> given schema has fewer fields than the ExecBatch has columns.
> This apparently allows the dataset-added columns
> ({{kAugmentedFields}} in {{arrow/dataset/scanner.cc}}) to be dropped
> implicitly from a scan's final result.
> However, it seems wrong and brittle to do this implicitly at the
> {{ExecBatch::ToRecordBatch}} level (hiding potential errors). Instead, it
> should probably be done explicitly inside Acero/dataset.
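
To illustrate the distinction the description draws, here is a minimal, self-contained sketch. It does not use the Arrow API: {{ToySchema}}, {{ToyBatch}}, {{ToRecordBatchStrict}} and {{DropTrailingColumns}} are hypothetical stand-ins, and the sketch assumes the extra columns sit at the end of the batch. It only shows the shape of "fail on a column-count mismatch, and let the caller drop the extra columns explicitly".

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Toy stand-ins for illustration only; these are NOT the Arrow classes.
struct ToySchema {
  std::vector<std::string> field_names;
};

struct ToyBatch {
  std::vector<std::vector<int>> columns;  // one inner vector per column
};

// Strict conversion: returns std::nullopt unless the batch has exactly one
// column per schema field, so a mismatch surfaces instead of being hidden.
std::optional<ToyBatch> ToRecordBatchStrict(const ToyBatch& batch,
                                            const ToySchema& schema) {
  if (batch.columns.size() != schema.field_names.size()) {
    return std::nullopt;
  }
  return batch;
}

// Explicit projection done by the caller: drop the last `num_extra` columns
// (assumed here to be the augmented ones) before converting.
ToyBatch DropTrailingColumns(const ToyBatch& batch, std::size_t num_extra) {
  assert(num_extra <= batch.columns.size());
  ToyBatch out;
  out.columns.assign(
      batch.columns.begin(),
      batch.columns.end() - static_cast<std::ptrdiff_t>(num_extra));
  return out;
}
```

With this split, a caller that forgets the projection step gets an explicit failure from the strict conversion rather than a silently truncated result.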