[jira] [Commented] (ARROW-18037) [C++] Acero/dataset relies on ExecBatch::ToRecordBatch truncating excess columns

Antoine Pitrou (Jira) Thu, 13 Oct 2022 08:21:04 -0700


    [ https://issues.apache.org/jira/browse/ARROW-18037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17617127#comment-17617127 ]


Antoine Pitrou commented on ARROW-18037:
----------------------------------------

cc [~rtpsw] [~westonpace]

> [C++] Acero/dataset relies on ExecBatch::ToRecordBatch truncating excess 
> columns
> --------------------------------------------------------------------------------
>
>                 Key: ARROW-18037
>                 URL: https://issues.apache.org/jira/browse/ARROW-18037
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>            Reporter: Antoine Pitrou
>            Priority: Major
>
> As found while working on ARROW-18004: the dataset scanner and the Acero 
> engine rely on {{ExecBatch::ToRecordBatch}} returning successfully when the 
> given schema has fewer fields than the ExecBatch has columns.
> This apparently allows the dataset-added columns 
> ({{kAugmentedFields}} in {{arrow/dataset/scanner.cc}}) to be dropped 
> implicitly from a scan's final result.
> However, it seems wrong and brittle to do this implicitly at the 
> {{ExecBatch::ToRecordBatch}} level (hiding potential errors). Instead, it 
> should probably be done explicitly inside Acero/dataset.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
