[
https://issues.apache.org/jira/browse/ARROW-12487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17326133#comment-17326133
]
David Li commented on ARROW-12487:
----------------------------------
(JIRA doesn't allow me to upload the offending CSV easily, so I'll attach it
another way. That said, any file that raises an error at scan time would work,
e.g. a CSV for which the reader mis-infers a column type.)
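
In the meantime, a stand-in file can probably be generated rather than attached. The following is only a minimal sketch, not the original offending CSV: `write_misleading_csv` is a hypothetical helper, and the default row count is an assumption chosen so that the bad value lands well past the start of the file, where type inference is likely to have already settled on an integer column. Whether that actually triggers a scan-time error depends on the reader's block size.

```python
import csv
import os
import tempfile


def write_misleading_csv(path, n_int_rows=200_000, bad_value="not-a-number"):
    """Write a one-column CSV whose final row contradicts all earlier rows.

    Hypothetical helper (not part of Arrow). Every row but the last holds an
    integer, so a reader that infers types from the beginning of the file
    would likely pick an integer type and only fail on the last row.
    """
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, lineterminator="\n")
        writer.writerow(["a"])              # header: a single column named "a"
        for i in range(n_int_rows):
            writer.writerow([i])            # rows that look like integers
        writer.writerow([bad_value])        # late row that is not an integer
    return path


if __name__ == "__main__":
    out = os.path.join(tempfile.mkdtemp(), "mis_inferred.csv")
    write_misleading_csv(out)
    print(out, os.path.getsize(out), "bytes")
```

Opening the resulting file as a CSV dataset from Python and iterating over `to_batches()` would then be expected to hit the error during the scan itself, assuming inference does settle on an integer type before the last row is read.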
> [C++][Dataset] ScanBatches() hangs if there's an error during scanning
> ----------------------------------------------------------------------
>
> Key: ARROW-12487
> URL: https://issues.apache.org/jira/browse/ARROW-12487
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++
> Affects Versions: 4.0.0
> Reporter: David Li
> Assignee: David Li
> Priority: Blocker
> Labels: dataset, datasets, pull-request-available
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> Errors during scanning aren't properly reported, causing the iterator to hang
> forever.
> This affects ScanBatches() and anything built on top of it (Python
> to_batches, TakeRows, etc.).
> Verified on the 4.0.0 RC.