[
https://issues.apache.org/jira/browse/ARROW-12487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17326132#comment-17326132
]
David Li commented on ARROW-12487:
----------------------------------
This is indeed a regression from 3.0 to 4.0. See the attached file and this
script:
{code:python}
import pyarrow
import pyarrow.csv
import pyarrow.dataset
root = "test.csv"
ds = pyarrow.dataset.dataset(root, format="csv")
fragments = ds.get_fragments()
fragment = next(fragments)
# Immediately errors in 3.0, hangs forever in 4.0
print(list(fragment.to_batches()))
{code}
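Since the script above never returns on 4.0, it can be awkward to run in a test suite. One possible workaround is to run the call under a watchdog so that a hang turns into an error. The sketch below is only an illustration: run_with_timeout is a hypothetical helper, not part of pyarrow, and it assumes the hung call releases the GIL while it waits (if it does not, a subprocess would be needed instead).

```python
import queue
import threading


def run_with_timeout(fn, timeout_s):
    """Run fn() in a daemon thread and raise TimeoutError if it does not finish.

    Hypothetical helper for illustration; not a pyarrow API.
    """
    results = queue.Queue(maxsize=1)

    def worker():
        # Forward either the return value or the exception to the caller.
        try:
            results.put(("ok", fn()))
        except BaseException as exc:
            results.put(("error", exc))

    # A daemon thread does not keep the interpreter alive if fn() never returns.
    threading.Thread(target=worker, daemon=True).start()

    try:
        status, value = results.get(timeout=timeout_s)
    except queue.Empty:
        raise TimeoutError(f"call did not finish within {timeout_s}s") from None

    if status == "error":
        raise value
    return value
```

With this, the last line of the script could be written as run_with_timeout(lambda: list(fragment.to_batches()), 10), which would be expected to surface the scan error on 3.0 and raise TimeoutError on 4.0 instead of blocking. The hung thread itself is not cancelled; it is merely abandoned.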
> [C++][Dataset] ScanBatches() hangs if there's an error during scanning
> ----------------------------------------------------------------------
>
> Key: ARROW-12487
> URL: https://issues.apache.org/jira/browse/ARROW-12487
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++
> Affects Versions: 4.0.0
> Reporter: David Li
> Assignee: David Li
> Priority: Major
> Labels: dataset, datasets, pull-request-available
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> Errors during scanning aren't properly reported, causing the iterator to hang
> forever.
> This affects ScanBatches() and anything built on top of it (Python
> to_batches, TakeRows, etc.).
> Verified on the 4.0.0 RC.