[ https://issues.apache.org/jira/browse/ARROW-5974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Francois Saint-Jacques resolved ARROW-5974.
-------------------------------------------
Resolution: Fixed
Fix Version/s: 1.0.0
Issue resolved by pull request 4923
[https://github.com/apache/arrow/pull/4923]
> [Python][C++] Enable CSV reader to read from concatenated gzip stream
> ---------------------------------------------------------------------
>
> Key: ARROW-5974
> URL: https://issues.apache.org/jira/browse/ARROW-5974
> Project: Apache Arrow
> Issue Type: New Feature
> Components: Python
> Affects Versions: 0.13.0, 0.14.0
> Reporter: Jordan Samuels
> Assignee: Antoine Pitrou
> Priority: Minor
> Labels: pull-request-available
> Fix For: 1.0.0
>
> Time Spent: 1h 40m
> Remaining Estimate: 0h
>
> If two gzipped files are concatenated, the result is a valid
> (multi-member) gzip file. However, pyarrow.csv.read_csv appears to read
> only the data from the first file.
> Running the repro script
> [here|https://gist.github.com/jordansamuels/d69f1c22c58418f5dfa0785b9ecd211e]
> produces the following output:
> {{$ python repro.py}}
> {{pyarrow.csv only reads one row:}}
> {{   x}}
> {{0  1}}
> {{pandas reads two rows:}}
> {{   x}}
> {{0  1}}
> {{1  2}}
> {{pyarrow version: 0.14.0}}
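
The behaviour described in the report can be illustrated with the Python standard library alone. The sketch below is not the repro script from the gist and not Arrow's implementation. The file contents (`x\n1\n` and `2\n`) are assumed for illustration, and `decompress_all_members` is a hypothetical helper name. It shows that a single decompressor stops at the end of the first gzip member and leaves the rest as unused input, which is one plausible way a reader ends up seeing only the first file.

```python
import gzip
import zlib

# Assumed contents: a header plus one row, then one more row.
first = gzip.compress(b"x\n1\n")
second = gzip.compress(b"2\n")
concatenated = first + second  # still a valid gzip stream with two members

# gzip.decompress() reads every member of a multi-member stream.
all_members = gzip.decompress(concatenated)

# A single zlib decompressor in gzip mode (MAX_WBITS | 16) stops after the
# first member; the remaining bytes are exposed as unused_data.
single = zlib.decompressobj(zlib.MAX_WBITS | 16)
first_only = single.decompress(concatenated)
leftover = single.unused_data


def decompress_all_members(data: bytes) -> bytes:
    """Hypothetical helper: start a fresh decompressor for each member."""
    chunks = []
    while data:
        member = zlib.decompressobj(zlib.MAX_WBITS | 16)
        chunks.append(member.decompress(data))
        chunks.append(member.flush())
        data = member.unused_data  # bytes belonging to the next member, if any
    return b"".join(chunks)


print(first_only)
print(decompress_all_members(concatenated))
```

Restarting the decompressor whenever one member ends and input remains is a common way to handle such streams. Whether pull request 4923 takes exactly this approach is not stated in this message.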