[
https://issues.apache.org/jira/browse/ARROW-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Uwe L. Korn resolved ARROW-2571.
--------------------------------
Resolution: Fixed
Fix Version/s: 0.10.0
Issue resolved by pull request 2032
[https://github.com/apache/arrow/pull/2032]
> [C++] Lz4Codec doesn't properly handle empty data
> -------------------------------------------------
>
> Key: ARROW-2571
> URL: https://issues.apache.org/jira/browse/ARROW-2571
> Project: Apache Arrow
> Issue Type: Bug
> Reporter: Dmitry Kalinkin
> Priority: Minor
> Labels: pull-request-available
> Fix For: 0.10.0
>
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> For example, the following round-trip (closure) test will fail:
> {code:python}
> import pyarrow as pa
> import pyarrow.parquet as pq
> data = [pa.array([None] * 10)]
> batch = pa.RecordBatch.from_arrays(data, ['x'])
> table = pa.Table.from_batches([batch])
> pq.write_table(table, "test.parquet", compression='LZ4')
> table = pq.read_table("test.parquet")
> {code}
> with the following error:
> {code:java}
> Traceback (most recent call last):
>   File "test.py", line 8, in <module>
>     table = pq.read_table("test.parquet")
>   File "python3.6/site-packages/pyarrow/parquet.py", line 987, in read_table
>     use_pandas_metadata=use_pandas_metadata)
>   File "python3.6/site-packages/pyarrow/parquet.py", line 149, in read
>     nthreads=nthreads)
>   File "_parquet.pyx", line 736, in pyarrow._parquet.ParquetReader.read_all
>   File "error.pxi", line 83, in pyarrow.lib.check_status
> pyarrow.lib.ArrowIOError: Arrow error: IOError: Corrupt Lz4 compressed data.
> {code}
> Writing a file with LZ4 from Python requires the patch for ARROW-2570, but the
> issue can also be reproduced by creating an input file with parquet-cpp. The
> file must be compressed with LZ4 and contain a column with only null values.
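
The description above amounts to an empty-buffer round trip: a column holding only nulls gives the codec zero bytes of data to compress. The following is a minimal sketch of that kind of check. It assumes, without confirmation from this issue, that the failure comes from treating a zero-length decompressed result as corruption. The standard-library zlib module stands in for LZ4 so the snippet runs without extra dependencies, and decompress_checked and roundtrip are hypothetical helper names, not Arrow APIs.

```python
import zlib


def decompress_checked(compressed: bytes, expected_len: int) -> bytes:
    """Decompress and validate the output length.

    zlib is only a stand-in codec here; the point is the length check.
    """
    out = zlib.decompress(compressed)
    # A length of 0 is valid when the original buffer was empty. Only a
    # mismatch with the expected length is treated as corruption.
    if len(out) != expected_len:
        raise IOError("Corrupt compressed data.")
    return out


def roundtrip(data: bytes) -> bytes:
    """Compress and decompress data, returning the recovered bytes."""
    return decompress_checked(zlib.compress(data), len(data))


# The empty payload is the case that matters for an all-null column.
for payload in (b"", b"x", b"abc" * 100):
    assert roundtrip(payload) == payload
```

A codec test suite that includes the empty payload alongside non-empty ones would likely have caught this class of bug before it reached the Parquet reader.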