[ https://issues.apache.org/jira/browse/ARROW-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated ARROW-2571:
----------------------------------
    Labels: pull-request-available  (was: )

> [C++] Lz4Codec doesn't properly handle empty data
> -------------------------------------------------
>
>                 Key: ARROW-2571
>                 URL: https://issues.apache.org/jira/browse/ARROW-2571
>             Project: Apache Arrow
>          Issue Type: Bug
>            Reporter: Dmitry Kalinkin
>            Priority: Minor
>              Labels: pull-request-available
>
> For example, the following closure test will fail:
> {code:python}
> import pyarrow as pa
> import pyarrow.parquet as pq
> data = [pa.array([None] * 10)]
> batch = pa.RecordBatch.from_arrays(data, ['x'])
> table = pa.Table.from_batches([batch])
> pq.write_table(table, "test.parquet", compression='LZ4')
> table = pq.read_table("test.parquet")
> {code}
> with the following error:
> {code:java}
> Traceback (most recent call last):
>   File "test.py", line 8, in <module>
>     table = pq.read_table("test.parquet")
>   File "python3.6/site-packages/pyarrow/parquet.py", line 987, in read_table
>     use_pandas_metadata=use_pandas_metadata)
>   File "python3.6/site-packages/pyarrow/parquet.py", line 149, in read
>     nthreads=nthreads)
>   File "_parquet.pyx", line 736, in pyarrow._parquet.ParquetReader.read_all
>   File "error.pxi", line 83, in pyarrow.lib.check_status
> pyarrow.lib.ArrowIOError: Arrow error: IOError: Corrupt Lz4 compressed data.
> {code}
> Writing a file with LZ4 from Python requires the patch for ARROW-2570. But the
> issue can also be reproduced by creating an input file with parquet-cpp. The file
> must be compressed with LZ4 and contain a column with only null values.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)