This is an automated email from the ASF dual-hosted git repository.
apitrou pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/parquet-testing.git
The following commit(s) were added to refs/heads/master by this push:
new e31fe1a ARROW-9177: Add Hadoop-produced LZ4-compressed file with several frames
e31fe1a is described below
commit e31fe1a02c9e9f271e4bfb8002d403c52f1ef8eb
Author: Antoine Pitrou <[email protected]>
AuthorDate: Mon Jan 18 14:07:14 2021 +0100
ARROW-9177: Add Hadoop-produced LZ4-compressed file with several frames
It seems that when the decompressed size exceeds 128 kiB, Hadoop compresses
the data in several concatenated "frames".
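The multi-frame layout described above can be sketched as follows. This is a minimal parsing sketch, assuming the LZ4_HADOOP framing of a 4-byte big-endian decompressed size followed by a 4-byte big-endian compressed size per frame; the payload bytes here are placeholders, not real LZ4 block data, and the helper name is hypothetical:

```python
import struct

def parse_hadoop_lz4_frames(buf):
    """Walk concatenated LZ4_HADOOP frames.

    Each frame (assumed layout): 4-byte big-endian decompressed size,
    4-byte big-endian compressed size, then that many bytes of raw
    LZ4 block data (not actually decompressed here).
    """
    frames = []
    pos = 0
    while pos < len(buf):
        decompressed_size, compressed_size = struct.unpack_from(">II", buf, pos)
        pos += 8
        frames.append((decompressed_size, buf[pos:pos + compressed_size]))
        pos += compressed_size
    return frames

# Two fake frames with placeholder payloads (real files would carry
# LZ4-compressed bytes); 131072 = 128 kiB, the apparent frame limit.
frame1 = struct.pack(">II", 131072, 3) + b"abc"
frame2 = struct.pack(">II", 500, 2) + b"xy"
frames = parse_hadoop_lz4_frames(frame1 + frame2)
```

A decoder for such data would decompress each frame's payload separately and concatenate the results, which is why a single-frame LZ4 decoder fails on these files.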
Data in this file:
```
Version: 1.0
Created By: parquet-mr version 1.11.1 (build 765bd5cd7fdef2af1cecd0755000694b992bfadd)
Total rows: 10000
Number of RowGroups: 1
Number of Real Columns: 1
Number of Columns: 1
Number of Selected Columns: 1
Column 0: a (BYTE_ARRAY/UTF8)
--- Row Group: 0 ---
--- Total Bytes: 400029 ---
--- Rows: 10000 ---
Column 0
Values: 10000, Null Values: 0, Distinct Values: 0
Max: ffffe6a0-e0c0-4e65-a9d4-f7f4c176aea2, Min: 00087de7-10df-4979-94cf-79279f9745ce
Compression: LZ4_HADOOP, Encodings: BIT_PACKED PLAIN
Uncompressed Size: 400029, Compressed Size: 358351
--- Values ---
a |
[ ... ]
```
---
data/hadoop_lz4_compressed_larger.parquet | Bin 0 -> 358859 bytes
1 file changed, 0 insertions(+), 0 deletions(-)
diff --git a/data/hadoop_lz4_compressed_larger.parquet b/data/hadoop_lz4_compressed_larger.parquet
new file mode 100644
index 0000000..0f133f8
Binary files /dev/null and b/data/hadoop_lz4_compressed_larger.parquet differ