This is an automated email from the ASF dual-hosted git repository.

apitrou pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/parquet-testing.git


The following commit(s) were added to refs/heads/master by this push:
     new e31fe1a  ARROW-9177: Add Hadoop-produced LZ4-compressed file with several frames
e31fe1a is described below

commit e31fe1a02c9e9f271e4bfb8002d403c52f1ef8eb
Author: Antoine Pitrou <[email protected]>
AuthorDate: Mon Jan 18 14:07:14 2021 +0100

    ARROW-9177: Add Hadoop-produced LZ4-compressed file with several frames
    
    It seems that when the decompressed size exceeds 128 kiB, Hadoop compresses the data in several concatenated "frames".
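
    The multi-frame layout described above can be sketched with a small parser. This is a hedged illustration, not code from the commit: it assumes Hadoop's block codec convention of prefixing each frame with two 4-byte big-endian integers (decompressed size, then compressed size), and the helper name `split_hadoop_lz4_frames` is hypothetical.

    ```python
    import struct

    def split_hadoop_lz4_frames(buf: bytes):
        """Split a Hadoop-style LZ4 stream into its concatenated frames.

        Assumption: each frame starts with two 4-byte big-endian integers,
        the decompressed size and the compressed size, followed by a raw
        LZ4 block of that compressed size (Hadoop's block codec convention).
        Returns a list of (decompressed_size, raw_lz4_block) tuples.
        """
        frames = []
        pos = 0
        while pos < len(buf):
            decompressed_size, compressed_size = struct.unpack_from(">II", buf, pos)
            pos += 8
            frames.append((decompressed_size, buf[pos:pos + compressed_size]))
            pos += compressed_size
        return frames
    ```

    A decoder that expects a single frame stops after the first block, which is why files whose decompressed size exceeds the 128 kiB buffer can trip it up.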
    
    Data in this file:
    ```
    Version: 1.0
    Created By: parquet-mr version 1.11.1 (build 765bd5cd7fdef2af1cecd0755000694b992bfadd)
    Total rows: 10000
    Number of RowGroups: 1
    Number of Real Columns: 1
    Number of Columns: 1
    Number of Selected Columns: 1
    Column 0: a (BYTE_ARRAY/UTF8)
    --- Row Group: 0 ---
    --- Total Bytes: 400029 ---
    --- Rows: 10000 ---
    Column 0
      Values: 10000, Null Values: 0, Distinct Values: 0
      Max: ffffe6a0-e0c0-4e65-a9d4-f7f4c176aea2, Min: 00087de7-10df-4979-94cf-79279f9745ce
      Compression: LZ4_HADOOP, Encodings: BIT_PACKED PLAIN
      Uncompressed Size: 400029, Compressed Size: 358351
    --- Values ---
    a                             |
    [ ... ]
    ```
---
 data/hadoop_lz4_compressed_larger.parquet | Bin 0 -> 358859 bytes
 1 file changed, 0 insertions(+), 0 deletions(-)

diff --git a/data/hadoop_lz4_compressed_larger.parquet b/data/hadoop_lz4_compressed_larger.parquet
new file mode 100644
index 0000000..0f133f8
Binary files /dev/null and b/data/hadoop_lz4_compressed_larger.parquet differ
