This is an automated email from the ASF dual-hosted git repository. apitrou pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/parquet-testing.git
The following commit(s) were added to refs/heads/master by this push:
     new de7570a  ARROW-18420: Add fixed_length_byte_array.parquet for page index test (#31)
de7570a is described below

commit de7570a865af017add78432e4c045912c213ae24
Author: Gang Wu <ust...@gmail.com>
AuthorDate: Fri Dec 9 00:44:17 2022 +0800

    ARROW-18420: Add fixed_length_byte_array.parquet for page index test (#31)
---
 data/README.md                       |   1 +
 data/fixed_length_byte_array.md      |  73 +++++++++++++++++++++++++++++++++++
 data/fixed_length_byte_array.parquet | Bin 0 -> 4335 bytes
 3 files changed, 74 insertions(+)

diff --git a/data/README.md b/data/README.md
index 398a88c..4bb59c2 100644
--- a/data/README.md
+++ b/data/README.md
@@ -32,6 +32,7 @@
 | alltypes_tiny_pages.parquet | small page sizes with dictionary encoding with page index from [impala](https://github.com/apache/impala/tree/master/testdata/data/alltypes_tiny_pages.parquet). |
 | alltypes_tiny_pages_plain.parquet | small page sizes with plain encoding with page index [impala](https://github.com/apache/impala/tree/master/testdata/data/alltypes_tiny_pages.parquet). |
 | rle_boolean_encoding.parquet | option boolean columns with RLE encoding |
+| fixed_length_byte_array.parquet | optional FIXED_LENGTH_BYTE_ARRAY column with page index. See [fixed_length_byte_array.md](fixed_length_byte_array.md) for details. |
 | datapage_v1-uncompressed-checksum.parquet | uncompressed INT32 columns in v1 data pages with a matching CRC |
 | datapage_v1-snappy-compressed-checksum.parquet | compressed INT32 columns in v1 data pages with a matching CRC |
 | datapage_v1-corrupt-checksum.parquet | uncompressed INT32 columns in v1 data pages with a mismatching CRC |
diff --git a/data/fixed_length_byte_array.md b/data/fixed_length_byte_array.md
new file mode 100644
index 0000000..a0d98ac
--- /dev/null
+++ b/data/fixed_length_byte_array.md
@@ -0,0 +1,73 @@
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+`fixed_length_byte_array.parquet` is generated by parquet-mr version 1.13.0-SNAPSHOT.
+
+It has a single column of fixed length byte array type with size 4.
+
+In total there are 1000 values written in the descending order with some random nulls.
+
+# File Metadata (from parquet-cli meta command)
+```
+File path:  fixed_length_byte_array.parquet
+Created by: parquet-mr version 1.13.0-SNAPSHOT (build d057b39d93014fe40f5067ee4a33621e65c91552)
+Properties:
+  writer.model.name: example
+Schema:
+message schema {
+  required fixed_len_byte_array(4) flba_field;
+}
+
+
+Row group 0:  count: 1000  3.84 B records  start: 4  total(compressed): 3.749 kB total(uncompressed):3.749 kB
+--------------------------------------------------------------------------------
+            type      encodings count     avg size   nulls   min / max
+flba_field  FIXED[4]  _   _     1000      3.84 B     105     "0x00000001" / "0x000003E8"
+```
+
+# Column Index (from parquet-cli column-index command)
+```
+row-group 0:
+column index for column flba_field:
+Boundary order: DESCENDING
+          null count  min         max
+page-0    9           0x00000385  0x000003E8
+page-1    9           0x00000321  0x00000384
+page-2    19          0x000002BD  0x00000320
+page-3    10          0x00000259  0x000002BC
+page-4    13          0x000001F5  0x00000258
+page-5    11          0x00000191  0x000001F4
+page-6    11          0x0000012D  0x00000190
+page-7    8           0x000000C9  0x0000012C
+page-8    9           0x00000065  0x000000C8
+page-9    6           0x00000001  0x00000064
+
+offset index for column flba_field:
+          offset  compressed size  first row index
+page-0    4       390              0
+page-1    394     390              100
+page-2    784     350              200
+page-3    1134    386              300
+page-4    1520    373              400
+page-5    1893    382              500
+page-6    2275    382              600
+page-7    2657    394              700
+page-8    3051    390              800
+page-9    3441    402              900
+```
diff --git a/data/fixed_length_byte_array.parquet b/data/fixed_length_byte_array.parquet
new file mode 100644
index 0000000..e86a886
Binary files /dev/null and b/data/fixed_length_byte_array.parquet differ
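
The page index figures in the new fixed_length_byte_array.md can be cross-checked against each other without opening the Parquet file. The sketch below is not part of the commit: it transcribes the column index and offset index rows by hand, and the names `PAGES` and `check_page_index` are hypothetical. It assumes the 4-byte min/max values may be read as big-endian unsigned integers for comparison (which orders equal-length byte strings the same way as an unsigned bytewise comparison) and that pages are laid out back to back, as the listed offsets suggest.

```python
# Rows transcribed by hand from the column index and offset index in the diff.
# Tuple layout: (null_count, min_hex, max_hex, offset, compressed_size, first_row_index)
PAGES = [
    (9,  "0x00000385", "0x000003E8", 4,    390, 0),
    (9,  "0x00000321", "0x00000384", 394,  390, 100),
    (19, "0x000002BD", "0x00000320", 784,  350, 200),
    (10, "0x00000259", "0x000002BC", 1134, 386, 300),
    (13, "0x000001F5", "0x00000258", 1520, 373, 400),
    (11, "0x00000191", "0x000001F4", 1893, 382, 500),
    (11, "0x0000012D", "0x00000190", 2275, 382, 600),
    (8,  "0x000000C9", "0x0000012C", 2657, 394, 700),
    (9,  "0x00000065", "0x000000C8", 3051, 390, 800),
    (6,  "0x00000001", "0x00000064", 3441, 402, 900),
]


def flba_to_int(hex_text):
    """Decode a '0x...' rendering of a 4-byte value as a big-endian unsigned int."""
    raw = bytes.fromhex(hex_text[2:])
    assert len(raw) == 4, "the column is declared as fixed_len_byte_array(4)"
    return int.from_bytes(raw, "big")


def check_page_index(pages):
    """Check internal consistency of the listed figures and return a summary."""
    prev_min = None
    prev_end = None
    prev_first_row = None
    for nulls, lo_hex, hi_hex, offset, size, first_row in pages:
        lo, hi = flba_to_int(lo_hex), flba_to_int(hi_hex)
        assert lo <= hi, "page min must not exceed page max"
        if prev_min is not None:
            # Boundary order DESCENDING: each page lies below the previous one.
            assert hi < prev_min
            # Assumed contiguous layout: a page starts where the previous one ends.
            assert offset == prev_end
            # Row indexes must increase from page to page.
            assert first_row > prev_first_row
        prev_min, prev_end, prev_first_row = lo, offset + size, first_row
    return {
        "total_nulls": sum(p[0] for p in pages),
        "total_compressed": sum(p[4] for p in pages),
        "min": min(flba_to_int(p[1]) for p in pages),
        "max": max(flba_to_int(p[2]) for p in pages),
    }


if __name__ == "__main__":
    print(check_page_index(PAGES))
```

The summed null counts can be compared with the `nulls` column of the row group summary, and the summed page sizes with `total(compressed)`. The latter comparison depends on how parquet-cli rounds and which unit it means by kB, so treat a match there as a plausibility check rather than a proof.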