This is an automated email from the ASF dual-hosted git repository.
apitrou pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/parquet-testing.git
The following commit(s) were added to refs/heads/master by this push:
new 8a3d3fd ARROW-18420: Add fint32_with_null_pages.parquet for page index test (#32)
8a3d3fd is described below
commit 8a3d3fd5ff7691ee07ca9802df66290a3106e4b7
Author: Gang Wu <[email protected]>
AuthorDate: Tue Dec 13 20:18:06 2022 +0800
ARROW-18420: Add fint32_with_null_pages.parquet for page index test (#32)
---
data/README.md | 1 +
data/int32_with_null_pages.md | 73 +++++++++++++++++++++++++++++++++++++
data/int32_with_null_pages.parquet | Bin 0 -> 3829 bytes
3 files changed, 74 insertions(+)
diff --git a/data/README.md b/data/README.md
index 4bb59c2..b5d05a2 100644
--- a/data/README.md
+++ b/data/README.md
@@ -33,6 +33,7 @@
| alltypes_tiny_pages_plain.parquet | small page sizes with plain encoding with page index [impala](https://github.com/apache/impala/tree/master/testdata/data/alltypes_tiny_pages.parquet). |
| rle_boolean_encoding.parquet | optional boolean columns with RLE encoding |
| fixed_length_byte_array.parquet | optional FIXED_LENGTH_BYTE_ARRAY column with page index. See [fixed_length_byte_array.md](fixed_length_byte_array.md) for details. |
+| int32_with_null_pages.parquet | optional INT32 column with random null values and one all-null page. See [int32_with_null_pages.md](int32_with_null_pages.md) for details. |
| datapage_v1-uncompressed-checksum.parquet | uncompressed INT32 columns in v1 data pages with a matching CRC |
| datapage_v1-snappy-compressed-checksum.parquet | compressed INT32 columns in v1 data pages with a matching CRC |
| datapage_v1-corrupt-checksum.parquet | uncompressed INT32 columns in v1 data pages with a mismatching CRC |
diff --git a/data/int32_with_null_pages.md b/data/int32_with_null_pages.md
new file mode 100644
index 0000000..fe16340
--- /dev/null
+++ b/data/int32_with_null_pages.md
@@ -0,0 +1,73 @@
+<!--
+ ~ Licensed to the Apache Software Foundation (ASF) under one
+ ~ or more contributor license agreements. See the NOTICE file
+ ~ distributed with this work for additional information
+ ~ regarding copyright ownership. The ASF licenses this file
+ ~ to you under the Apache License, Version 2.0 (the
+ ~ "License"); you may not use this file except in compliance
+ ~ with the License. You may obtain a copy of the License at
+ ~
+ ~ http://www.apache.org/licenses/LICENSE-2.0
+ ~
+ ~ Unless required by applicable law or agreed to in writing,
+ ~ software distributed under the License is distributed on an
+ ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ ~ KIND, either express or implied. See the License for the
+ ~ specific language governing permissions and limitations
+ ~ under the License.
+ -->
+
+`int32_with_null_pages.parquet` was generated by parquet-mr version 1.13.0-SNAPSHOT.
+
+It has a single optional INT32 column, `int32_field`, with 1000 values, written with the page index enabled.
+
+Both the integer values and the positions of the nulls are randomly generated. In addition, one page (page-2 in the column index below) is deliberately written with nulls only.
+
+# File Metadata (from parquet-cli meta command)
+```
+File path: int32_with_null_pages.parquet
+Created by: parquet-mr version 1.13.0-SNAPSHOT (build
433de8df33fcf31927f7b51456be9f53e64d48b9)
+Properties:
+ writer.model.name: example
+Schema:
+message schema {
+ optional int32 int32_field;
+}
+
+
+Row group 0:  count: 1000  3.33 B records  start: 4  total(compressed): 3.250 kB total(uncompressed): 3.250 kB
+--------------------------------------------------------------------------------
+             type      encodings count     avg size   nulls   min / max
+int32_field  INT32     _ _       1000      3.33 B     275     "-2136906554" / "2145722375"
+```
+
+# Column Index (from parquet-cli column-index command)
+```
+row-group 0:
+column index for column int32_field:
+Boundary order: UNORDERED
+        null count  min          max
+page-0  8           -2135807632  2144701119
+page-1  55          -2104090659  1745329571
+page-2  100         <none>       <none>
+page-3  52          -2116849709  2077105757
+page-4  16          -2048691758  2143189382
+page-5  12          -2017923401  2087827129
+page-6  5           -2136906554  2125689411
+page-7  7           -2113313110  2145722375
+page-8  8           -2046900272  2087168549
+page-9  12          -1941944785  2078586537
+
+offset index for column int32_field:
+        offset  compressed size  first row index
+page-0  4       415              0
+page-1  419     220              100
+page-2  639     31               200
+page-3  670     228              300
+page-4  898     382              400
+page-5  1280    402              500
+page-6  1682    422              600
+page-7  2104    411              700
+page-8  2515    417              800
+page-9  2932    400              900
+```
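The page index printed above can be cross-checked by hand. The sketch below is a hypothetical standalone Python script (not part of parquet-mr, parquet-cli, or Arrow) that hard-codes the values from the meta and column-index output and checks three things: that the per-page null counts add up to the 275 nulls reported for the column chunk, that each page starts exactly where the previous one ends, and which pages contain nothing but nulls. It assumes, as the offset index suggests, that a page's row count is the difference between consecutive first row indexes, with the row group's 1000-row count closing the last page.

```python
# Page-index values copied by hand from the parquet-cli output for
# int32_with_null_pages.parquet (row group 0, column int32_field).
NUM_ROWS = 1000      # "count: 1000" from the meta output
TOTAL_NULLS = 275    # "nulls" column from the meta output

# Column index: null count per page.
NULL_COUNTS = [8, 55, 100, 52, 16, 12, 5, 7, 8, 12]

# Offset index: (offset, compressed size, first row index) per page.
OFFSET_INDEX = [
    (4, 415, 0), (419, 220, 100), (639, 31, 200), (670, 228, 300),
    (898, 382, 400), (1280, 402, 500), (1682, 422, 600),
    (2104, 411, 700), (2515, 417, 800), (2932, 400, 900),
]


def page_row_counts(offset_index, num_rows):
    """Rows in each page = next page's first row index minus this page's."""
    firsts = [first for _, _, first in offset_index] + [num_rows]
    return [b - a for a, b in zip(firsts, firsts[1:])]


def pages_are_contiguous(offset_index):
    """True if every page starts exactly where the previous page ended."""
    return all(
        off + size == next_off
        for (off, size, _), (next_off, _, _) in zip(offset_index, offset_index[1:])
    )


def all_null_pages(null_counts, row_counts):
    """A page is all-null when its null count equals its row count."""
    return [i for i, (n, rows) in enumerate(zip(null_counts, row_counts)) if n == rows]


rows = page_row_counts(OFFSET_INDEX, NUM_ROWS)
last_off, last_size, _ = OFFSET_INDEX[-1]

print("rows per page:", rows)
print("null counts add up:", sum(NULL_COUNTS) == TOTAL_NULLS)
print("pages contiguous:", pages_are_contiguous(OFFSET_INDEX))
print("page bytes:", last_off + last_size - OFFSET_INDEX[0][0])
print("all-null pages:", all_null_pages(NULL_COUNTS, rows))
```

With these values the script reports ten 100-row pages, contiguous page byte ranges spanning 3328 bytes (3.25 KiB, consistent with the 3.250 kB shown by the meta command), and page 2 as the only all-null page.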
diff --git a/data/int32_with_null_pages.parquet b/data/int32_with_null_pages.parquet
new file mode 100644
index 0000000..8263774
Binary files /dev/null and b/data/int32_with_null_pages.parquet differ
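The all-null page is what makes this file useful for page index tests: for page-2 the column index has no min/max values (the Parquet format flags such pages in the ColumnIndex `null_pages` list), so a reader that prunes pages using min/max statistics has to cope with the missing bounds rather than try to decode them. The following is a simplified, hypothetical sketch of such pruning for a predicate `int32_field > threshold`, using the column index values listed in `int32_with_null_pages.md`; the helper name `pages_for_greater_than` is illustrative and this is not the pruning logic of any particular Parquet implementation. Because the boundary order is UNORDERED, every page is checked individually instead of by binary search.

```python
from typing import List, Optional, Tuple

# Column index for int32_field, copied from the parquet-cli output.
# None stands for "<none>": the page holds no non-null values, so it has no min/max.
PAGES: List[Tuple[Optional[int], Optional[int]]] = [
    (-2135807632, 2144701119),
    (-2104090659, 1745329571),
    (None, None),
    (-2116849709, 2077105757),
    (-2048691758, 2143189382),
    (-2017923401, 2087827129),
    (-2136906554, 2125689411),
    (-2113313110, 2145722375),
    (-2046900272, 2087168549),
    (-1941944785, 2078586537),
]


def pages_for_greater_than(pages, threshold):
    """Return indexes of pages that may contain a value > threshold.

    A null page can never satisfy a comparison, so it is skipped outright;
    any other page is kept only if its max exceeds the threshold.
    """
    keep = []
    for i, (_, hi) in enumerate(pages):
        if hi is None:  # all-null page: min/max are absent
            continue
        if hi > threshold:
            keep.append(i)
    return keep


print(pages_for_greater_than(PAGES, 2_100_000_000))
```

For a threshold of 2,100,000,000 it keeps pages 0, 4, 6 and 7 and skips the other six, including the null page.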