This is an automated email from the ASF dual-hosted git repository.
gangwu pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/parquet-testing.git
The following commit(s) were added to refs/heads/master by this push:
new 18d1754 Add test file with unknown logical type (#72)
18d1754 is described below
commit 18d17540097fca7c40be3d42c167e6bfad90763c
Author: Dewey Dunnington <[email protected]>
AuthorDate: Wed Mar 26 07:36:41 2025 -0500
Add test file with unknown logical type (#72)
---
bad_data/README.md | 2 +-
data/README.md | 5 +++--
data/unknown-logical-type.parquet | Bin 0 -> 1051 bytes
3 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/bad_data/README.md b/bad_data/README.md
index 0a030a0..bb12b6f 100644
--- a/bad_data/README.md
+++ b/bad_data/README.md
@@ -24,7 +24,7 @@ These are files used for reproducing various bugs that have been reported.
corrupted.
* ARROW-RS-GH-6229-DICTHEADER.parquet: tests a case where the number of values
stored in dictionary page header is negative.
-* ARROW-RS-GH-6229-LEVELS.parquet: tests a case where a page has insufficient
+* ARROW-RS-GH-6229-LEVELS.parquet: tests a case where a page has insufficient
repetition levels.
* ARROW-GH-41321.parquet: test case of
https://github.com/apache/arrow/issues/41321
where decoded rep / def levels are less than num_values in page_header.
diff --git a/data/README.md b/data/README.md
index df8690e..cc7909b 100644
--- a/data/README.md
+++ b/data/README.md
@@ -57,6 +57,7 @@
| repeated_primitive_no_list.parquet | REPEATED INT32 and BYTE_ARRAY fields without LIST annotation. See [note](#REPEATED-primitive-fields-with-no-LIST-annotation) |
| map_no_value.parquet | MAP with null values, MAP with INT32 keys and no values, and LIST<INT32> column with same values as the MAP keys. See [map_no_value.md](map_no_value.md) |
| page_v2_empty_compressed.parquet | An INT32 column with DataPageV2, all values are null, the zero-sized data is compressed using ZSTD |
+| unknown-logical-type.parquet | A file containing a column annotated with a LogicalType whose identifier has been set to an arbitrary high value to check the behaviour of an old reader reading a file written by a new writer containing an unsupported type (see [related issue](https://github.com/apache/arrow/issues/41764)). |
TODO: Document what each file is in the table above.
@@ -403,8 +404,8 @@ where the Map key fields are marked as optional rather than required.
This is not spec-compliant, yet appears in a number of existing data files in
the wild.
This issue has been fixed in:
-- [Trino v386+](https://github.com/trinodb/trino/commit/3247bd2e64d7422bd13e805cd67cfca3fa8ba520)
-- [Presto v0.274+](https://github.com/prestodb/presto/commit/842b46972c11534a7729d0a18e3abc5347922d1a)
+- [Trino v386+](https://github.com/trinodb/trino/commit/3247bd2e64d7422bd13e805cd67cfca3fa8ba520)
+- [Presto v0.274+](https://github.com/prestodb/presto/commit/842b46972c11534a7729d0a18e3abc5347922d1a)
We can recreate these problematic files for testing [arrow-rs
#5630](https://github.com/apache/arrow-rs/pull/5630)
with relevant Presto/Trino CLI, or with AWS Athena Console:
diff --git a/data/unknown-logical-type.parquet b/data/unknown-logical-type.parquet
new file mode 100644
index 0000000..0548911
Binary files /dev/null and b/data/unknown-logical-type.parquet differ
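The new unknown-logical-type.parquet file targets one forward-compatibility question: what an older reader should do when a column's LogicalType union carries a field id it does not recognise. The Python sketch below models one possible policy, which is to ignore the unrecognised annotation and expose the column as its physical type instead of failing the read. It is a hypothetical illustration, not the behaviour of any particular implementation. The field-id subset is assumed from parquet.thrift, and `ColumnSchema`, `KNOWN_LOGICAL_TYPES` and `resolve_type` are invented names; a real reader decodes this information from the Thrift-encoded footer.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical subset of LogicalType union field ids, assumed from parquet.thrift.
# A real reader would obtain the id while decoding the Thrift footer.
KNOWN_LOGICAL_TYPES = {
    1: "STRING",
    5: "DECIMAL",
    6: "DATE",
    8: "TIMESTAMP",
    10: "INTEGER",
    14: "UUID",
}


@dataclass
class ColumnSchema:
    """Hypothetical, simplified view of one column's schema element."""
    name: str
    physical_type: str              # e.g. "INT32", "BYTE_ARRAY"
    logical_type_id: Optional[int]  # union field id set by the writer, if any


def resolve_type(col: ColumnSchema) -> str:
    """Return the type this (assumed) forward-compatible reader exposes."""
    if col.logical_type_id is None:
        # No annotation at all: the physical type is all we have.
        return col.physical_type
    name = KNOWN_LOGICAL_TYPES.get(col.logical_type_id)
    if name is None:
        # Unrecognised annotation from a newer writer: drop it and fall back
        # to the physical type rather than rejecting the whole file.
        return col.physical_type
    return name


if __name__ == "__main__":
    print(resolve_type(ColumnSchema("s", "BYTE_ARRAY", 1)))     # known id
    print(resolve_type(ColumnSchema("x", "INT32", 32767)))      # unknown id
    print(resolve_type(ColumnSchema("n", "INT64", None)))       # no annotation
```

Falling back to the physical type keeps the raw values readable. A stricter reader could instead reject the file; comparing those two behaviours is the kind of check this test file makes possible.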