(parquet-testing) branch master updated: Add test file with unknown logical type (#72)

gangwu Wed, 26 Mar 2025 05:45:11 -0700

This is an automated email from the ASF dual-hosted git repository.

gangwu pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/parquet-testing.git

The following commit(s) were added to refs/heads/master by this push:
     new 18d1754  Add test file with unknown logical type (#72)
18d1754 is described below

commit 18d17540097fca7c40be3d42c167e6bfad90763c
Author: Dewey Dunnington <[email protected]>
AuthorDate: Wed Mar 26 07:36:41 2025 -0500

    Add test file with unknown logical type (#72)
---
 bad_data/README.md                |   2 +-
 data/README.md                    |   5 +++--
 data/unknown-logical-type.parquet | Bin 0 -> 1051 bytes
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/bad_data/README.md b/bad_data/README.md
index 0a030a0..bb12b6f 100644
--- a/bad_data/README.md
+++ b/bad_data/README.md
@@ -24,7 +24,7 @@ These are files used for reproducing various bugs that have been reported.
   corrupted.
 * ARROW-RS-GH-6229-DICTHEADER.parquet: tests a case where the number of values
   stored in dictionary page header is negative.
-* ARROW-RS-GH-6229-LEVELS.parquet: tests a case where a page has insufficient 
+* ARROW-RS-GH-6229-LEVELS.parquet: tests a case where a page has insufficient
   repetition levels.
 * ARROW-GH-41321.parquet: test case of https://github.com/apache/arrow/issues/41321
   where decoded rep / def levels is less than num_values in page_header.
diff --git a/data/README.md b/data/README.md
index df8690e..cc7909b 100644
--- a/data/README.md
+++ b/data/README.md
@@ -57,6 +57,7 @@
 | repeated_primitive_no_list.parquet | REPEATED INT32 and BYTE_ARRAY fields without LIST annotation. See [note](#REPEATED-primitive-fields-with-no-LIST-annotation) |
 | map_no_value.parquet | MAP with null values, MAP with INT32 keys and no values, and LIST<INT32> column with same values as the MAP keys. See [map_no_value.md](map_no_value.md) |
 | page_v2_empty_compressed.parquet | An INT32 column with DataPageV2, all values are null, the zero-sized data is compressed using ZSTD |
+| unknown-logical-type.parquet | A file containing a column annotated with a LogicalType whose identifier has been set to an abitrary high value to check the behaviour of an old reader reading a file written by a new writer containing an unsupported type (see [related issue](https://github.com/apache/arrow/issues/41764)). |
 
 TODO: Document what each file is in the table above.
 
@@ -403,8 +404,8 @@ where the Map key fields are marked as optional rather than required.
 This is not spec-compliant, yet appears in a number of existing data files in the wild.
 
 This issue has been fixed in:
-- [Trino v386+](https://github.com/trinodb/trino/commit/3247bd2e64d7422bd13e805cd67cfca3fa8ba520)  
-- [Presto v0.274+](https://github.com/prestodb/presto/commit/842b46972c11534a7729d0a18e3abc5347922d1a)  
+- [Trino v386+](https://github.com/trinodb/trino/commit/3247bd2e64d7422bd13e805cd67cfca3fa8ba520)
+- [Presto v0.274+](https://github.com/prestodb/presto/commit/842b46972c11534a7729d0a18e3abc5347922d1a)
 
 We can recreate these problematic files for testing [arrow-rs #5630](https://github.com/apache/arrow-rs/pull/5630)
 with relevant Presto/Trino CLI, or with AWS Athena Console:
diff --git a/data/unknown-logical-type.parquet b/data/unknown-logical-type.parquet
new file mode 100644
index 0000000..0548911
Binary files /dev/null and b/data/unknown-logical-type.parquet differ
