paleolimbot opened a new pull request, #72: URL: https://github.com/apache/parquet-testing/pull/72
This implements a suggestion from @emkornfield that I didn't understand at the time but now do! https://github.com/apache/arrow/pull/41765#discussion_r1643718103

The PR https://github.com/apache/arrow/pull/41765 implements a path for reading unknown logical types (e.g., future ones) in Arrow C++; however, we need a test file to ensure it works.

Briefly, it seems that at least Java and Arrow C++ error when reading unknown logical types, and at least DuckDB and nanoparquet just pretend that there is no logical type annotation.

```python
# Version of pyarrow from
# https://github.com/apache/arrow/pull/45459
# with generated/parquet_types.tcc hacked to write GEOMETRY with code 2555 (instead of 17)
import pyarrow as pa
from pyarrow import parquet
import geoarrow.pyarrow as ga

tab = pa.table(
    {
        "column with known type": [
            "known string 1",
            "known string 2",
            "known string 3",
        ],
        "column with unknown type": ga.array([
            "unknown bytes 1".encode(),
            "unknown bytes 2".encode(),
            "unknown bytes 3".encode(),
        ]),
    }
)

parquet.write_table(tab, "unknown_logical_type.parquet")
parquet.read_table("unknown_logical_type.parquet")
#> Metadata contains Thrift LogicalType that is not recognized
```
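For readers comparing implementations, the two behaviors described above (erroring on an unknown logical type versus ignoring the annotation) can be sketched in plain Python. This is only an illustration, not how any of the named readers is actually implemented: `ColumnSchema`, `resolve_type`, `UnknownLogicalTypeError`, and the `KNOWN_LOGICAL_TYPES` table are hypothetical names, and the only codes taken from this PR are 17 (GEOMETRY) and the deliberately unrecognized 2555.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical table of logical type codes this toy reader understands.
# 17 is the GEOMETRY code mentioned in the PR; the STRING entry is illustrative.
KNOWN_LOGICAL_TYPES = {1: "STRING", 17: "GEOMETRY"}


class UnknownLogicalTypeError(ValueError):
    """Raised by the strict policy when a logical type code is not recognized."""


@dataclass
class ColumnSchema:
    # Hypothetical, simplified stand-in for a Parquet column's schema element.
    name: str
    physical_type: str
    logical_type_code: Optional[int] = None


def resolve_type(column: ColumnSchema, strict: bool = True) -> str:
    """Return the type name a reader would expose for this column.

    strict=True  -> raise on an unrecognized logical type code
    strict=False -> ignore the annotation and fall back to the physical type
    """
    code = column.logical_type_code
    if code is None:
        # No annotation at all: the physical type is all there is.
        return column.physical_type
    if code in KNOWN_LOGICAL_TYPES:
        return KNOWN_LOGICAL_TYPES[code]
    if strict:
        raise UnknownLogicalTypeError(
            f"column {column.name!r}: logical type code {code} is not recognized"
        )
    # Lenient policy: pretend the column has no logical type annotation.
    return column.physical_type


# The second column of the test file, modeled with the unrecognized code 2555.
unknown = ColumnSchema("column with unknown type", "BYTE_ARRAY", 2555)
lenient_result = resolve_type(unknown, strict=False)
```

With `strict=False` the column is exposed through its physical type, so the raw bytes stay readable; with `strict=True` the call raises, which mirrors the error shown at the end of the snippet in the PR description.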
