paleolimbot opened a new pull request, #72:
URL: https://github.com/apache/parquet-testing/pull/72

   This implements a suggestion from @emkornfield that I didn't understand at 
the time but now do! 
https://github.com/apache/arrow/pull/41765#discussion_r1643718103
   
   The PR https://github.com/apache/arrow/pull/41765 implements a path for 
reading unknown logical types (e.g., future ones) in Arrow C++; however, we 
need a test file to ensure it works.
   
   Briefly, it seems that at least Java and Arrow C++ error when reading 
unknown logical types, and at least DuckDB and nanoparquet just pretend that 
there is no logical type annotation.
   
   
   ```python
   # Version of pyarrow from
   # https://github.com/apache/arrow/pull/45459
   # with generated/parquet_types.tcc hacked to write GEOMETRY with code 2555
   # (instead of 17)
   import pyarrow as pa
   from pyarrow import parquet
   import geoarrow.pyarrow as ga
   
   tab = pa.table(
       {
           "column with known type": [
               "known string 1",
               "known string 2",
               "known string 3",
           ],
           "column with unknown type": ga.array([
               "unknown bytes 1".encode(),
               "unknown bytes 2".encode(),
               "unknown bytes 3".encode(),
           ]),
       }
   )
   
   parquet.write_table(tab, "unknown_logical_type.parquet")
   
   parquet.read_table("unknown_logical_type.parquet")
   #> Metadata contains Thrift LogicalType that is not recognized
   ```
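
The two reader behaviors described above (erroring versus ignoring the annotation) can be sketched in plain Python. This is only an illustrative model, not the code of Arrow C++, Java, DuckDB, or nanoparquet: `ColumnSchema`, `UnknownLogicalTypeError`, `resolve_type`, and the `strict` flag are hypothetical names, and the table of known codes is a small illustrative subset.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative subset of logical type codes a reader might recognize.
# The mapping and every name below are hypothetical stand-ins.
KNOWN_LOGICAL_TYPE_CODES = {1: "STRING", 17: "GEOMETRY"}


@dataclass
class ColumnSchema:
    name: str
    physical_type: str
    # Code of the logical type annotation, or None if the column has none.
    logical_type_code: Optional[int] = None


class UnknownLogicalTypeError(ValueError):
    """Raised by the strict policy when the annotation is not recognized."""


def resolve_type(column: ColumnSchema, strict: bool) -> str:
    """Return the type name a reader would expose for `column`.

    strict=True models readers that error on an unknown logical type;
    strict=False models readers that fall back to the physical type as
    if there were no annotation.
    """
    if column.logical_type_code is None:
        return column.physical_type
    name = KNOWN_LOGICAL_TYPE_CODES.get(column.logical_type_code)
    if name is not None:
        return name
    if strict:
        raise UnknownLogicalTypeError(
            f"column {column.name!r}: logical type code "
            f"{column.logical_type_code} is not recognized"
        )
    return column.physical_type


# Assumed physical type for both columns in this sketch: BYTE_ARRAY.
known = ColumnSchema("column with known type", "BYTE_ARRAY", 1)
unknown = ColumnSchema("column with unknown type", "BYTE_ARRAY", 2555)

print(resolve_type(known, strict=True))
print(resolve_type(unknown, strict=False))
```

In this model the lenient policy exposes the second column by its physical type, while calling `resolve_type(unknown, strict=True)` raises `UnknownLogicalTypeError`, which mirrors the error shown in the script output above.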


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

