[PR] GH-43041: [C++][Python] Read/write Parquet BYTE_ARRAY as Large/View types directly [arrow]

via GitHub Wed, 21 May 2025 07:46:50 -0700


pitrou opened a new pull request, #46532:
URL: https://github.com/apache/arrow/pull/46532


   ### Rationale for this change
   
   Parquet has almost no support for LargeBinary and BinaryView data:
   * on writing, those types are not supported at all
   * on reading, data is decoded as regular Binary data with automatic 
chunking; if the stored Arrow schema points to a LargeBinary field, the data is 
later cast to that type
   
   ### What changes are included in this PR?
   
   * Refactor the BYTE_ARRAY column decoders to allow decoding directly into a 
LargeBinaryBuilder or a BinaryViewBuilder
   * Add a `binary_type` setting to `ArrowReaderProperties` to change the type 
that BYTE_ARRAY columns are decoded to by default
   * Support reading Parquet GEOMETRY types with a LargeBinary or BinaryView 
storage
   * Add benchmarks for reading and writing BinaryView data from/to Parquet
   * Add the corresponding Python bindings
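
As a rough illustration of the `binary_type` setting from the Python side, the sketch below writes a small binary column and reads it back as a caller-chosen type. It assumes that the Python bindings expose the setting as a `binary_type` keyword on `pyarrow.parquet.read_table`, and that a PyArrow build containing this change is installed; on older builds the keyword is rejected and the helper returns `None`. The helper name `read_binary_as` and the column name `payload` are made up for the example. The file is written with `store_schema=False` so that no stored Arrow schema decides the result type.

```python
import os
import tempfile

try:
    import pyarrow as pa
    import pyarrow.parquet as pq
except ImportError:  # PyArrow is not installed; the sketch cannot run
    pa = pq = None


def read_binary_as(binary_type):
    """Round-trip a small binary column, reading it back as `binary_type`.

    Hypothetical helper for illustration only. Returns None when the
    installed PyArrow does not accept the `binary_type` keyword.
    """
    table = pa.table(
        {"payload": pa.array([b"abc", b"", None, b"xyz"], type=pa.binary())}
    )
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.parquet")
        # Skip the serialized Arrow schema so the reader setting picks the type.
        pq.write_table(table, path, store_schema=False)
        try:
            # Assumption: `binary_type` is the keyword added by the Python bindings.
            return pq.read_table(path, binary_type=binary_type)
        except TypeError:
            return None


if __name__ == "__main__" and pa is not None:
    result = read_binary_as(pa.large_binary())
    if result is not None:
        print(result.schema)
```

Passing `pa.binary_view()` instead of `pa.large_binary()` would exercise the BinaryView path under the same assumptions.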
   
   ### Are these changes tested?
   
   Yes.
   
   ### Are there any user-facing changes?
   
   New APIs and improved functionality.
   



[PR] GH-43041: [C++][Python] Read/write Parquet BYTE_ARRAY as Large/View types directly [arrow]
