Re: [I] Support for "Schema evolution" / Schema Adapters [arrow-rs]

via GitHub Thu, 31 Jul 2025 13:16:50 -0700


alamb commented on issue #6735:
URL: https://github.com/apache/arrow-rs/issues/6735#issuecomment-3141219708


   Since the parquet type system and arrow type system are different, it makes 
sense for the parquet reader in arrow-rs to read data out as one of the Arrow 
types that corresponds to the parquet physical types, depending on what the 
user specifies (which is what the crate does today).
   
   This makes sense to do in the parquet reader when there can be specialized 
code for the different target arrow types (e.g. `Utf8View`).
   
   I think any other type of data conversion should be done outside of the 
parquet crate (via the arrow cast kernel, for example).
   
   Especially for (3) and (4): in my mind those are query engine concerns, and 
as @adriangb has been discovering, it is often more efficient to rewrite the 
expression in terms of the target schema.
   
   For example, if the file has no column named `col`, it is likely faster to 
rewrite a predicate like `col = 5` into `NULL = 5` rather than add a constant 
NULL array and then evaluate `<NULL> = 5` on it.
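
A minimal sketch of that kind of rewrite, using only the Rust standard library. The `Expr` enum and the `rewrite_missing_columns` and `simplify` functions are hypothetical names made up for illustration; they are not arrow-rs or DataFusion APIs, and a real engine's expression tree would be far richer.

```rust
/// Hypothetical, minimal expression tree (not an arrow-rs or DataFusion type).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// Reference to a column by name.
    Column(String),
    /// Integer literal; `None` stands for SQL NULL.
    Literal(Option<i64>),
    /// Equality comparison `left = right`.
    Eq(Box<Expr>, Box<Expr>),
}

/// Replace every reference to a column that is absent from the file
/// with a NULL literal, so no constant NULL array has to be built.
fn rewrite_missing_columns(expr: Expr, file_columns: &[&str]) -> Expr {
    match expr {
        Expr::Column(name) if !file_columns.iter().any(|c| *c == name) => {
            Expr::Literal(None)
        }
        Expr::Eq(left, right) => Expr::Eq(
            Box::new(rewrite_missing_columns(*left, file_columns)),
            Box::new(rewrite_missing_columns(*right, file_columns)),
        ),
        other => other,
    }
}

/// Fold `x = y` to NULL when either side is a NULL literal
/// (SQL semantics: comparing anything with NULL yields NULL).
fn simplify(expr: Expr) -> Expr {
    match expr {
        Expr::Eq(left, right) => match (simplify(*left), simplify(*right)) {
            (Expr::Literal(None), _) | (_, Expr::Literal(None)) => Expr::Literal(None),
            (l, r) => Expr::Eq(Box::new(l), Box::new(r)),
        },
        other => other,
    }
}
```

With a file whose columns are `a` and `b`, `rewrite_missing_columns` turns `col = 5` into `NULL = 5`, and `simplify` then folds that to a single NULL literal. The predicate is resolved once per file instead of once per row.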


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
