alamb commented on issue #6735: URL: https://github.com/apache/arrow-rs/issues/6735#issuecomment-3141219708
Since the Parquet and Arrow type systems are different, it makes sense for the parquet reader in arrow-rs to read data out as one of the Arrow types that corresponds to the Parquet physical type, depending on what the user specifies (which is what the crate does today).

Doing this in the parquet reader makes sense when there can be specialized code for the different target Arrow types (e.g. `Utf8View`).

I think any other type of data conversion should be done outside of the parquet crate (via the arrow cast kernel, for example).

Especially for (3) and (4): in my mind those are query engine concerns, and as @adriangb has been discovering, it is often more efficient to rewrite the expression in terms of the target schema.

For example, if the file has no column named `col`, it is likely faster to rewrite a predicate like `col = 5` into `NULL = 5` than to add a constant NULL array and then evaluate `<NULL> = 5` on it.
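To make the `col = 5` to `NULL = 5` rewrite concrete, here is a minimal sketch using only the Rust standard library. The `Expr` enum and the `rewrite_missing_columns` and `simplify` functions are hypothetical names invented for illustration; they are not the arrow-rs, parquet, or DataFusion APIs, and a real query engine would have a much richer expression tree.

```rust
use std::collections::HashSet;

/// Hypothetical, minimal expression tree (illustration only, not a real API).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// Reference to a column by name.
    Column(String),
    /// Integer literal; `None` models SQL NULL.
    Literal(Option<i64>),
    /// Equality comparison `left = right`.
    Eq(Box<Expr>, Box<Expr>),
}

/// Replace every reference to a column that is absent from the file with a
/// NULL literal, so no constant NULL array has to be materialized.
fn rewrite_missing_columns(expr: Expr, file_columns: &HashSet<&str>) -> Expr {
    match expr {
        Expr::Column(name) if !file_columns.contains(name.as_str()) => Expr::Literal(None),
        Expr::Eq(left, right) => Expr::Eq(
            Box::new(rewrite_missing_columns(*left, file_columns)),
            Box::new(rewrite_missing_columns(*right, file_columns)),
        ),
        other => other,
    }
}

/// Fold comparisons against NULL: under SQL semantics `NULL = x` is NULL.
/// Only this one folding rule is implemented here.
fn simplify(expr: Expr) -> Expr {
    match expr {
        Expr::Eq(left, right) => {
            let left = simplify(*left);
            let right = simplify(*right);
            let has_null = matches!(left, Expr::Literal(None))
                || matches!(right, Expr::Literal(None));
            if has_null {
                Expr::Literal(None)
            } else {
                Expr::Eq(Box::new(left), Box::new(right))
            }
        }
        other => other,
    }
}
```

Under these assumptions, calling `rewrite_missing_columns` on `col = 5` for a file whose columns do not include `col` yields `NULL = 5`, and `simplify` then folds that to NULL once per file rather than evaluating the comparison per row.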