adriangb commented on issue #42069: URL: https://github.com/apache/arrow/issues/42069#issuecomment-2841834778
> I think handling `typed_value` makes handling the values quite complicated. For example two different Parquet files might use different schema for shredding so two variant vectors would have different schema as well.

Yes, that's precisely my point. I think each query engine will have to play a game of:

- My predicate is `variant_get(col, 'key')` (assume this is some SQL written by a user).
- For each file, does `col.typed_value.key` exist?
- If so, rewrite my predicate to target `col.typed_value.key`.
- Do query engine stuff like stats pruning, etc.

In other words, `variant_get` could handle variant shredding as a nice-to-have / fallback, but I'd guess query engines will have to special-case variant shredding anyway to get stats pruning, late materialization, etc. Otherwise they'd be forced to always read the entire column and hand it to `variant_get`, which pretty much defeats the point of shredding. A rough per-file planning sketch follows below.
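To make the per-file game concrete, here is a minimal sketch in Python using pyarrow. It only illustrates the decision step: inspect each file's Arrow schema, and if the variant column has a shredded `typed_value.key` subfield, plan the scan against that column; otherwise fall back to reading the whole variant column and evaluating `variant_get`. The `shredded_field` and `plan_scan` helpers, the returned plan strings, and the exact `typed_value` layout are hypothetical stand-ins for whatever a real engine does, not an Arrow API.

```python
from typing import Optional

import pyarrow as pa
import pyarrow.parquet as pq


def shredded_field(schema: pa.Schema, column: str, key: str) -> Optional[pa.Field]:
    """Return the shredded field for `key` if this file shredded it, else None."""
    col_idx = schema.get_field_index(column)
    if col_idx == -1:
        return None
    variant = schema.field(col_idx).type
    if not pa.types.is_struct(variant):
        return None
    typed_idx = variant.get_field_index("typed_value")
    if typed_idx == -1:
        return None
    typed_value = variant.field(typed_idx).type
    if not pa.types.is_struct(typed_value):
        return None
    key_idx = typed_value.get_field_index(key)
    return typed_value.field(key_idx) if key_idx != -1 else None


def plan_scan(path: str, column: str, key: str) -> str:
    """Decide, per file, whether the predicate can target the shredded column."""
    schema = pq.ParquetFile(path).schema_arrow
    if shredded_field(schema, column, key) is not None:
        # This file shredded `key`: push the predicate down to the typed_value
        # column so stats pruning and late materialization can kick in.
        return f"scan {column}.typed_value.{key} with pushed-down predicate"
    # Fallback: read the full variant column and evaluate variant_get over it.
    return f"scan {column} fully, then evaluate variant_get({column}, '{key}')"
```

The point of the sketch is that this branching happens per file at planning time, before any data is read, which is exactly the special-casing an engine needs if it wants the benefits of shredding rather than always falling back to the full-column read.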