adriangb commented on issue #42069:
URL: https://github.com/apache/arrow/issues/42069#issuecomment-2841834778

   >  I think handling `typed_value` makes handling the values quite complicated. For example, two different Parquet files might use different schemas for shredding, so two variant vectors would have different schemas as well.
   
   Yes, that's precisely my point.
   
   I think each query engine will have to play a game of (a rough sketch follows this list):
   - My predicate is `variant_get(col, 'key')` (assume this is some SQL written by a user).
   - For each file, does `col.typed_value.key` exist in that file's schema?
   - If so, rewrite my predicate to target `col.typed_value.key`.
   - Then do query engine stuff like stats pruning, etc.
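
   As a rough illustration of that per-file dance (not any existing engine's API): the helper names, the string-based predicate stand-ins, and the exact nesting of `typed_value` below are assumptions on my part; only the pyarrow calls for reading a file's schema are real. A minimal sketch:

   ```python
   import pyarrow as pa
   import pyarrow.parquet as pq


   def _child(struct_type: pa.StructType, name: str):
       # StructType is iterable over its child fields.
       for field in struct_type:
           if field.name == name:
               return field
       return None


   def shredded_leaf_exists(schema: pa.Schema, column: str, key: str) -> bool:
       """Does this particular file shred `key`, i.e. does
       `column.typed_value.key` exist as a nested field?"""
       if column not in schema.names:
           return False
       col_type = schema.field(column).type
       if not pa.types.is_struct(col_type):
           return False
       typed_value = _child(col_type, "typed_value")
       if typed_value is None or not pa.types.is_struct(typed_value.type):
           return False
       return _child(typed_value.type, key) is not None


   def plan_predicate(path: str, column: str, key: str) -> str:
       """Per-file planning: prefer the shredded physical column when the
       file has it, otherwise fall back to variant_get over the full
       variant column. The returned strings are just stand-ins for
       whatever expression representation the engine actually uses."""
       schema = pq.read_schema(path)  # Arrow schema of this one file
       if shredded_leaf_exists(schema, column, key):
           # Predicate now targets a real physical column, so stats
           # pruning, late materialization, etc. can apply to it.
           return f"{column}.typed_value.{key}"
       # Fallback: read the whole variant column and evaluate variant_get.
       return f"variant_get({column}, '{key}')"
   ```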
   
   In other words, I think `variant_get` could handle variant shredding as a nice-to-have / fallback, but I'd guess query engines will have to special-case variant shredding anyway to get stats pruning, late materialization, etc. (a sketch of the stats-pruning side is below). Otherwise they'd be forced to always read the entire column and hand it to `variant_get`, which pretty much defeats the point of shredding.
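
   To make the stats-pruning point concrete, here is what an engine gains once the predicate addresses the shredded leaf: it can skip row groups using ordinary Parquet column statistics, which simply don't exist for a key buried inside an unshredded variant binary. This is only a sketch; the exact `path_in_schema` of the shredded leaf depends on how the writer lays out `typed_value` (treat the path matching below as an assumption), but the pyarrow metadata calls are real:

   ```python
   import pyarrow.parquet as pq


   def prune_row_groups(path: str, shredded_path: str, lower, upper) -> list[int]:
       """Keep only row groups whose min/max statistics for the shredded
       leaf (e.g. "col.typed_value.key") might overlap [lower, upper]."""
       metadata = pq.ParquetFile(path).metadata
       keep = []
       for rg_index in range(metadata.num_row_groups):
           row_group = metadata.row_group(rg_index)
           might_match = True  # stay conservative if stats are missing
           for col_index in range(row_group.num_columns):
               chunk = row_group.column(col_index)
               # path_in_schema is the dot-separated physical column path.
               if chunk.path_in_schema.startswith(shredded_path):
                   stats = chunk.statistics
                   if stats is not None and stats.has_min_max:
                       might_match = not (stats.max < lower or stats.min > upper)
                   break
           if might_match:
               keep.append(rg_index)
       return keep
   ```

   None of this is available if all the engine can do is run `variant_get` over the raw variant bytes of every row group, which is the point above.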

