nevi-me commented on pull request #8402:
URL: https://github.com/apache/arrow/pull/8402#issuecomment-716110760


   @carols10cents @alamb I think the whole reader logic needs replumbing.
There's at least a 1:1 mapping from Parquet types to Arrow types, and from
there we can cast to other Arrow types based on the Arrow metadata. This is a
less complex path, which matters because one of my concerns is that I/we are
going to struggle a lot when we get to deeply nested reads.
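   A minimal, std-only Rust sketch of the two-stage path described above:
each Parquet physical type is first decoded into exactly one native Arrow-like
array, and a single Arrow-to-Arrow `cast` then converts it to the type recorded
in the Arrow schema metadata. All names here (`ParquetColumn`, `ArrowArray`,
`ArrowType`, `to_native`, `cast`, `read_column`) are hypothetical
simplifications, not the `parquet` or `arrow` crate APIs.

```rust
// Hypothetical, std-only sketch: these enums are simplified stand-ins,
// not the real `parquet` or `arrow` crate types.

/// A decoded Parquet column in its physical representation.
#[derive(Debug, Clone, PartialEq)]
pub enum ParquetColumn {
    Int32(Vec<i32>),
    ByteArray(Vec<Vec<u8>>),
}

/// Target types, as they might be recorded in the Arrow schema metadata.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ArrowType {
    Int32,
    Int64,
    Date32,
    Binary,
    Utf8,
}

/// An in-memory Arrow-like array.
#[derive(Debug, Clone, PartialEq)]
pub enum ArrowArray {
    Int32(Vec<i32>),
    Int64(Vec<i64>),
    Date32(Vec<i32>), // days since the UNIX epoch, stored as i32
    Binary(Vec<Vec<u8>>),
    Utf8(Vec<String>),
}

/// Stage 1: each Parquet physical type maps to exactly one native array.
pub fn to_native(col: ParquetColumn) -> ArrowArray {
    match col {
        ParquetColumn::Int32(v) => ArrowArray::Int32(v),
        ParquetColumn::ByteArray(v) => ArrowArray::Binary(v),
    }
}

/// Stage 2: one shared Arrow-to-Arrow cast, driven by the target type.
pub fn cast(array: ArrowArray, to: ArrowType) -> Result<ArrowArray, String> {
    match (array, to) {
        (a @ ArrowArray::Int32(_), ArrowType::Int32) => Ok(a),
        (ArrowArray::Int32(v), ArrowType::Int64) => {
            Ok(ArrowArray::Int64(v.into_iter().map(i64::from).collect()))
        }
        (ArrowArray::Int32(v), ArrowType::Date32) => Ok(ArrowArray::Date32(v)),
        (a @ ArrowArray::Binary(_), ArrowType::Binary) => Ok(a),
        // Binary -> Utf8 fails if any value is not valid UTF-8.
        (ArrowArray::Binary(v), ArrowType::Utf8) => v
            .into_iter()
            .map(|b| String::from_utf8(b).map_err(|e| e.to_string()))
            .collect::<Result<Vec<_>, _>>()
            .map(ArrowArray::Utf8),
        (a, t) => Err(format!("unsupported cast from {:?} to {:?}", a, t)),
    }
}

/// Reading a column = native conversion followed by a metadata-driven cast.
pub fn read_column(col: ParquetColumn, target: ArrowType) -> Result<ArrowArray, String> {
    cast(to_native(col), target)
}
```

   The readers never need to know the final type; only `cast` does, so adding
a target type touches one function instead of every reader.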
   
   I previously didn't understand your needs regarding dictionary support
across Parquet, Arrow and DataFusion. Now that I have that context, I can make
better decisions.
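   Under the two-stage approach, producing an Arrow dictionary array (for
example, so that a string column reaches DataFusion dictionary-encoded) could
become just another Arrow-to-Arrow cast, from `Utf8` to a dictionary type with
`Int32` keys. The std-only sketch below shows what such a cast computes;
`DictionaryArray`, `dictionary_encode` and `dictionary_decode` are hypothetical
and much simpler than the arrow crate's dictionary types (no nulls, fixed key
type).

```rust
use std::collections::HashMap;
use std::convert::TryFrom;

/// Hypothetical, simplified dictionary array: `keys[i]` indexes into `values`.
#[derive(Debug, Clone, PartialEq)]
pub struct DictionaryArray {
    pub keys: Vec<i32>,
    pub values: Vec<String>,
}

/// Dictionary-encode a plain string column, as a Utf8 -> Dictionary(Int32, Utf8)
/// cast would: each distinct string is stored once, in first-seen order.
pub fn dictionary_encode(strings: &[String]) -> DictionaryArray {
    let mut index: HashMap<&str, i32> = HashMap::new();
    let mut keys = Vec::with_capacity(strings.len());
    let mut values = Vec::new();
    for s in strings {
        let key = *index.entry(s.as_str()).or_insert_with(|| {
            values.push(s.clone());
            i32::try_from(values.len() - 1).expect("too many distinct values for Int32 keys")
        });
        keys.push(key);
    }
    DictionaryArray { keys, values }
}

/// The reverse cast: expand keys back into plain strings.
pub fn dictionary_decode(dict: &DictionaryArray) -> Vec<String> {
    dict.keys.iter().map(|&k| dict.values[k as usize].clone()).collect()
}
```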
   
   My plan was to remove `trait CastRecordReader` altogether, and instead use 
Arrow casts.
   I prefer Arrow casts because they transparently handle casting any
`dyn Array` to any target `DataType`, instead of needing the combinatorial set
of `CastRecordReader`s.
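   The arrow crate's `cast` kernel accepts a type-erased array plus a target
`DataType`. A minimal std-only sketch of that shape: one `cast` entry point
takes `&dyn Array` and discovers the concrete source type at runtime by
downcasting. The `Array` trait, array structs and `DataType` enum below are
hypothetical stand-ins for the arrow crate's types, and only a few numeric
casts are covered.

```rust
use std::any::Any;
use std::fmt::Debug;

/// Hypothetical minimal stand-in for Arrow's `Array` trait.
pub trait Array: Debug {
    fn as_any(&self) -> &dyn Any;
}

#[derive(Debug, PartialEq)]
pub struct Int32Array(pub Vec<i32>);
#[derive(Debug, PartialEq)]
pub struct Int64Array(pub Vec<i64>);
#[derive(Debug, PartialEq)]
pub struct Float64Array(pub Vec<f64>);

impl Array for Int32Array { fn as_any(&self) -> &dyn Any { self } }
impl Array for Int64Array { fn as_any(&self) -> &dyn Any { self } }
impl Array for Float64Array { fn as_any(&self) -> &dyn Any { self } }

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum DataType { Int32, Int64, Float64 }

/// One cast entry point for any `&dyn Array` and any target `DataType`.
pub fn cast(array: &dyn Array, to: DataType) -> Result<Box<dyn Array>, String> {
    if let Some(a) = array.as_any().downcast_ref::<Int32Array>() {
        return match to {
            DataType::Int32 => Ok(Box::new(Int32Array(a.0.clone()))),
            DataType::Int64 => Ok(Box::new(Int64Array(a.0.iter().map(|&x| i64::from(x)).collect()))),
            DataType::Float64 => Ok(Box::new(Float64Array(a.0.iter().map(|&x| f64::from(x)).collect()))),
        };
    }
    if let Some(a) = array.as_any().downcast_ref::<Int64Array>() {
        return match to {
            DataType::Int64 => Ok(Box::new(Int64Array(a.0.clone()))),
            // i64 -> f64 may lose precision for very large values.
            DataType::Float64 => Ok(Box::new(Float64Array(a.0.iter().map(|&x| x as f64).collect()))),
            DataType::Int32 => Err("narrowing Int64 -> Int32 is not supported in this sketch".into()),
        };
    }
    Err(format!("unsupported cast of {:?} to {:?}", array, to))
}
```

   Readers only have to produce native arrays; supporting another target type
means extending this one function rather than adding another reader type per
(source, target) pair.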
   
   I've now done this in https://github.com/integer32llc/arrow/pull/3, but I 
left a lot of `TODO`s which I'd love for us to address so we don't carry the 
tech debt of cast converters.
   
   The tests all pass now 🎊
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.