adragomir commented on issue #6735:
URL: https://github.com/apache/arrow-rs/issues/6735#issuecomment-3206524039

   @alamb @tustvold 
   I want to take a stab at this, and I have a couple of notes. Our use case 
is one of very deeply nested schemas (~10 top-level columns expanding into 
~900 actual column arrays; think map of struct with list of structs, etc.), 
which we want to cast to subsets of the initial schema.
   1. We have a variant of this in arrow-cast, and another one in DataFusion.
   2. It seems to me that the code really "wants" to be in the arrow-cast 
functionality. We could do it in a module related to record batches, like 
`RecordBatchAdapter` or something, but I haven't figured out how to avoid 
duplicating the recursive functions in the cast module, which match on the 
different data types:
      * Arrow patch, for arrays, not record batches: 
https://github.com/hstack/arrow-rs/commit/208bc22ebdd9221a61970166ef4b018b6efd39ea#diff-cf9e72add2db905b7d49d7775b8a362951b2a774e6ed1d13c04d18e0fe09320eR1027-R1075
         * The patch adds a couple of settings to the struct, `allow_pruning` 
and `allow_empty`, which basically implement #6726 and possibly other 
schema-related issues.
      * DataFusion patch, which could be extracted to arrow-rs: 
https://github.com/hstack/datafusion/commit/5704c918114b8466907076d94df0548d62dad3fd
         * Probably the most relevant part is `deep.rs`, where we need to 
duplicate the cast functionality to be able to handle the recursion etc.
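
   To make the shape of the problem concrete, here is a minimal, std-only 
sketch of a recursive "cast to a subset of the schema". It is not the code from 
either patch and does not use the arrow-rs API: `Ty`, `Val`, `PruneOptions` and 
`cast_pruned` are hypothetical toy stand-ins, and the meaning given to 
`allow_pruning` and `allow_empty` here is an assumption for illustration only.

```rust
// Toy stand-ins for nested data types and values; NOT arrow-rs types.
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Int,
    Utf8,
    List(Box<Ty>),
    Struct(Vec<(String, Ty)>),
}

#[derive(Debug, Clone, PartialEq)]
enum Val {
    Int(i64),
    Utf8(String),
    List(Vec<Val>),
    Struct(Vec<(String, Val)>),
}

// Hypothetical options; the semantics below are assumed, not taken from the patch.
#[derive(Debug, Clone, Copy)]
struct PruneOptions {
    allow_pruning: bool, // assumed: source struct fields absent from the target may be dropped
    allow_empty: bool,   // assumed: a target struct with zero fields is accepted
}

// Recursively cast `value` to `target`, keeping only the fields the target names.
fn cast_pruned(value: &Val, target: &Ty, opts: PruneOptions) -> Result<Val, String> {
    match (value, target) {
        (Val::Int(v), Ty::Int) => Ok(Val::Int(*v)),
        (Val::Int(v), Ty::Utf8) => Ok(Val::Utf8(v.to_string())),
        (Val::Utf8(s), Ty::Utf8) => Ok(Val::Utf8(s.clone())),
        // Lists recurse into their element type.
        (Val::List(items), Ty::List(inner)) => items
            .iter()
            .map(|item| cast_pruned(item, inner, opts))
            .collect::<Result<Vec<_>, _>>()
            .map(Val::List),
        // Structs are matched by field name; this is where pruning happens.
        (Val::Struct(fields), Ty::Struct(wanted)) => {
            if wanted.is_empty() && !opts.allow_empty {
                return Err("target struct has no fields".to_string());
            }
            let drops_fields = fields
                .iter()
                .any(|(name, _)| !wanted.iter().any(|(w, _)| w == name));
            if drops_fields && !opts.allow_pruning {
                return Err("cast would drop source fields".to_string());
            }
            let mut out = Vec::with_capacity(wanted.len());
            for (name, ty) in wanted {
                let (_, child) = fields
                    .iter()
                    .find(|(n, _)| n == name)
                    .ok_or_else(|| format!("missing field `{name}`"))?;
                out.push((name.clone(), cast_pruned(child, ty, opts)?));
            }
            Ok(Val::Struct(out))
        }
        (v, t) => Err(format!("cannot cast {v:?} to {t:?}")),
    }
}
```

   The point of the sketch is that the per-type `match` is the part that would 
have to be repeated if the pruning logic lived outside the cast module, since a 
batch-level wrapper still needs the same recursion for every nested column.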
   
   If I understand correctly, are you suggesting adding some new functionality 
near the arrow record batch, something like `RecordBatchCast` or 
`RecordBatchAdapter`, and keeping that functionality separate, even if it means 
duplicating, in spirit, some code from the cast module?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.