adragomir commented on issue #6735: URL: https://github.com/apache/arrow-rs/issues/6735#issuecomment-3206524039
@alamb @tustvold I want to take a stab at this, and I have a couple of notes. Our use case is one of very deeply nested schemas (~10 top-level columns expanding into ~900 actual column arrays; think map of struct with list of structs, etc.), which we want to cast to subsets of the initial schema.

1. We have a variant of this in arrow cast, and another one in datafusion.
2. It seems to me that the code really "wants" to be in the arrow-cast functionality. We can do it in a module related to arrow record batch, like `RecordBatchAdapter` or something, but I haven't figured out how to NOT have to duplicate something like the recursive functions in the cast module, with matches based on different data types:

* Arrow patch, for arrays, not record batches: https://github.com/hstack/arrow-rs/commit/208bc22ebdd9221a61970166ef4b018b6efd39ea#diff-cf9e72add2db905b7d49d7775b8a362951b2a774e6ed1d13c04d18e0fe09320eR1027-R1075
  * The patch adds a couple of settings in the struct, `allow_pruning` and `allow_empty`, which basically implement #6726 and cover other possible issues related to schema.
* Datafusion patch, which could be extracted to arrow-rs: https://github.com/hstack/datafusion/commit/5704c918114b8466907076d94df0548d62dad3fd
  * Probably the most relevant part is `deep.rs`, where we need to duplicate the cast functionality to be able to handle the recursion, etc.

If I understand correctly, would you suggest having some new functionality near arrow record batch, something like `RecordBatchCast` or `RecordBatchAdapter`, and keeping that functionality separate, even if it would mean duplicating in spirit some code from the cast module?
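
To make the duplication concern concrete, here is a minimal, self-contained sketch of the kind of recursive match that any "cast to a subset of a nested schema" helper ends up needing. It is a toy model, not arrow-rs code: `Ty`, `PruneOptions`, `can_project`, and the sample schemas are hypothetical names, and the semantics given to `allow_pruning` and `allow_empty` here are assumptions for illustration, not a description of what the linked patch does.

```rust
// Toy model only: `Ty` is a hypothetical stand-in for a nested data type,
// not arrow's `DataType`.
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Int64,
    Utf8,
    List(Box<Ty>),
    Map(Box<Ty>, Box<Ty>),
    Struct(Vec<(String, Ty)>),
}

// Assumed semantics, for illustration only:
// - allow_pruning: the target struct may omit fields present in the source.
// - allow_empty: the target struct may keep zero fields.
#[derive(Debug, Clone, Copy)]
struct PruneOptions {
    allow_pruning: bool,
    allow_empty: bool,
}

// Returns true if `to` is reachable from `from` by dropping struct fields
// (matched by name) at any nesting depth, subject to `opts`.
fn can_project(from: &Ty, to: &Ty, opts: PruneOptions) -> bool {
    match (from, to) {
        (Ty::Struct(src), Ty::Struct(dst)) => {
            if dst.is_empty() && !src.is_empty() && !opts.allow_empty {
                return false;
            }
            // Does the target leave out any source field?
            let drops_fields = src
                .iter()
                .any(|(name, _)| !dst.iter().any(|(kept, _)| kept == name));
            if drops_fields && !opts.allow_pruning {
                return false;
            }
            // Every kept field must exist in the source and recurse cleanly.
            dst.iter().all(|(name, dst_ty)| {
                src.iter()
                    .find(|(src_name, _)| src_name == name)
                    .map_or(false, |(_, src_ty)| can_project(src_ty, dst_ty, opts))
            })
        }
        // Lists recurse into their element type.
        (Ty::List(src), Ty::List(dst)) => can_project(src, dst, opts),
        // Maps keep the key type fixed and recurse into the value type.
        (Ty::Map(src_key, src_val), Ty::Map(dst_key, dst_val)) => {
            src_key == dst_key && can_project(src_val, dst_val, opts)
        }
        // Leaf types (and mismatched nesting) must be identical in this sketch.
        (src, dst) => src == dst,
    }
}

// Small helper to build a struct type from (name, type) pairs.
fn strukt(fields: Vec<(&str, Ty)>) -> Ty {
    Ty::Struct(fields.into_iter().map(|(n, t)| (n.to_string(), t)).collect())
}

// map<utf8, struct<id: int64, tags: list<struct<k: utf8, v: utf8>>>>
fn sample_source() -> Ty {
    let tag = strukt(vec![("k", Ty::Utf8), ("v", Ty::Utf8)]);
    let value = strukt(vec![("id", Ty::Int64), ("tags", Ty::List(Box::new(tag)))]);
    Ty::Map(Box::new(Ty::Utf8), Box::new(value))
}

// map<utf8, struct<tags: list<struct<k: utf8>>>>, a pruned subset of the source
fn sample_target() -> Ty {
    let tag = strukt(vec![("k", Ty::Utf8)]);
    let value = strukt(vec![("tags", Ty::List(Box::new(tag)))]);
    Ty::Map(Box::new(Ty::Utf8), Box::new(value))
}
```

In this sketch, `can_project(&sample_source(), &sample_target(), opts)` succeeds only when `opts.allow_pruning` is set. A real implementation would also have to rebuild the child arrays at each level, which is exactly the per-type recursion the cast module already contains.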