corasaurus-hex commented on issue #16688: URL: https://github.com/apache/datafusion/issues/16688#issuecomment-3369897963
I took a crack at this today and got pretty far, but I'm running up against a couple of instances of low-level arrow decoding that feel out of place. I'd love your thoughts on maybe upstreaming something or coming up with a better abstraction to handle the need to do selective low-level decoding. I'm willing to put in the work, I just don't want to put a lot of labor into something that might get rejected for going in a bad direction.

1. [`ArrowOpener`](https://github.com/apache/datafusion/blob/82cd7f3cdb8dbe0b63b8b62f54543641598655a0/datafusion/core/src/datasource/physical_plan/arrow_file.rs#L159-L255): I'd need to duplicate this decoding/fetching for IPC streams as well. It feels like there's a missing abstraction somewhere for this...
2. [`infer_schema_from_file_stream`](https://github.com/apache/datafusion/blob/82cd7f3cdb8dbe0b63b8b62f54543641598655a0/datafusion/core/src/datasource/file_format/arrow.rs#L350-L433): this was a pretty straightforward one to solve ([here's how I did it](https://github.com/corasaurus-hex/datafusion/blob/b53b67ca165514801d9aabb2df842da3b93c1a47/datafusion/core/src/datasource/file_format/arrow.rs#L350-L465)), but it still feels like something that might be best to have in arrow-rs itself?

You can see my WIP branch diff [here](https://github.com/apache/datafusion/compare/main...corasaurus-hex:datafusion:register-arrow-ipc-stream-format-files).

I'm not sure if I should be trying to combine things like I am or if I should be creating entirely separate handling for `arrow_stream` vs `arrow_file`. They're so similar and yet so very different in practice...
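For context on the "similar yet different" point, here is a rough sketch of how the two encodings could be told apart from their first bytes. It is not taken from the WIP branch or from arrow-rs: `IpcKind` and `sniff_ipc_kind` are hypothetical names used only for illustration. It relies on the Arrow format spec, where an IPC file starts with the magic string `ARROW1` and a current IPC stream starts with the `0xFFFFFFFF` continuation marker.

```rust
/// Hypothetical classification of a byte prefix (illustrative name only).
#[derive(Debug, PartialEq, Eq)]
enum IpcKind {
    /// Arrow IPC file format: begins with the magic string "ARROW1".
    File,
    /// Arrow IPC stream format: begins with the continuation marker.
    Stream,
    /// Neither marker matched. This also covers legacy streams written
    /// without the continuation marker, which start with a bare length.
    Unknown,
}

/// Magic bytes at the start of an Arrow IPC file.
const FILE_MAGIC: &[u8] = b"ARROW1";
/// Continuation marker that precedes each encapsulated message in a stream.
const CONTINUATION: [u8; 4] = [0xFF; 4];

/// Guess the IPC encoding from the first bytes of an object.
/// Assumption: the caller has already fetched at least a few leading bytes.
fn sniff_ipc_kind(prefix: &[u8]) -> IpcKind {
    if prefix.starts_with(FILE_MAGIC) {
        IpcKind::File
    } else if prefix.starts_with(&CONTINUATION) {
        IpcKind::Stream
    } else {
        IpcKind::Unknown
    }
}
```

Something like this would only decide which decoding path to take; the schema message still has to be decoded separately for each format.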
