corasaurus-hex opened a new issue, #16688: URL: https://github.com/apache/datafusion/issues/16688
### Is your feature request related to a problem or challenge?

DataFusion currently supports [registering files in the Arrow IPC file format as tables](https://gist.github.com/corasaurus-hex/a6a4b47acd047e4def359931b5714a2b):

```rust
use datafusion::execution::options::ArrowReadOptions;
use datafusion::prelude::*;

#[tokio::main]
async fn main() {
    let ctx = SessionContext::new();
    // Register a file in the Arrow IPC *file* format as a table.
    ctx.register_arrow("my_table", "file.arrow", ArrowReadOptions::default())
        .await
        .unwrap();
    ctx.sql("SELECT * FROM my_table LIMIT 10")
        .await
        .unwrap()
        .show()
        .await
        .unwrap();
}
```

You can also reference the file path directly in SQL in `datafusion-cli`:

```sh
> SELECT * FROM 'file.arrow' LIMIT 10;
```

You cannot, however, [do the same with files in the Arrow IPC stream format](https://gist.github.com/corasaurus-hex/d76a166112ecae57540f0f4a70c93b12). The stream format has no footer, which the file-format reader requires, so you get the error:

```
called `Result::unwrap()` on an `Err` value: ArrowError(ParseError("Arrow file does not contain correct footer"), None)
```

### Describe the solution you'd like

I would love it if `register_arrow` supported files in the Arrow IPC stream format, or if an equivalent function were added to do the same. Additionally, it would be great if `datafusion-cli` could reference these files by name in the same way it can for files in the Arrow IPC file format.

### Describe alternatives you've considered

1. [Convert from the stream format to the file format](https://gist.github.com/corasaurus-hex/0dd4d7ef489ed49b5e45ea188c202aa9) and then query as shown above.
2. [Read all the record batches into memory and then register them as a `MemTable`](https://gist.github.com/corasaurus-hex/23c9be7a415c6aede5a89c1d92a6cf47).
3. [Add a new `StreamProvider` impl and use a `StreamTable`](https://gist.github.com/corasaurus-hex/96574afd82780e48c4d0c679c116b23a).

There are probably others, too, but none is as simple as registering the file with `register_arrow` or referencing it directly in `datafusion-cli`.

### Additional context

I'm interested in taking a crack at this feature, but, assuming y'all are interested in it, I would love some implementation guidance.
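For reference, the two formats can be told apart from their leading bytes: per the Arrow columnar format specification, the IPC file format begins with the `ARROW1` magic string, while the stream format (since Arrow 0.15) prefixes each message with the `0xFFFFFFFF` continuation marker. Below is a minimal, std-only sketch of such a check, assuming only that layout. `IpcFormat` and `detect_ipc_format` are hypothetical names, not existing DataFusion or arrow-rs APIs, and legacy pre-0.15 streams without the marker are simply reported as unknown.

```rust
/// Hypothetical classification of an Arrow IPC byte buffer (not a DataFusion API).
#[derive(Debug, PartialEq, Eq)]
enum IpcFormat {
    /// Random-access file format: begins (and ends) with the `ARROW1` magic.
    File,
    /// Stream format: begins with the 0xFFFFFFFF continuation marker.
    Stream,
    /// Anything else, including legacy (pre-0.15) streams without the marker.
    Unknown,
}

/// Magic string at the start (and end) of the Arrow IPC file format.
const ARROW_MAGIC: &[u8] = b"ARROW1";
/// Continuation marker that prefixes each message in the modern stream format.
const CONTINUATION_MARKER: &[u8] = &[0xFF, 0xFF, 0xFF, 0xFF];

/// Hypothetical helper: guess the IPC container from the leading bytes only.
/// It does not validate the footer; the actual reader would still do that.
fn detect_ipc_format(leading_bytes: &[u8]) -> IpcFormat {
    if leading_bytes.starts_with(ARROW_MAGIC) {
        IpcFormat::File
    } else if leading_bytes.starts_with(CONTINUATION_MARKER) {
        IpcFormat::Stream
    } else {
        IpcFormat::Unknown
    }
}
```

With a check like this, `register_arrow` (or a new function) could dispatch to arrow-rs's `arrow::ipc::reader::FileReader` or `StreamReader` accordingly. Whether sniffing like this, an explicit option, or a separate registration function is the better design is an open question.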
Thanks for your time!