corasaurus-hex opened a new issue, #16688:
URL: https://github.com/apache/datafusion/issues/16688

   ### Is your feature request related to a problem or challenge?
   
   DataFusion currently supports [registering files in the Arrow IPC file format as tables](https://gist.github.com/corasaurus-hex/a6a4b47acd047e4def359931b5714a2b):
   
   ```rust
       ctx.register_arrow("my_table", "file.arrow", ArrowReadOptions::default())
           .await
           .unwrap();
   
       ctx.sql("SELECT * FROM my_table LIMIT 10")
           .await
           .unwrap()
           .show()
           .await
           .unwrap();
   ```
   
   You can also just reference the file path from SQL in `datafusion-cli`:
   
   ```sh
   > SELECT * FROM 'file.arrow' LIMIT 10;
   ```
   
   You cannot, however, [do the same with files in the Arrow IPC stream format](https://gist.github.com/corasaurus-hex/d76a166112ecae57540f0f4a70c93b12). Instead, you get the error:
   
   ```
   called `Result::unwrap()` on an `Err` value: ArrowError(ParseError("Arrow file does not contain correct footer"), None)
   ```
   
   ### Describe the solution you'd like
   
   I would love it if `register_arrow` supported files in the Arrow IPC stream format, or if an equivalent function were added to do the same. Additionally, it would be great if `datafusion-cli` could reference these files by path in the same way it can for files in the Arrow IPC file format.
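   
   For illustration, here is a minimal sketch of how the two formats might be told apart before picking a reader. It relies only on the fact that the IPC file format begins and ends with the magic bytes `ARROW1`, while the stream format has no such footer (which is what the error above is complaining about). The names `IpcKind`, `sniff_ipc_bytes`, and `sniff_ipc_path` are hypothetical and are not part of DataFusion or arrow-rs; this is not a claim about how `register_arrow` works or should work internally.
   
   ```rust
   use std::fs::File;
   use std::io::{self, Read, Seek, SeekFrom};
   use std::path::Path;
   
   /// Magic bytes that open and close a file in the Arrow IPC *file* format.
   const ARROW_MAGIC: &[u8; 6] = b"ARROW1";
   
   /// Hypothetical classification used only in this sketch.
   #[derive(Debug, PartialEq, Eq)]
   pub enum IpcKind {
       /// Leading and trailing `ARROW1` magic found: IPC file format.
       File,
       /// No file-format magic found; the data may be in the IPC stream format.
       MaybeStream,
   }
   
   /// Classify an in-memory buffer by looking only at the magic bytes.
   pub fn sniff_ipc_bytes(bytes: &[u8]) -> IpcKind {
       if bytes.len() >= 2 * ARROW_MAGIC.len()
           && bytes.starts_with(ARROW_MAGIC)
           && bytes.ends_with(ARROW_MAGIC)
       {
           IpcKind::File
       } else {
           IpcKind::MaybeStream
       }
   }
   
   /// Same check for a file on disk, reading only the first and last 6 bytes.
   pub fn sniff_ipc_path(path: impl AsRef<Path>) -> io::Result<IpcKind> {
       let mut f = File::open(path)?;
       let magic_len = ARROW_MAGIC.len() as u64;
       if f.metadata()?.len() < 2 * magic_len {
           return Ok(IpcKind::MaybeStream);
       }
       let mut head = [0u8; 6];
       let mut tail = [0u8; 6];
       f.read_exact(&mut head)?;
       f.seek(SeekFrom::End(-(magic_len as i64)))?;
       f.read_exact(&mut tail)?;
       if &head == ARROW_MAGIC && &tail == ARROW_MAGIC {
           Ok(IpcKind::File)
       } else {
           Ok(IpcKind::MaybeStream)
       }
   }
   ```
   
   A check like this only rules the file format in or out. Anything classified as `MaybeStream` would still need to be validated by actually trying to read it as a stream.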
   
   ### Describe alternatives you've considered
   
   1. [Convert from the stream format to the file format](https://gist.github.com/corasaurus-hex/0dd4d7ef489ed49b5e45ea188c202aa9) and then query as shown above.
   2. [Read all the record batches into memory and then register them as a `MemTable`](https://gist.github.com/corasaurus-hex/23c9be7a415c6aede5a89c1d92a6cf47).
   3. [Add a new `StreamProvider` impl and use a `StreamTable`](https://gist.github.com/corasaurus-hex/96574afd82780e48c4d0c679c116b23a).
   
   There are probably others, too, but none as simple as registering the Arrow file with `register_arrow` or referencing the file directly in `datafusion-cli`.
   
   ### Additional context
   
   I'm interested in taking a crack at this feature but, assuming y'all are interested in it, I would love some implementation guidance.
   
   Thanks for your time!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


