[Rust] Arrow/Parquet/Datafusion interop not possible?

GitBox Thu, 04 Feb 2021 09:23:45 -0800


Anonyfox opened a new issue #9420:
URL: https://github.com/apache/arrow/issues/9420



   Hello, I have just been playing with the new 3.0 crates of the Arrow family,
and I have been hitting a wall for hours now.
   
   1. I started with arrow itself and built a large StructArray from my data.
This was the easy part.
   
   2. To cache this big StructArray, I'd like to write it to disk as a Parquet
file. It seems I cannot use the parquet crate to achieve this, and the arrow
crate has nothing useful either. (CSV/JSON is not an option; this would end up
in a custom binary format again.)
   
   3. Now the most disappointing part: it is absolutely unclear to me how I
would query the StructArray (e.g. give me all records where column_X = Y, and
so on). There are compute kernels in the docs, but how to use them is
completely unclear to me. Then I saw the datafusion crate, which looks exactly
like what I want, but... again, there is no hint of how to use my in-memory
StructArray with it.
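The compute kernels mentioned above typically follow a two-step pattern: a comparison kernel turns a column and a scalar into a boolean mask with one entry per row, and a filter kernel then keeps only the rows where the mask is true, applying the same mask to every column. The sketch below illustrates that idea with plain `Vec`s and the standard library only; it is not the arrow crate's API, and the function names `eq_scalar` and `filter` here are local stand-ins. In the arrow crate the corresponding kernels live under `arrow::compute` (for example `eq_scalar` and `filter_record_batch`), but names and signatures change between releases, so check the docs for the version in use.

```rust
/// Comparison "kernel" (std-only stand-in): compare every value in a column
/// against a scalar and produce a boolean mask with one entry per row.
fn eq_scalar(column: &[i32], value: i32) -> Vec<bool> {
    column.iter().map(|&v| v == value).collect()
}

/// Filter "kernel" (std-only stand-in): keep the values whose mask entry is true.
/// Panics if the lengths differ, since then mask and column do not describe the same rows.
fn filter<T: Clone>(column: &[T], mask: &[bool]) -> Vec<T> {
    assert_eq!(column.len(), mask.len(), "mask length must match column length");
    column
        .iter()
        .zip(mask.iter())
        .filter_map(|(v, keep)| if *keep { Some(v.clone()) } else { None })
        .collect()
}

fn main() {
    // Two columns of the same hypothetical table: ids and names.
    let ids = vec![1, 2, 3, 2];
    let names = vec!["a", "b", "c", "d"];

    // "WHERE id = 2": one comparison pass, then reuse the mask on every column.
    let mask = eq_scalar(&ids, 2);
    let kept_ids = filter(&ids, &mask);
    let kept_names = filter(&names, &mask);

    println!("{:?} {:?}", kept_ids, kept_names);
}
```

Computing the mask once and reusing it keeps all filtered columns aligned row-for-row, which is what a record-batch filter has to guarantee.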
   
   I had the impression that the point of Arrow, as a portable in-memory
format, is exactly this. And these crates do use Arrow internally... Did I get
something totally wrong? Could someone give me a few hints about what I'm
missing (save/load/query)? It would be sad to go back to my large Rust
Vec<Item> blobs and manually iter()/filter() over them.
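The contrast here is between row-oriented data (a `Vec<Item>`) and Arrow's columnar layout, where a StructArray holds one equal-length child array per field. As a std-only illustration (not the arrow crate's API; the `Item` type, its fields, and `ItemColumns` are hypothetical), transposing rows into columns looks roughly like this:

```rust
/// Hypothetical row type standing in for the `Item` in the issue.
#[derive(Debug, Clone, PartialEq)]
struct Item {
    id: i32,
    name: String,
    score: f64,
}

/// Struct-of-arrays layout: one Vec per field, all of the same length.
/// This mirrors how a StructArray stores one child array per field.
#[derive(Debug, Default, PartialEq)]
struct ItemColumns {
    id: Vec<i32>,
    name: Vec<String>,
    score: Vec<f64>,
}

impl ItemColumns {
    /// Transpose row-oriented items into per-field columns.
    fn from_rows(rows: &[Item]) -> Self {
        let mut cols = ItemColumns::default();
        for row in rows {
            cols.id.push(row.id);
            cols.name.push(row.name.clone());
            cols.score.push(row.score);
        }
        cols
    }

    /// Number of rows; every column has this length.
    fn len(&self) -> usize {
        self.id.len()
    }

    /// Reassemble row `i` from the columns.
    fn row(&self, i: usize) -> Item {
        Item {
            id: self.id[i],
            name: self.name[i].clone(),
            score: self.score[i],
        }
    }
}

fn main() {
    let rows = vec![
        Item { id: 1, name: "a".to_string(), score: 0.5 },
        Item { id: 2, name: "b".to_string(), score: 1.5 },
    ];
    let cols = ItemColumns::from_rows(&rows);
    println!("{} rows: {:?}", cols.len(), cols);
}
```

Scanning a single field then touches one contiguous `Vec` instead of every `Item`, which is the property Arrow's kernels and query engines build on.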
   
   Thanks in advance!



[GitHub] [arrow] Anonyfox opened a new issue #9420: [Rust] Arrow/Parquet/Datafusion interop not possible?
