kazuakiyama opened a new issue, #410: URL: https://github.com/apache/arrow-julia/issues/410
I'm a radio astronomer interested in using this Julia-native implementation of the Apache Arrow in-memory format for black hole imaging with the [Event Horizon Telescope](https://eventhorizontelescope.org/). First of all, thanks for developing this package! We became interested in it because the Apache Arrow and Parquet formats are being considered a [major candidate for the next generation radio astronomy data format](https://github.com/ratt-ru/casa-arrow/discussions/1).

I'm wondering whether there are plans to implement I/O functions for the Apache Parquet format in this package. I read a previous [issue](https://github.com/apache/arrow-julia/issues/227) on this topic. As far as I can tell, no method is yet available to directly load columnar data from a Parquet file into Arrow.jl's in-memory representation, or to write it back. The only pure-Julia route seems to be converting the on-disk data into the Arrow IPC format using both Parquet.jl and Arrow.jl, and then reloading it into memory with Arrow.jl.

This is problematic for our use case, and it is a major issue preventing us from using this package and Apache's columnar formats in Julia. I think the key issues here are:

- This sort of disk-based conversion via [Parquet.jl](https://github.com/JuliaIO/Parquet.jl) and Arrow.jl is not computationally optimal, as it involves an extra disk write and read. This will be a major overhead in our use case.
- The Arrow IPC format is [not prioritizing long-term storage and archival usage](https://arrow.apache.org/faq/#what-about-arrow-files-then), which would not satisfy the requirements of our community. So relying purely on the IPC format won't be a solution.
- The current Julia packages for the Apache Parquet format (e.g. [Parquet.jl](https://github.com/JuliaIO/Parquet.jl) and [Parquet2.jl](https://gitlab.com/ExpandingMan/Parquet2.jl)) do not seem to fully support nested types, which are key to handling [our radio astronomy data in Apache's columnar formats](https://github.com/ratt-ru/casa-arrow), while Arrow.jl does support them for the Arrow in-memory and IPC formats.

Given the many similarities and the overlap between the specifications of the Apache Parquet and Arrow formats, I feel it is more straightforward to request Parquet I/O features in Arrow.jl than to request the missing features from the existing Julia Parquet packages.

Any thoughts on this are appreciated. Thanks!

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
