All,

We're very interested in exploring how Arrow can be used in traditional scientific computing installations.
I've read several initial overviews, and I'm particularly interested in the mentions of RDMA where available, since that is a standard capability on HPC platforms. Are there already specific design thoughts on how Arrow will use RDMA capabilities where available?

Another question is around a typical HPC + Spark workflow:

1. One part is a standard HPC app that reads/writes HDF5 data (a multidimensional array container).
2. Another is a Spark application that needs to run either concurrently with or after the HPC application.
3. Both need to operate on the same data, ideally without explicit ETL (and ideally without any file I/O along the way).

In this scenario, would the following be feasible (modulo the dev work that needs to be done for HDF5/Spark)?

1. An (alternative) Arrow serialization of the respective components in the HDF5 container, so the HPC app is really writing Arrow-serialized data under the standard HDF5 API (rough sketch in the P.S. below).
2. A Spark HDF5 wrapper (maybe a Spark context?) that is HDF5-metadata-aware and that the Spark application uses as its 'lens' on the HDF5/Arrow data.

Thanks,
Venkat
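P.S. To make point 1 of the proposal a bit more concrete, here is a rough sketch of the kind of round trip I have in mind. It uses pyarrow and h5py purely as stand-ins, and the file and dataset names ("sim_output.h5", "results/arrow_stream") are made up for illustration; the real integration would presumably live at the C/C++ level and, with shared memory or RDMA, avoid materializing the file at all.

# Sketch only: Arrow IPC bytes stored as an opaque dataset under the HDF5 API.
import h5py
import numpy as np
import pyarrow as pa

# --- HPC-side writer: serialize a table to the Arrow IPC stream format
#     and store the raw bytes as a uint8 dataset inside an HDF5 file.
table = pa.table({"x": [1.0, 2.0, 3.0], "y": [10, 20, 30]})
sink = pa.BufferOutputStream()
with pa.ipc.new_stream(sink, table.schema) as writer:
    writer.write_table(table)
payload = sink.getvalue()  # pyarrow.Buffer holding the IPC stream

with h5py.File("sim_output.h5", "w") as f:
    f.create_dataset("results/arrow_stream",
                     data=np.frombuffer(payload, dtype=np.uint8))

# --- Spark-side reader (conceptually): the HDF5-aware wrapper would locate
#     the dataset via HDF5 metadata and hand the raw bytes straight back to
#     Arrow, with no row-by-row conversion.
with h5py.File("sim_output.h5", "r") as f:
    raw = f["results/arrow_stream"][:]

reader = pa.ipc.open_stream(raw.tobytes())
roundtripped = reader.read_all()  # back to a pyarrow.Table
print(roundtripped)

In a shared-memory or RDMA setting, the same IPC stream could be mapped directly rather than copied through a file, which is really the part I'm hoping the current Arrow design already anticipates.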