All,

We're very interested in exploring how Arrow can be used in traditional
scientific computing installations.

I've read several of the initial overviews and am particularly interested in
the mentions of using RDMA where available, since RDMA is a standard
capability on HPC platforms.

Are there already specific thoughts on how the current design would use RDMA
capabilities where available?

Another question is about a typical HPC + Spark workflow (a rough sketch of
the current flow follows the list):

1. One part is a standard HPC app that reads/writes HDF5 data
(multidimensional array container).
2. Another part is a Spark application that needs to run either concurrently
with or after the HPC application.
3. Both need to operate on the same data, ideally without an explicit ETL
step (and ideally without any file I/O along the way).
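
To make the ETL pain point concrete, here is a rough Python sketch of how the
two sides look today; the file name, dataset path, and schema are made up for
illustration, and the point is the extra copy through the driver (or an
intermediate file) that we would like to avoid:

import h5py
import numpy as np
from pyspark.sql import SparkSession

# --- HPC side: a standard app writes a multidimensional array via the HDF5 API.
with h5py.File("simulation.h5", "w") as f:
    f.create_dataset("/results/temperature", data=np.random.rand(1000, 64))

# --- Spark side: today this means an explicit ETL hop, pulling the full
# --- array into driver memory (or converting to another file format) first.
spark = SparkSession.builder.appName("hpc-consumer").getOrCreate()
with h5py.File("simulation.h5", "r") as f:
    arr = f["/results/temperature"][...]
df = spark.createDataFrame(
    [(i, row.tolist()) for i, row in enumerate(arr)],
    schema="step INT, values ARRAY<DOUBLE>",
)
df.show(5)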

In this scenario, would the following be feasible (modulo the dev work that
would need to be done on the HDF5/Spark side)?

1. An (alternative) Arrow serialization of the respective components in the
HDF5 container, so that the HPC app is really writing Arrow-serialized data
under the standard HDF5 API (first sketch below).

2. A Spark HDF5 wrapper (maybe a Spark context?) that is HDF5-metadata aware
and that the Spark application uses as its 'lens' on the HDF5/Arrow data
(second sketch below).
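
For (1), a minimal sketch of what I have in mind (my assumption of a possible
layout, not an existing Arrow or HDF5 feature): the Arrow IPC stream bytes are
stored as an opaque byte dataset inside the HDF5 container, so the writer
still only goes through the HDF5 API and any Arrow-aware reader can pick the
payload up without re-parsing values:

import h5py
import numpy as np
import pyarrow as pa

# Build a small Arrow record batch (columns are illustrative).
batch = pa.RecordBatch.from_arrays(
    [pa.array([0, 1, 2], pa.int32()),
     pa.array([1.5, 1.7, 1.6], pa.float64())],
    names=["step", "temperature"],
)

# Serialize with the Arrow IPC stream format.
sink = pa.BufferOutputStream()
writer = pa.ipc.new_stream(sink, batch.schema)
writer.write_batch(batch)
writer.close()
ipc_bytes = np.frombuffer(sink.getvalue(), dtype=np.uint8)

# "HPC app" side: the Arrow payload is written under the standard HDF5 API.
with h5py.File("arrow_in_hdf5.h5", "w") as f:
    f.create_dataset("/arrow/temperature_stream", data=ipc_bytes)

# Reader side: reconstruct the Arrow table straight from the stored bytes.
with h5py.File("arrow_in_hdf5.h5", "r") as f:
    raw = f["/arrow/temperature_stream"][...].tobytes()
table = pa.ipc.open_stream(raw).read_all()
print(table)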
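
And for (2), a rough sketch of the kind of 'lens' I am imagining: a thin,
HDF5-metadata-aware helper on the Spark side that plans partitions from the
dataset shape on the driver and lets each executor read its own slice through
the HDF5 API (the helper functions, chunking scheme, and paths below are
hypothetical, and this assumes the file is visible to the executors via a
shared/parallel filesystem):

import h5py
from pyspark.sql import SparkSession

H5_PATH = "simulation.h5"            # illustrative path on a shared filesystem
DATASET = "/results/temperature"     # illustrative dataset name

spark = SparkSession.builder.appName("hdf5-lens").getOrCreate()

def plan_partitions(path, dataset, rows_per_partition=250):
    # Driver side: touch only the HDF5 metadata (shape), not the data.
    with h5py.File(path, "r") as f:
        n_rows = f[dataset].shape[0]
    return [(start, min(start + rows_per_partition, n_rows))
            for start in range(0, n_rows, rows_per_partition)]

def read_slice(bounds):
    # Executor side: each task reads its own row range via the HDF5 API.
    start, stop = bounds
    with h5py.File(H5_PATH, "r") as f:
        block = f[DATASET][start:stop]
    for offset, row in enumerate(block):
        yield (start + offset, float(row.mean()))

ranges = plan_partitions(H5_PATH, DATASET)
df = (spark.sparkContext
      .parallelize(ranges, len(ranges))
      .flatMap(read_slice)
      .toDF(["step", "mean_temperature"]))
df.show(5)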

Thanks
Venkat
