Hi all, [NB: sent to d-science, d-python, d-devel and the RFP bug; reply-to set to d-science and the RFP bug only]
An update on Apache Arrow, and in particular the Python library PyArrow. For those who don't know: Apache Arrow is a development platform for in-memory analytics. It contains a set of technologies that enable big data systems to process and move data fast. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware. The project is developing a multi-language collection of libraries for solving systems problems related to in-memory analytical data processing. This includes such topics as: * Zero-copy shared memory and RPC-based data movement * Reading and writing file formats (like CSV, Apache ORC, and Apache Parquet) * In-memory analytics and query processing (from: https://arrow.apache.org/docs/index.html) Pandas has announced that Pandas 3.x will depend on PyArrow in a critical way (it will back the "string" datatype), and it is due to be released imminently. So this is a plea for anyone looking for something really helpful to do: it would be great to have a group of developers finally package this! There was some initial work done (see the RFP bug report for details: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=970021), but that is fairly old now. As Apache Arrow supports numerous languages, it may well benefit from having a group of developers with different areas of expertise to build it. (Or perhaps it would make more sense to split the upstream source into a collection of different Debian source packages for the different supported languages. I don't know.) Unfortunately I don't have the capacity to devote any time to it myself. Thanks in advance for anyone who can step forward for this! Best wishes, Julian