Hi Steffen and Etienne,

[...]
>> Apache Arrow https://arrow.apache.org/faq/ knows how to efficiently
>> handle large tabular data. And, while not in our distribution, it blocks
>> some workflows for Debian Med. Arrow comes with interfaces to all the
>> prominent languages, for the Med-workflows it is typically the Python
>> interface pyarrow that is needed.

I am also facing a project [1] that has now made Arrow (the C++ interface, in my case) a mandatory dependency. The absence of Arrow from Debian has prevented me from updating the packaging to the latest upstream version, leaving it stuck at a version from May.

>> I am not using Arrow myself, but I presume just like me you all know
>> some project that should be using it :)

Yep :)

> Thank you for the prospective! I see Sasha filed an RFP some
> time ago [1], so there is definitely interest in Apache Arrow.
> I don't know whether there is a packaging effort at the moment,
> but if there is, I haven't found it by doing a research on
> Salsa.

I have prepared a first shot at packaging Arrow that works well for my case [2]. It replicates almost all binary packages built by upstream's own packaging pipeline (for version 4, at least; that's when I stopped looking at it), and I only had to tweak the build parameters a little. FYI, upstream builds its own Debian packages, distributed via JFrog, using its own Ruby-based packaging tooling [3, 4].

The rest of the story is that I considered my package ready to upload, but a project partner familiar with Arrow let me know that its development cycle is quite fast, with several breaking new major versions recently, support for new languages being added all the time (Rust, ...), and other code being adopted as well (Parquet, ...). This made me a bit uneasy, as I only needed it as a dependency and did not really want to bite off more than I could chew.

I contacted upstream to ask for support [5], but it looked to me as if they would rather not help out with Debian packaging directly. They would probably consider specific patches from us but in general stick to their own packaging tools. See the linked thread for more information. I must admit I have not had the time so far to follow up and explain how things are done in Debian, and that their approach to packaging and ours are probably too different.

Long story short: I have finished packaging Arrow 4, and it looks good (someone might want to double-check the long d/copyright, though), but I am not sure I want to track and maintain it _on my own_ in the long run. If someone from the Debian Med team wants to collaborate on this, be it on packaging or upstream interaction, I would be willing to reconsider =)

Cheers
Sascha

[1] https://tracker.debian.org/pkg/vast
[2] https://salsa.debian.org/satta/arrow
[3] https://apache.jfrog.io/ui/native/arrow/debian
[4] https://github.com/apache/arrow/tree/master/dev/tasks/linux-packages
[5] https://lists.apache.org/thread.html/rcd366cf9bde72d69e942ea31f3a0f1066727f6c7e8915bfdda6f009a%40%3Cdev.arrow.apache.org%3E

