Hi all,

I wanted to bring up an idea that has been raised to me several times in the context of algorithms engineering work in Apache Arrow. The gist is that it can be challenging to do algorithms research inside large production systems, especially for people who have specific domain expertise (like SIMD vectorization) but are perhaps not well-versed in the software "structures" around the algorithms being studied.

Some folks in the HPC community in recent years have been creating smaller "proxy applications", also called "MiniApps" [1], to make the algorithms and performance research more accessible to developers who are less familiar with the software structures and class hierarchies that often surround the algorithms. It also allows the research work to evolve without having to pass through the gauntlet of the production CI systems and merge process. Eventually, research will get copied into the production application or form the basis of a clean implementation in production.

Some Arrow contributors have already been doing some "miniapp" type development, though in external repositories (here's one relating to a discussion a couple of years ago about threading programming models and libraries where we were talking about Intel TBB: [2]).

In short, I wanted to propose creating a separate git repository under apache/arrow-* for this purpose, to invite these kinds of contributions to our project and to help more R&D work happen inside the Arrow umbrella so we have clean IP lineage. I can't imagine we would ever make releases from this repository, but it could serve as a flexible place to put stuff (even in branches that are independent from each other) that may or may not be ready to make its home in one of our production repositories.

Thoughts welcome!

Thanks,
Wes

[1]: https://www.osti.gov/servlets/purl/1461070
[2]: https://github.com/anton-malakhov/nyctaxi