Hi all,

I wanted to bring up an idea that has been raised to me several times
in the context of algorithms engineering work in Apache Arrow. The
gist is that it can be challenging to do algorithms research inside
large production systems, especially for those who have specific domain
expertise (like SIMD vectorization) but are perhaps not well-versed in
the software "structures" around the algorithms that are being
studied.

Some folks in the HPC community in recent years have been creating
smaller "proxy applications", also called "MiniApps" [1], to make the
algorithms and performance research more accessible to developers who
are less familiar with the software structures and class hierarchies
that often surround the algorithms. It also allows the research work
to evolve without having to pass through the gauntlet of the
production CI systems and merge process. Eventually, research will get
copied into the production application or form the basis of a clean
implementation in production. Some Arrow contributors have already
been doing some "miniapp" type development, though in external
repositories (here's one related to a discussion a couple of years
ago about threading programming models and libraries, where we were
talking about Intel TBB: [2]).

In short, I wanted to propose creating a separate Git repository under
apache/arrow-* for this purpose, to invite these kinds of
contributions to our project and to help more R&D work happen under
the Arrow umbrella so we have clean IP lineage. I can't imagine we
would ever make releases from this repository, but it could serve as a
flexible place to put stuff (even in branches that are independent
from each other) that may or may not be ready to make its home in one
of our production repositories.

Thoughts welcome!

Thanks
Wes

[1]: https://www.osti.gov/servlets/purl/1461070
[2]: https://github.com/anton-malakhov/nyctaxi
