Hi folks,

As Apache Arrow grows more popular, we are likely to see two different
kinds of third-party developers:

A. Developers who use and, in many cases, contribute to one of the
project's reference implementations

B. Developers who choose to implement the columnar format themselves,
without depending on any reference implementation

There's nothing we can do to stop Category B developers, and in some
cases building a bespoke implementation may be the correct move.

I'm concerned about the case of incomplete implementations that are
advertised as "using Arrow", "following the Arrow specification", or
"Arrow-compatible". I would consider an implementation incomplete if
it does not pass our binary integration test suite (we will
eventually need to make this easier to run on third-party libraries:
https://issues.apache.org/jira/browse/ARROW-6571).

If an implementation does not have integration tests to prove
compliance, then advertisements regarding its level of compatibility
or trueness to the specification may mislead users. Problems that
arise from these situations may result in harm to the Arrow
community's reputation through no fault of our own.

Since we can't force third parties to use any of the Arrow community's
code artifacts, one idea is to develop some form of "grading" system
to enable projects to self-report the nature of their use of the Arrow
columnar format to help answer such questions as:

* Do you use a fully integration-tested implementation? (I am only
aware of 4 such libraries at the moment: our reference libraries in
C++, Java, JavaScript, and Go. I understand that C# and Rust will
get there eventually.)
* If your project "supports Arrow", does that mean just "can serialize
data to/from Arrow" or something more?
* Does your project feature some level of "native" Arrow-based processing?

A linear grading scale may not make sense, but having clear answers to
some of these questions in downstream projects' documentation would be
helpful.

As Apache Arrow's brand grows in value, more and more projects will
use the brand in a "Powered By" way, and so I think it's important
that we help projects clearly communicate to their users to what
extent they employ the project.

Thanks,
Wes
