1. Are there particular issues that have cropped up that we should be aware of? This might help inform how we go about this.

2. We should publish a matrix of current compliance with the standard for our existing implementations (this could be the basis for letting bespoke implementations clarify what they support).

3. I'm not sure I understand the exact conclusion one should draw from the answers to the three questions posed above. People can use one of the core Arrow implementations and still use it incorrectly, which would cause bugs. Similarly, as an end user, I'm not sure what conclusion I should draw from "some level of" native Arrow-based processing?

Thanks,
Micah

On Mon, Sep 16, 2019 at 7:22 AM Wes McKinney <wesmck...@gmail.com> wrote:

> hi folks,
>
> As Apache Arrow grows more popular, we may acquire some different
> kinds of third party developers:
>
> A. Developers who use and, in many cases, contribute to one of the
> project's reference implementations
>
> B. Developers who choose to implement the columnar format themselves,
> without depending on any reference implementation
>
> There's nothing we can do to stop Category B developers, and in some
> cases building an bespoke implementation may be the correct move.
>
> I'm concerned about the case of incomplete implementations that are
> advertised as "using Arrow", "following the Arrow specification", or
> "Arrow-compatible". An implementation is considered incomplete if it
> does not pass the muster of our binary integration test suite (we will
> eventually need to make this easier to run on third party libraries:
> https://issues.apache.org/jira/browse/ARROW-6571).
>
> If an implementation does not have integration tests to prove
> compliance, then advertisements regarding its level of compatibility
> or trueness to the specification may mislead users. Problems that
> arise from these situations may result in harm to the Arrow
> community's reputation through no fault of our own.
>
> Since we can't force third parties to use any of the Arrow community's
> code artifacts, one idea is to develop some form of "grading" system
> to enable projects to self-report the nature of their use of the Arrow
> columnar format to help answer such questions as:
>
> * Do you use a fully integration-tested implementation (e.g. I am only
> aware of 4 such libraries at the moment -- our reference libraries in
> C++, Java, JavaScript, and Go -- I understand that C# and Rust will
> get there eventually)?
> * If your project "supports Arrow" does that mean just "can serialize
> data to/from Arrow" or something more?
> * Does your project feature some level of "native" Arrow-based processing?
>
> A linear grading scale may not make sense, but having clear answers to
> some of these questions in downstream projects' documentation would be
> helpful.
>
> As Apache Arrow's brand grows and value, more and more projects will
> use the brand in a "Powered By" way, and so I think it's important
> that we help projects clearly communicate to their users to what
> extent they employ the project.
>
> Thanks,
> Wes
>