Patrick Wendell created SPARK-2709:
--------------------------------------
Summary: Add a tool for certifying Spark API compatibility
Key: SPARK-2709
URL: https://issues.apache.org/jira/browse/SPARK-2709
Project: Spark
Issue Type: New Feature
Components: Spark Core
Reporter: Patrick Wendell
Assignee: Prashant Sharma
As Spark is packaged by more and more distributors, it would be good to have a
tool that verifies the API compatibility of a provided Spark package. The tool
would certify that a vendor distribution of Spark contains all of the APIs
present in a particular upstream Spark version.
This will help vendors make sure they remain "API compliant" when they make
changes or backports to Spark. It will also discourage vendors from knowingly
breaking APIs, because anyone can audit their distribution and see that they
have removed support for certain APIs.
I'm hoping a tool like this will avoid API fragmentation in the Spark community.
One "poor man's" implementation of this is that a vendor can simply run the
binary compatibility checks in the Spark build against an upstream version of
Spark. That's a pretty good start, but it means a third party can't come in
and audit a distribution on their own.
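As a rough illustration of this "poor man's" approach, assuming the vendor's build uses the sbt-mima-plugin (which Spark's own build uses for binary compatibility checks), the vendor could point the check at an upstream artifact. The setting name and version below are illustrative, not taken from Spark's actual build:

```scala
// build.sbt fragment (illustrative sketch, not Spark's real build config):
// compare the locally built spark-core artifact against the upstream
// 1.0.0 release, then run `sbt mimaReportBinaryIssues` to list any
// binary incompatibilities.
mimaPreviousArtifacts := Set("org.apache.spark" %% "spark-core" % "1.0.0")
```

This only works for whoever controls the build, which is exactly the limitation noted above.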
Another approach would be to have something where anyone can come in and audit
a distribution even if they don't have access to the packaging and source code.
That would look something like this:
1. For each release we publish a manifest of all public APIs (we might borrow
the MiMa string representation of bytecode signatures).
2. We package an auditing tool as a jar file.
3. The user runs the tool with spark-submit; it reflectively walks through all
exposed Spark APIs and makes sure that everything on the manifest is
encountered.
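The reflective walk in step 3 could be sketched as follows. The object name, the "Class#method" manifest format, and the overall shape are assumptions for illustration only; a real tool would compare full bytecode signatures, not just method names:

```scala
// Hypothetical sketch of the auditing step: given a manifest of entries
// of the form "fully.qualified.ClassName#methodName", use reflection to
// check that each listed class exists on the classpath and exposes a
// public method with that name. Returns the entries that are missing.
object ApiAuditor {
  def missingApis(manifest: Seq[String]): Seq[String] =
    manifest.filterNot { entry =>
      entry.split("#") match {
        case Array(className, methodName) =>
          try {
            // getMethods returns all public methods, including inherited ones
            Class.forName(className).getMethods.exists(_.getName == methodName)
          } catch {
            case _: ClassNotFoundException => false // class absent => API missing
          }
        case _ => false // malformed manifest entry, report it as missing
      }
    }
}
```

An empty result means every manifest entry was encountered; run against a vendor distribution, any non-empty result pinpoints removed APIs.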
From the implementation side, this is just brainstorming at this point.
--
This message was sent by Atlassian JIRA
(v6.2#6252)