[PROPOSAL] Having 2 Spark runners to support Spark 1 users while advancing towards better streaming implementation with Spark 2

Amit Sela Wed, 03 Aug 2016 23:39:55 -0700

After discussions with JB, and understanding that a lot of companies
running Spark will probably run 1.6.x for a while, we thought it would be a
good idea to have (some) support for both branches.


The SparkRunnerV1 will mostly support Batch, but could also support
“KeyedState” workflows and Sessions. As for streaming, I suggest
eliminating the awkward way it currently uses Beam Windows
<https://github.com/apache/incubator-beam/tree/master/runners/spark#streaming>
and supporting only Processing-Time windows.

The SparkRunnerV2 will support both batch and streaming, relying on
Structured Streaming and the functionality it provides now, and will provide
in the future, to support the Beam model as best it can.

The runners will exist under “runners/spark/spark1” and
“runners/spark/spark2”.

If this proposal is accepted, I will change JIRA tickets according to a
proposed roadmap for both runners.

General roadmap:


SparkRunnerV1 work should mostly be “cleanup”: getting rid of the
Window-mocking, while explicitly declaring features Unsupported where it should.

Additional features:

   1. Read.Bound support. This is already supported in the SparkRunnerV2
      branch that is in progress, and it has already passed some tests by JB
      and Ismael from Talend. I’ve also asked Michael Armbrust from Apache
      Spark to review this, and once it’s all set I’ll backport it to V1 as
      well.
   2. Consider support for “Keyed-State”.
   3. Consider support for “Sessions”.


The SparkRunnerV2 branch <https://github.com/apache/incubator-beam/pull/495>
is in progress right now, and I hope to have it out with support for (some)
event-time windowing, triggers and accumulation modes for streaming.

Thanks,
Amit
