On Fri, Oct 21, 2016 at 2:00 PM, Anjana Fernando <[email protected]> wrote:
> Hi,
>
> So we are starting on porting the earlier DAS-specific functionality to
> C5. With this, we are planning on not embedding the Spark server
> functionality in the primary binary itself, but rather running it
> separately as another script in the same distribution. So basically, when
> running the server in standalone mode, a centralized script will start the
> Spark processes and then the main stream processor server. In a clustered
> setup, we will start the Spark processes separately and do the clustering
> that is native to Spark, which is currently done by integrating with
> ZooKeeper.

Does this mean we still keep the Spark binaries inside Stream Processor? If
not, how are we planning to start a Spark process from Stream Processor?

> So basically, for the minimum H/A setup, we would need two stream
> processing nodes plus ZK to build up the cluster, if we are using Spark.
> And with C5, since we are not using Hazelcast anyway, we can use ZK for
> other general coordination operations as well, since it is already a
> requirement for Spark. We also get the added benefit of avoiding the
> issues that come with a peer-to-peer coordination library, such as
> split-brain scenarios.
>
> Also, aligning with the above approach, we are considering integrating
> directly with Solr running external to the stream processor, rather than
> doing the indexing in embedded mode. DAS today has a separate indexing
> mode (profile); rather than using that, we can use Solr directly. One of
> the main reasons for this is that Solr adds functionality on top of base
> Lucene, such as OOTB support for aggregates etc., which at the moment we
> don't fully have.
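[To make the "centralized script" idea above concrete, here is a minimal
sketch of what such a wrapper could do. All paths, hostnames, and the
worker.sh name are illustrative assumptions, not the actual distribution
layout; the Spark script names and the spark.deploy.* recovery properties
are Spark's documented standalone-mode mechanisms.]

```shell
#!/bin/sh
# Sketch: start Spark standalone (with ZooKeeper-based master recovery)
# and then the stream processor. Hostnames/paths are examples only.

# Enable ZooKeeper recovery for the standalone master, so two masters
# on different nodes give us H/A for the Spark cluster itself.
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
  -Dspark.deploy.zookeeper.url=zk1:2181,zk2:2181,zk3:2181 \
  -Dspark.deploy.zookeeper.dir=/spark"

# Start a master on this node; run the same on a second node for H/A.
"$SPARK_HOME/sbin/start-master.sh"

# Start a worker that knows both masters, so it can fail over.
"$SPARK_HOME/sbin/start-slave.sh" spark://node1:7077,node2:7077

# Finally, start the stream processor server itself
# (worker.sh is a hypothetical SP startup script).
"$SP_HOME/bin/worker.sh"
```

Since the scripts are Spark's own, upgrading Spark would not require any
changes here beyond pointing SPARK_HOME at the new distribution.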
> So the suggestion is that Solr will also come as a separate profile
> (script) with the distribution, and it will be started up if the indexing
> scenarios are required for the stream processor; we can start it up
> automatically or selectively. Also, Solr clustering is done with ZK as
> well, which we will have anyway with the new Spark clustering approach we
> are using.
>
> The aim of not embedding the non-WSO2 servers is the simplicity it brings
> to our codebase, since we do not have to maintain the integration code
> required to embed them, and those servers can use their own recommended
> deployment patterns. For example, Spark isn't designed to be embedded in
> other servers, so we had to work around several things to embed and
> cluster it internally. Upgrading such dependencies also becomes very
> straightforward, since they are external to the base binary.
>
> Cheers,
> Anjana.
>
> --
> *Anjana Fernando*
> Associate Director / Architect
> WSO2 Inc. | http://wso2.com
> lean . enterprise . middleware

--
Thanks & regards,
Nirmal

Team Lead - WSO2 Machine Learner
Associate Technical Lead - Data Technologies Team, WSO2 Inc.
Mobile: +94715779733
Blog: http://nirmalfdo.blogspot.com/
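[Editor's note on the Solr-as-a-separate-profile idea above: Solr's stock
CLI already supports exactly this deployment, started in SolrCloud mode
against the same ZooKeeper ensemble used for Spark. A minimal sketch, where
the ZK hostnames, chroot, and collection name are illustrative assumptions:]

```shell
# Start Solr in SolrCloud mode, pointing it at the shared ZK ensemble
# (the /solr chroot keeps Solr's znodes separate from Spark's).
"$SOLR_HOME/bin/solr" start -c -z zk1:2181,zk2:2181,zk3:2181/solr

# Create a collection for the stream processor's index
# (event_index and the shard/replica counts are examples).
"$SOLR_HOME/bin/solr" create -c event_index -shards 2 -replicationFactor 2
```

Because clustering and replication are handled by Solr and ZK themselves,
no embedding or integration code is needed on the stream processor side
beyond a SolrJ (or HTTP) client for indexing and queries.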
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
