Re: [Architecture] [C5] Spark/Lucene Integration in Stream Processor

Sriskandarajah Suhothayan Sat, 22 Oct 2016 11:49:48 -0700

On Sat, Oct 22, 2016 at 10:45 AM, Nirmal Fernando <[email protected]> wrote:


>
>
> On Fri, Oct 21, 2016 at 2:00 PM, Anjana Fernando <[email protected]> wrote:
>
>> Hi,
>>
>> So we are starting on porting the earlier DAS-specific functionality to
>> C5. With this, we are planning not to embed the Spark server
>> functionality into the primary binary itself, but rather to run it
>> separately via another script in the same distribution. So basically, when
>> running the server in standalone mode, a centralized script will start the
>> Spark processes and then the main stream processor server. In a clustered
>> setup, we will start the Spark processes separately and use Spark's native
>> clustering, which currently relies on integrating with ZooKeeper.
>>
>
> Does this mean we still keep Spark binaries inside Stream Processor? If
> not, how are we planning to start a Spark process from Stream Processor?
>

We don't need to have Spark binaries in Stream Processor, and I believe it's
wrong to do so, as Spark is not part of its core functionality. But when it
comes to Product Analytics, we may ship it. We need to decide on that.


>> So basically, for the minimum H/A setup, if we are also using Spark, we
>> would need two stream processing nodes and ZK to build up the cluster.
>> Since we are not using Hazelcast in C5 anyway, we can also use ZK for other
>> general coordination operations, as it is already a requirement for Spark.
>> We also get the added benefit of avoiding the issues that come with a
>> peer-to-peer coordination library, such as split-brain scenarios.
>>
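On the H/A sizing above: ZooKeeper only makes progress while a strict
majority of its ensemble is reachable, which is what rules out split brain,
and it also means a fault-tolerant ZK ensemble needs at least three servers
(a two-server ensemble tolerates no failures). The sketch below only
illustrates that majority-quorum arithmetic; the function names are
hypothetical and are not part of any ZooKeeper API.

```python
# Hypothetical illustration of majority-quorum arithmetic, as used by a
# ZooKeeper ensemble. These helpers are NOT a ZooKeeper API.

def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can fail while a majority remains available."""
    return ensemble_size - quorum_size(ensemble_size)


def has_quorum(partition_size: int, ensemble_size: int) -> bool:
    """A partition can elect a leader and accept writes only with a majority."""
    return partition_size >= quorum_size(ensemble_size)


def split_brain_possible(ensemble_size: int) -> bool:
    """True if the ensemble could split into two disjoint partitions
    that BOTH hold a quorum (impossible under strict majority)."""
    for left in range(ensemble_size + 1):
        right = ensemble_size - left
        if has_quorum(left, ensemble_size) and has_quorum(right, ensemble_size):
            return True
    return False


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(f"ensemble={n} quorum={quorum_size(n)} "
              f"tolerates={tolerated_failures(n)} "
              f"split_brain={split_brain_possible(n)}")
```

Because two disjoint partitions can never both exceed half of the ensemble,
at most one side of a network split keeps coordinating, unlike
peer-to-peer schemes where each side may carry on independently.
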
>> Also, in line with the above approach, we are considering integrating
>> directly with a Solr server running externally to the stream processor,
>> rather than doing the indexing in embedded mode. Even in DAS we currently
>> have a separate indexing mode (profile), so rather than using that, we can
>> use Solr directly. One of the main reasons for this is that Solr adds
>> functionality on top of base Lucene, such as OOTB aggregates, which we
>> currently do not fully support. So the suggestion is that Solr will also
>> come as a separate profile (script) with the distribution, and it will be
>> started up if the stream processor requires indexing scenarios, either
>> automatically or selectively. Solr clustering is also done with ZK, which
>> we will have anyway with the new Spark clustering approach.
>>
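To make the aggregate point concrete: Solr exposes aggregations such as
avg, max and sum through its JSON Facet API on the standard /select
handler. The sketch below only builds such a request URL and does not send
it; the host (Solr's default port 8983), the collection name `events` and
the field `latency` are hypothetical placeholders, not part of the proposal.

```python
import json
from urllib.parse import urlencode


def build_facet_query_url(base_url: str, collection: str, field: str) -> str:
    """Build a Solr /select URL that requests aggregates over `field`
    via the JSON Facet API, returning no documents (rows=0).
    Collection and field names are hypothetical placeholders."""
    facet = {
        "avg_value": f"avg({field})",
        "max_value": f"max({field})",
        "sum_value": f"sum({field})",
    }
    params = {
        "q": "*:*",                     # match all documents
        "rows": 0,                      # only the aggregates are wanted
        "json.facet": json.dumps(facet),
    }
    return f"{base_url.rstrip('/')}/solr/{collection}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(build_facet_query_url("http://localhost:8983", "events", "latency"))
```

With Solr running externally, the stream processor would only need to issue
HTTP requests like this one, rather than carry embedded indexing code.
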
>> The aim of running the non-WSO2-specific servers externally, rather than
>> embedding them, is the simplicity it brings to our codebase: we do not
>> have to maintain the integration code required to embed them, and those
>> servers can use their own recommended deployment patterns. For example,
>> Spark isn't designed to be embedded into other servers, so we had to work
>> around several things to embed and cluster it internally. Also, upgrading
>> such dependencies becomes very straightforward, since they are external to
>> the base binary.
>>
>> Cheers,
>> Anjana.
>> --
>> *Anjana Fernando*
>> Associate Director / Architect
>> WSO2 Inc. | http://wso2.com
>> lean . enterprise . middleware
>>
>
>
>
> --
>
> Thanks & regards,
> Nirmal
>
> Team Lead - WSO2 Machine Learner
> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
> Mobile: +94715779733
> Blog: http://nirmalfdo.blogspot.com/
>
>
>


-- 

*S. Suhothayan*
Associate Director / Architect & Team Lead of WSO2 Complex Event Processor
*WSO2 Inc.* http://wso2.com
lean . enterprise . middleware


*cell: (+94) 779 756 757 | blog: http://suhothayan.blogspot.com/ |
twitter: http://twitter.com/suhothayan | linked-in:
http://lk.linkedin.com/in/suhothayan*

_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
