On Fri, Oct 21, 2016 at 2:00 PM, Anjana Fernando <[email protected]> wrote:

> Hi,
>
> So we are starting to port the earlier DAS-specific functionality to C5.
> With this, we are planning not to embed the Spark server functionality in
> the primary binary itself, but rather to run it separately as another
> script in the same distribution. So basically, when running the server in
> standalone mode, a centralized script will start the Spark processes and
> then the main stream processor server. And in a clustered setup, we will
> start the Spark processes separately and use Spark's native clustering,
> which is currently done by integrating with ZooKeeper.
>
+1


> So basically, for the minimum H/A setup, we would need two stream
> processing nodes plus ZK to build up the cluster, if we are using Spark as
> well. With C5, since we are not using Hazelcast anyway, we can use ZK for
> other general coordination operations too, since it is already a
> requirement for Spark. And we have the added benefit of avoiding the
> issues that come with a peer-to-peer coordination library, such as
> split-brain scenarios.
>
>
> Also, aligning with the above approach, we are considering integrating
> directly with Solr, run external to the stream processor, rather than
> doing the indexing in embedded mode. In DAS we already have a separate
> indexing mode (profile), so rather than using that, we can use Solr
> directly. One of the main reasons for this is that Solr adds functionality
> on top of base Lucene, coming with OOTB features such as aggregates, which
> at the moment we don't fully support. So the suggestion is that Solr will
> also come as a separate profile (script) in the distribution, and it will
> be started up when indexing scenarios are required by the stream
> processor, either automatically or selectively. Solr clustering is also
> done with ZK, which we will have anyway with the new Spark clustering
> approach we are using.
>
> The aim of running the non-WSO2-specific servers externally, without
> embedding them, is the simplicity it brings to our codebase: we do not
> have to maintain the integration code required to embed them, and those
> servers can follow their own recommended deployment patterns. For example,
> Spark isn't designed to be embedded into other servers, so we had to work
> around several things to embed and cluster it internally. Upgrading such
> dependencies also becomes very straightforward, since they are external to
> the base binary.
>

+1 for having Spark, Solr & ZK external to the Stream Processor's core
capability. In a minimum HA setup we can start all three on both nodes,
and when scaling the deployment we can scale each component based on the
load on it.
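As a rough sketch of that minimum HA layout, each node could bring up the
three external processes along these lines. This is only illustrative: the
directory layout, hostnames and ports are assumptions, not taken from the
actual distribution; the Spark ZK-recovery options and the Solr `-cloud`
flag follow the standard upstream mechanisms for clustering via ZooKeeper.

```shell
#!/bin/sh
# Hypothetical per-node startup for a minimum HA setup:
# ZK first, then Spark (master + worker), then Solr in SolrCloud mode.
# Paths, hostnames and ports below are illustrative assumptions.

ZK_ENSEMBLE="node1:2181,node2:2181"

# 1. ZooKeeper: shared coordination for Spark, Solr and the stream processor.
zookeeper/bin/zkServer.sh start

# 2. Spark standalone cluster, with master state recovery through ZK
#    (Spark's standard ZOOKEEPER recovery mode for H/A masters).
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
  -Dspark.deploy.zookeeper.url=$ZK_ENSEMBLE"
spark/sbin/start-master.sh
spark/sbin/start-slave.sh "spark://node1:7077,node2:7077"

# 3. Solr in SolrCloud mode, clustering through the same ZK ensemble.
solr/bin/solr start -cloud -z "$ZK_ENSEMBLE"
```

A centralized script in the distribution could then simply run this before
starting the stream processor server itself, in both standalone and
clustered modes.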

I'm +1 for shipping all 3 as part of Product Analytics, but when it comes
to the Stream Processor I believe shipping Spark & Solr will be overkill
for the streaming solution. We can ship all the necessary connectors and
ask users to download Spark & Solr when needed.

Regards
Suho

>
> Cheers,
> Anjana.
> --
> *Anjana Fernando*
> Associate Director / Architect
> WSO2 Inc. | http://wso2.com
> lean . enterprise . middleware
>



-- 

*S. Suhothayan*
Associate Director / Architect & Team Lead of WSO2 Complex Event Processor
*WSO2 Inc. *http://wso2.com
lean . enterprise . middleware


*cell: (+94) 779 756 757 | blog: http://suhothayan.blogspot.com/
twitter: http://twitter.com/suhothayan | linked-in:
http://lk.linkedin.com/in/suhothayan*
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
