On Sat, Oct 22, 2016 at 10:45 AM, Nirmal Fernando <[email protected]> wrote:
>
> On Fri, Oct 21, 2016 at 2:00 PM, Anjana Fernando <[email protected]> wrote:
>
>> Hi,
>>
>> So we are starting on porting the earlier DAS-specific functionality to
>> C5. With this, we are planning on not embedding the Spark server
>> functionality in the primary binary itself, but rather running it
>> separately as another script in the same distribution. So basically, when
>> running the server in standalone mode, a centralized script will start
>> the Spark processes and then the main stream processor server. In a
>> clustered setup, we will start the Spark processes separately and use the
>> clustering that is native to Spark, which is currently done by
>> integrating with ZooKeeper.
>>
>
> Does this mean we still keep Spark binaries inside Stream Processor? If
> not, how are we planning to start a Spark process from Stream Processor?
>

We don't need to have Spark binaries in Stream Processor, and I believe it
would be wrong, as that is not its core functionality. But when it comes to
Product Analytics, we may ship them. We need to decide on that.

>> So basically, for the minimum H/A setup, we would need two stream
>> processing nodes and also ZK to build up the cluster, if we are also
>> using Spark. With C5, since we are not using Hazelcast anyway, we can use
>> ZK for other general coordination operations as well, since it is already
>> a requirement for Spark. We also get the added benefit of avoiding the
>> issues that come with a peer-to-peer coordination library, such as
>> split-brain scenarios.
>>
>> Also, aligning with the above approach, we are considering integrating
>> directly with Solr running external to the stream processor, rather than
>> doing the indexing in embedded mode. Even now in DAS we have a separate
>> indexing mode (profile), so rather than using that, we can use Solr
>> directly. One of the main reasons for using it is that it has additional
>> functionality over base Lucene: it comes with OOTB functionality such as
>> aggregates, for which we don't have full functionality at the moment. So
>> the suggestion is that Solr will also come as a separate profile (script)
>> with the distribution, and it will be started if indexing scenarios are
>> required for the stream processor; we can start it automatically or
>> selectively. Solr clustering is also done with ZK, which we will have
>> anyway with the new Spark clustering approach we are using.
>>
>> So the aim of moving the non-WSO2 servers out, rather than embedding
>> them, is the simplicity it brings to our codebase, since we do not have
>> to maintain the integration code required to embed them, and those
>> servers can use their own recommended deployment patterns. For example,
>> Spark isn't designed to be embedded into other servers, so we had to mess
>> around with some things to embed and cluster it internally. Also,
>> upgrading such dependencies becomes very straightforward, since they are
>> external to the base binary.
>>
>> Cheers,
>> Anjana.
>> --
>> *Anjana Fernando*
>> Associate Director / Architect
>> WSO2 Inc. | http://wso2.com
>> lean . enterprise . middleware
>>
>
>
> --
>
> Thanks & regards,
> Nirmal
>
> Team Lead - WSO2 Machine Learner
> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
> Mobile: +94715779733
> Blog: http://nirmalfdo.blogspot.com/
>

--

*S. Suhothayan*
Associate Director / Architect & Team Lead of WSO2 Complex Event Processor
*WSO2 Inc.* http://wso2.com
lean . enterprise . middleware

cell: (+94) 779 756 757 | blog: http://suhothayan.blogspot.com/
twitter: http://twitter.com/suhothayan | linked-in: http://lk.linkedin.com/in/suhothayan
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
