On Fri, Oct 21, 2016 at 2:00 PM, Anjana Fernando <[email protected]> wrote:
> Hi,
>
> So we are starting on porting the earlier DAS-specific functionality to
> C5. With this, we are planning on not embedding the Spark server
> functionality in the primary binary itself, but rather running it
> separately as another script in the same distribution. So basically, when
> running the server in standalone mode, a centralized script will start the
> Spark processes and then the main stream processor server. In a clustered
> setup, we will start the Spark processes separately and do the clustering
> that is native to Spark, which is currently done by integrating with
> ZooKeeper.

Does this mean we still keep the Spark binaries inside Stream Processor? If
not, how are we planning to start a Spark process from Stream Processor?

> So basically, for the minimum H/A setup, we would need two stream
> processing nodes plus ZK to build up the cluster, if we are using Spark.
> And with C5, since we are not using Hazelcast anyway, we can use ZK for
> other general coordination operations as well, since it is already a
> requirement for Spark. We also get the added benefit of avoiding the
> issues that come with a peer-to-peer coordination library, such as
> split-brain scenarios.
>
> Also, aligning with the above approach, we are considering integrating
> directly with Solr running external to the stream processor, rather than
> doing the indexing in embedded mode. DAS today has a separate indexing
> mode (profile); rather than using that, we can use Solr directly. One of
> the main reasons for this is that Solr adds functionality on top of base
> Lucene, such as OOTB support for aggregates etc., which at the moment we
> don't fully have.
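[To make the "centralized script" idea above concrete, here is a minimal
sketch of what such a wrapper could do. All paths, hostnames, and the
worker.sh name are illustrative assumptions, not the actual distribution
layout; the Spark script names and the spark.deploy.* recovery properties
are Spark's documented standalone-mode mechanisms.]

```shell
#!/bin/sh
# Sketch: start Spark standalone (with ZooKeeper-based master recovery)
# and then the stream processor. Hostnames/paths are examples only.

# Enable ZooKeeper recovery for the standalone master, so two masters
# on different nodes give us H/A for the Spark cluster itself.
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
  -Dspark.deploy.zookeeper.url=zk1:2181,zk2:2181,zk3:2181 \
  -Dspark.deploy.zookeeper.dir=/spark"

# Start a master on this node; run the same on a second node for H/A.
"$SPARK_HOME/sbin/start-master.sh"

# Start a worker that knows both masters, so it can fail over.
"$SPARK_HOME/sbin/start-slave.sh" spark://node1:7077,node2:7077

# Finally, start the stream processor server itself
# (worker.sh is a hypothetical SP startup script).
"$SP_HOME/bin/worker.sh"
```

Since the scripts are Spark's own, upgrading Spark would not require any
changes here beyond pointing SPARK_HOME at the new distribution.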
> So the suggestion is that Solr will also come as a separate profile
> (script) with the distribution, and it will be started up if the indexing
> scenarios are required for the stream processor; we can start it up
> automatically or selectively. Also, Solr clustering is done with ZK as
> well, which we will have anyway with the new Spark clustering approach we
> are using.
>
> The aim of not embedding the non-WSO2 servers is the simplicity it brings
> to our codebase, since we do not have to maintain the integration code
> required to embed them, and those servers can use their own recommended
> deployment patterns. For example, Spark isn't designed to be embedded in
> other servers, so we had to work around several things to embed and
> cluster it internally. Upgrading such dependencies also becomes very
> straightforward, since they are external to the base binary.
>
> Cheers,
> Anjana.
>
> --
> *Anjana Fernando*
> Associate Director / Architect
> WSO2 Inc. | http://wso2.com
> lean . enterprise . middleware

--
Thanks & regards,
Nirmal

Team Lead - WSO2 Machine Learner
Associate Technical Lead - Data Technologies Team, WSO2 Inc.
Mobile: +94715779733
Blog: http://nirmalfdo.blogspot.com/
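[Editor's note on the Solr-as-a-separate-profile idea above: Solr's stock
CLI already supports exactly this deployment, started in SolrCloud mode
against the same ZooKeeper ensemble used for Spark. A minimal sketch, where
the ZK hostnames, chroot, and collection name are illustrative assumptions:]

```shell
# Start Solr in SolrCloud mode, pointing it at the shared ZK ensemble
# (the /solr chroot keeps Solr's znodes separate from Spark's).
"$SOLR_HOME/bin/solr" start -c -z zk1:2181,zk2:2181,zk3:2181/solr

# Create a collection for the stream processor's index
# (event_index and the shard/replica counts are examples).
"$SOLR_HOME/bin/solr" create -c event_index -shards 2 -replicationFactor 2
```

Because clustering and replication are handled by Solr and ZK themselves,
no embedding or integration code is needed on the stream processor side
beyond a SolrJ (or HTTP) client for indexing and queries.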
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
