Hi,

We are starting to port the earlier DAS-specific functionality to C5. As part of this, we are planning not to embed the Spark server functionality in the primary binary itself, but rather to run it separately as another script in the same distribution. So when running the server in standalone mode, a centralized script will start the Spark processes and then the main stream processor server. In a clustered setup, we will start the Spark processes separately and use the clustering that is native to Spark, which is currently done by integrating with ZooKeeper.
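As a rough sketch of the above (the script names, paths, hosts, and ports here are hypothetical illustrations, not the actual distribution layout), the centralized startup script could look something like this. The `spark.deploy.recoveryMode` and `spark.deploy.zookeeper.url` properties are Spark's documented settings for ZooKeeper-based standby-master recovery in standalone-cluster mode:

```shell
#!/bin/sh
# Hypothetical centralized startup script; SPARK_HOME, SP_HOME, the ZK
# URL, and the stream-processor script name are assumptions for illustration.
SPARK_HOME=${SPARK_HOME:-/opt/spark}
SP_HOME=${SP_HOME:-/opt/stream-processor}

if [ "$1" = "cluster" ]; then
  # Clustered setup: let Spark do its own native HA clustering via ZooKeeper
  # (SPARK_DAEMON_JAVA_OPTS is the documented way to pass these properties
  # to the standalone master/worker daemons).
  SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
    -Dspark.deploy.zookeeper.url=zk1:2181,zk2:2181" \
    "$SPARK_HOME/sbin/start-master.sh"
  "$SPARK_HOME/sbin/start-slave.sh" spark://master1:7077,master2:7077
else
  # Standalone mode: bring up local Spark processes first...
  "$SPARK_HOME/sbin/start-master.sh"
  "$SPARK_HOME/sbin/start-slave.sh" spark://localhost:7077
fi

# ...then start the main stream processor server itself.
exec "$SP_HOME/bin/stream-processor.sh"
```

The point of the `cluster`/standalone switch is that the main binary never embeds Spark; the same distribution just starts the external Spark daemons in whichever topology is needed.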
So basically, for the minimum H/A setup we would need two stream processor nodes plus ZooKeeper to build up the cluster, if we are using Spark. And since with C5 we are not using Hazelcast anyway, we can use ZooKeeper for the other general coordination operations as well, given that it is already a requirement for Spark. We also get the added benefit of avoiding the issues that come with a peer-to-peer coordination library, such as split-brain scenarios.

Aligning with the above approach, we are also considering integrating directly with Solr running externally to the stream processor, rather than doing the indexing in embedded mode. DAS already has a separate indexing mode (profile), so rather than using that, we can use Solr directly. One of the main reasons for using Solr is that it adds functionality on top of base Lucene, such as OOTB support for aggregates etc., which we don't fully have at the moment. So the suggestion is that Solr will also come as a separate profile (script) with the distribution, and it will be started up if indexing scenarios are required for the stream processor; we can start it automatically or selectively. Solr clustering is also done with ZooKeeper, which we will have anyway with the new Spark clustering approach we are using.

The aim of running the non-WSO2 servers externally, without embedding them, is the simplicity it brings to our codebase: we don't have to maintain the integration code required to embed them, and those servers can use their own recommended deployment patterns. For example, Spark isn't designed to be embedded into other servers, so we had to work around a few things to embed and cluster it internally. Upgrading such dependencies also becomes very straightforward, since they are external to the base binary.

Cheers,
Anjana.

--
*Anjana Fernando*
Associate Director / Architect
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
