Hi,

I've finished the initial implementation of $subject. It contains the standard interfaces we use to plug in different data sources as the back-end record storage and for indexing. These pluggable data sources are called "Analytics Data Sources"; in a configuration file, you give the implementation class and the properties required to initialize it.

The first implementation, for RDBMS, is done. It stores all the records and other data in a relational database, and any database can be supported through a configuration file that provides the query templates defining a standard set of actions. At the moment, the H2 and MySQL query templates have been tested, and we will be adding templates for the other popular RDBMSs as well. The RDBMS AnalyticsDataSource implementation detects which query template to use by looking at the database connection information retrieved from the data source (e.g. the one defined in master-datasources.xml) and switches to that mode automatically, so the user doesn't have to do anything extra when configuring it.
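As a rough illustration of the pluggable setup described above, here is a minimal Java sketch. It is not the actual code in the repository: the reduced AnalyticsDataSource interface, the RdbmsDataSource class, the "url" property name and the detection by JDBC URL prefix are all assumptions made for the example. The real implementation may read the connection information differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

public class DataSourceSketch {

    /** Hypothetical, heavily reduced stand-in for the AnalyticsDataSource interface. */
    public interface AnalyticsDataSource {
        void init(Properties properties);
    }

    /** Toy RDBMS-style implementation that selects a query template set on init. */
    public static class RdbmsDataSource implements AnalyticsDataSource {
        private String templateName;

        @Override
        public void init(Properties properties) {
            // "url" is an assumed property name, not taken from the real configuration.
            this.templateName = detectTemplate(properties.getProperty("url"));
        }

        public String getTemplateName() {
            return templateName;
        }
    }

    /** Maps a JDBC URL prefix to a template name; only the two tested databases are listed. */
    public static String detectTemplate(String jdbcUrl) {
        Map<String, String> prefixes = new LinkedHashMap<>();
        prefixes.put("jdbc:h2:", "h2");
        prefixes.put("jdbc:mysql:", "mysql");
        for (Map.Entry<String, String> entry : prefixes.entrySet()) {
            if (jdbcUrl != null && jdbcUrl.startsWith(entry.getKey())) {
                return entry.getValue();
            }
        }
        throw new IllegalArgumentException("No query template for: " + jdbcUrl);
    }

    /** Instantiates the configured implementation class reflectively and initializes it. */
    public static AnalyticsDataSource load(String implClass, Properties properties)
            throws ReflectiveOperationException {
        AnalyticsDataSource ds = (AnalyticsDataSource) Class.forName(implClass)
                .getDeclaredConstructor().newInstance();
        ds.init(properties);
        return ds;
    }

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.setProperty("url", "jdbc:mysql://localhost:3306/analytics");
        AnalyticsDataSource ds = load(RdbmsDataSource.class.getName(), props);
        System.out.println(((RdbmsDataSource) ds).getTemplateName()); // prints "mysql"
    }
}
```

The point of the sketch is that the caller only supplies a class name and a set of properties; the choice of query template happens inside the implementation, without any extra setting.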
Also, inside the AnalyticsDataSource interface, there is a FileSystem interface that you need to implement for your data source. It is used for indexing, which is done with Lucene. We use Lucene indexes as index shards for distributed indexing and search. With the sharding approach, we can add more nodes to the cluster to improve indexing performance and to add storage. Provided the back-end storage is scalable, the index operations would scale in the same manner. However, the limits we hit first are the processing requirements and the random data access and locking requirements of each shard, so for a typical database system, I'm hoping indexing performance will increase almost linearly just by adding new BAM nodes.

The AnalyticsDataSource implementations are ultimately used by a component called AnalyticsDataService, which is the interface seen by clients. It exposes the indexing-related operations together with the record store functionality provided through AnalyticsDataSource. This interface can be looked up as an OSGi service, and we also plan to expose this functionality as a JAX-RS service.

The general design and the documentation on the test cases can be found at [1] and [2], and the source code at [3]. I will be doing further performance tests, especially on the distributed search, by integrating this into the product properly, and will provide the results here. For the moment, we have a few performance tests as unit tests in the modules.

This implementation will first be used by the log analysis implementation done by Gimantha. We are also planning to write further AnalyticsDataSource implementations, such as MongoDB and HBase. There will be separate notes on those.
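To make the sharding idea mentioned above a bit more concrete, here is a small Java sketch of how records could be routed to index shards and how shards could be spread across nodes. The CRC32-based routing and the round-robin assignment are assumptions for illustration only, and the class and method names are hypothetical; this is not necessarily how the actual implementation distributes shards.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

public class ShardRoutingSketch {

    /** Picks a shard for a record id using a stable hash, so every node agrees on the mapping. */
    public static int shardFor(String recordId, int shardCount) {
        CRC32 crc = new CRC32();
        crc.update(recordId.getBytes(StandardCharsets.UTF_8));
        // CRC32.getValue() is a non-negative long, so the remainder is in [0, shardCount).
        return (int) (crc.getValue() % shardCount);
    }

    /** Assigns shards to nodes round-robin: node i owns shards i, i + nodeCount, and so on. */
    public static List<Integer> shardsOwnedBy(int nodeIndex, int nodeCount, int shardCount) {
        List<Integer> owned = new ArrayList<>();
        for (int shard = nodeIndex; shard < shardCount; shard += nodeCount) {
            owned.add(shard);
        }
        return owned;
    }

    public static void main(String[] args) {
        int shardCount = 6;
        System.out.println("record-42 -> shard " + shardFor("record-42", shardCount));
        System.out.println("node 0 of 2 owns " + shardsOwnedBy(0, 2, shardCount)); // [0, 2, 4]
        System.out.println("node 0 of 3 owns " + shardsOwnedBy(0, 3, shardCount)); // [0, 3]
    }
}
```

With a fixed number of shards, adding a node reduces the number of shards each node has to process, which is the intuition behind expecting indexing throughput to grow with the cluster as long as the back-end storage keeps up.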
[1] https://docs.google.com/a/wso2.com/spreadsheets/d/10mHRE6FEgF6wDZ-LSBx18zL8ZcIay5ZIhb8MIk7pfeg/edit#gid=0
[2] https://docs.google.com/a/wso2.com/spreadsheets/d/1iXoZ8BzaefN3EGOL05y5aUX6SLZH7Bu8YM4bF3xOSvQ/edit#gid=0
[3] https://github.com/wso2-dev/carbon-analytics/tree/master/components/xanalytics

Cheers,
Anjana.

--
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
