
Anjana Fernando Wed, 10 Dec 2014 00:08:45 -0800

Hi,

I've finished the initial implementation of $subject. It contains the
standard interfaces we use to plug in different data sources as the
back-end record storage and for indexing. These pluggable data sources are
called "Analytics Data Sources"; in a configuration file, you give the
implementation class and the properties required for its initialization.
The first implementation, which is the RDBMS one, is done. It stores all
the records and other data in a relational database, and any type of
database can be supported via a configuration file that gives the query
templates used to define a standard set of actions. At the moment, the H2
and MySQL query templates have been tested, and we will be adding templates
for the rest of the popular RDBMSs as well. The RDBMS AnalyticsDataSource
implementation detects which query template to use by looking at the
database connection information retrieved from the data source (e.g. the
one mentioned in master-datasources.xml) and automatically switches to that
mode, so the user doesn't have to do anything extra when configuring.
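
To illustrate the template detection described above, here is a minimal
sketch in Java. It is not the code in [3]: the class name, the method
names, the idea of keying on the JDBC URL sub-protocol, and the two SQL
strings are all assumptions made for illustration. The real templates come
from a configuration file.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch: pick a query template set from the connection URL. */
public class QueryTemplateSelector {

    // Illustrative upsert templates keyed by database type (assumed values,
    // standing in for what the configuration file would provide).
    private static final Map<String, String> INSERT_TEMPLATES = new HashMap<>();
    static {
        INSERT_TEMPLATES.put("h2",
            "MERGE INTO {table} (id, ts, data) KEY (id) VALUES (?, ?, ?)");
        INSERT_TEMPLATES.put("mysql",
            "INSERT INTO {table} (id, ts, data) VALUES (?, ?, ?) "
            + "ON DUPLICATE KEY UPDATE ts = VALUES(ts), data = VALUES(data)");
    }

    /** Extracts the sub-protocol from a URL of the form jdbc:<type>:... */
    public static String detectDatabaseType(String jdbcUrl) {
        String prefix = "jdbc:";
        if (jdbcUrl == null
                || !jdbcUrl.toLowerCase(Locale.ROOT).startsWith(prefix)) {
            throw new IllegalArgumentException("Not a JDBC URL: " + jdbcUrl);
        }
        int end = jdbcUrl.indexOf(':', prefix.length());
        if (end < 0) {
            throw new IllegalArgumentException("Not a JDBC URL: " + jdbcUrl);
        }
        return jdbcUrl.substring(prefix.length(), end).toLowerCase(Locale.ROOT);
    }

    /** Returns the insert statement for the detected database type. */
    public static String insertTemplateFor(String jdbcUrl, String table) {
        String type = detectDatabaseType(jdbcUrl);
        String template = INSERT_TEMPLATES.get(type);
        if (template == null) {
            throw new IllegalArgumentException("No template for: " + type);
        }
        return template.replace("{table}", table);
    }
}
```

With this shape, supporting another database would only mean adding one
more template entry, with no change to the calling code.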

Also, inside the AnalyticsDataSource interface, there is a FileSystem
interface you need to implement for your data source implementation. It is
used for indexing, which is done by Lucene. We use Lucene indexes as index
shards for distributed indexing and search. With the sharding approach, we
can add more nodes to our cluster to improve indexing performance and to
add storage capacity. Provided the backend storage is scalable, the index
operations would be scalable in the same manner. The limits we hit first
are the processing requirements, and the random data access and locking
requirements for each shard. So for a typical database system, I'm hoping
that indexing performance will increase almost linearly just by adding new
BAM nodes.
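
As a rough picture of how work could be spread over index shards, here is
a small Java sketch. This note does not describe how records are actually
assigned to shards or how shards are assigned to nodes, so the hash-based
routing, the round-robin node assignment, and the ShardRouter name are all
assumptions.

```java
/** Hypothetical sketch: route records to index shards, and shards to nodes. */
public class ShardRouter {

    private final int shardCount;

    public ShardRouter(int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        this.shardCount = shardCount;
    }

    /** Maps a record id to a shard index in [0, shardCount). */
    public int shardFor(String recordId) {
        // floorMod keeps the result non-negative for negative hash codes.
        return Math.floorMod(recordId.hashCode(), shardCount);
    }

    /** Assigns a shard to one of nodeCount nodes, round-robin. */
    public int nodeFor(int shardIndex, int nodeCount) {
        if (nodeCount <= 0) {
            throw new IllegalArgumentException("nodeCount must be positive");
        }
        return shardIndex % nodeCount;
    }
}
```

Under these assumptions, each shard has exactly one owning node at a time,
so the locking for a shard stays local to that node, and adding nodes
lowers the number of shards each one has to process.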

The AnalyticsDataSource implementations are ultimately used by a component
called AnalyticsDataService, which is the interface seen by clients. It
adds the indexing-related operations to the record store functionality
exposed through AnalyticsDataSource. This interface can be looked up as an
OSGi service, and we also plan on exposing this functionality as a JAX-RS
service.
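
The layering described above can be sketched as follows. None of these
signatures are taken from [3]: RecordStore is a hypothetical stand-in for
AnalyticsDataSource, DataService for AnalyticsDataService, and the in-memory
term map stands in for the Lucene index, only to keep the example
self-contained. It handles inserts only, not updates or deletes.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class DataServiceSketch {

    /** Hypothetical stand-in for the pluggable record storage interface. */
    interface RecordStore {
        void put(String table, String id, Map<String, String> values);
        Map<String, String> get(String table, String id);
    }

    /** In-memory store, present only to make the sketch runnable. */
    static class InMemoryRecordStore implements RecordStore {
        private final Map<String, Map<String, String>> rows = new HashMap<>();

        public void put(String table, String id, Map<String, String> values) {
            rows.put(table + "/" + id, new HashMap<>(values));
        }

        public Map<String, String> get(String table, String id) {
            return rows.get(table + "/" + id);
        }
    }

    /** Client-facing facade: stores each record, then indexes its values. */
    static class DataService {
        private final RecordStore store;
        // term -> ids of the records containing it (stand-in for Lucene)
        private final Map<String, Set<String>> index = new HashMap<>();

        DataService(RecordStore store) {
            this.store = store;
        }

        void insert(String table, String id, Map<String, String> values) {
            store.put(table, id, values);
            for (String value : values.values()) {
                for (String term : value.toLowerCase(Locale.ROOT).split("\\s+")) {
                    if (!term.isEmpty()) {
                        index.computeIfAbsent(table + ":" + term,
                                k -> new TreeSet<>()).add(id);
                    }
                }
            }
        }

        Map<String, String> get(String table, String id) {
            return store.get(table, id);
        }

        Set<String> search(String table, String term) {
            String key = table + ":" + term.toLowerCase(Locale.ROOT);
            return index.getOrDefault(key, Collections.emptySet());
        }
    }
}
```

The point of the sketch is that clients only talk to the service, so the
storage implementation behind it can be swapped without touching them.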

The general design and the documentation on the test cases can be found at
[1] and [2], and the source code at [3]. I will be doing some further
performance tests by integrating this into the product properly, especially
for the distributed search, and will provide the results here. For the
moment, we have a few performance tests as unit tests in the modules. This
implementation will first be used by the log analysis implementation done
by Gimantha. We are also planning on writing further AnalyticsDataSource
implementations for this, such as MongoDB, HBase, etc. There will be
separate notes on those.

[1]
https://docs.google.com/a/wso2.com/spreadsheets/d/10mHRE6FEgF6wDZ-LSBx18zL8ZcIay5ZIhb8MIk7pfeg/edit#gid=0
[2]
https://docs.google.com/a/wso2.com/spreadsheets/d/1iXoZ8BzaefN3EGOL05y5aUX6SLZH7Bu8YM4bF3xOSvQ/edit#gid=0
[3]
https://github.com/wso2-dev/carbon-analytics/tree/master/components/xanalytics

Cheers,
Anjana.
--
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware

_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture

[Architecture] BAM 3.0 Data Layer Implementation / RDBMS / Distributed Indexing / Search
