So, here is what we are going to do: we create three record stores, i.e., EVENT_STORE_WO (MyISAM), EVENT_STORE_WRO (InnoDB), and PROCESSED_STORE (InnoDB).
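For illustration, the record store declarations could look roughly like the following in analytics-config.xml. This is only a sketch: the element names, implementation class, and property keys below are assumptions for the example, not verified against the actual DAS configuration schema.

```xml
<analytics-dataservice-configuration>
   <!-- Both event stores point at the same MySQL data source;
        only the table-creation query template category differs. -->
   <analytics-record-store name="EVENT_STORE_WO">
      <implementation>org.wso2.carbon.analytics.datasource.rdbms.RDBMSAnalyticsRecordStore</implementation>
      <properties>
         <!-- hypothetical property keys -->
         <property name="datasource">WSO2_ANALYTICS_EVENT_STORE_DB</property>
         <property name="category">write_optimized</property>      <!-- MyISAM templates -->
      </properties>
   </analytics-record-store>
   <analytics-record-store name="EVENT_STORE_WRO">
      <implementation>org.wso2.carbon.analytics.datasource.rdbms.RDBMSAnalyticsRecordStore</implementation>
      <properties>
         <property name="datasource">WSO2_ANALYTICS_EVENT_STORE_DB</property>
         <property name="category">write_read_optimized</property> <!-- InnoDB templates -->
      </properties>
   </analytics-record-store>
</analytics-dataservice-configuration>
```

The point of the sketch is that the two stores share one data source; the store an artifact is bound to decides which query templates (and hence which storage engine) are used when its tables are created.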
This can be done via configuration (<DAS_HOME>/repository/conf/analytics/analytics-config.xml). Both EVENT_STORE_WO and EVENT_STORE_WRO will use the same data source, but tables will be created accordingly based on the query configurations. In the default DAS distribution, the primary record store will be EVENT_STORE_WRO, but in the ESB analytics pack, the primary record store in use will be EVENT_STORE_WO. Basically, based on the analytics scenario, the event persistence artifacts (of streams) are configured with the corresponding record store. The benefit here is that we can install any analytics feature (or CApp) in any DAS analytics distribution without worrying about the behavior of the default data storing/retrieval model.

On Thu, May 26, 2016 at 2:44 PM, Anjana Fernando <[email protected]> wrote:

> Hi,
>
> So actually, we need to solve the following case: even though by default
> we can use the "write_read_optimized" mode of the record store (which will
> automatically switch the queries used to create the database tables from
> the templates), for some cases we need the default event store to run in
> "write_optimized" mode (where, in MySQL, it uses MyISAM). For example, in
> the ESB analytics case, we can use this for raw event storing, since there
> aren't many continuous reads done on it, such as running a Spark job on it
> (that is done by CEP now). So if someone installs the ESB analytics
> features into a base DAS distribution, as of now, it will be using the
> "EVENT_STORE" record store, which is by default set to
> "write_read_optimized" mode.
>
> So what I suggest is creating two record stores to replace the current
> single "EVENT_STORE", say "EVENT_STORE_WO" and "EVENT_STORE_WRO", which
> would represent "write_optimized" and "write_read_optimized" backed
> configurations ("PROCESSED_STORE" will anyway be "write_read_optimized").
> So in a MySQL setup, this would actually come into effect when creating
> database tables. In a setup like HBase, the data sources would possibly
> be pointing to a single database server, and the same type of tables
> would be created. So basically, what we achieve in the end is that we can
> write all our analytics scenarios in a portable way, without worrying
> about the behavior of the data storing/retrieval, as long as we use the
> default record store names that come with a typical DAS, and only
> data-source-level changes would be done when needed.
>
> P.S. Also, can we rename "write_read_optimized" in the configurations to
> "read_write_optimized"? The second one reads more naturally.
>
> Cheers,
> Anjana.
>
> On Wed, May 25, 2016 at 8:10 PM, Inosh Goonewardena <[email protected]>
> wrote:
>
>> Hi,
>>
>> At the moment DAS supports both MyISAM and InnoDB, but is configured to
>> use MyISAM by default.
>>
>> There are several differences between MyISAM and InnoDB, but what is
>> most relevant with regard to DAS is the difference in concurrency.
>> Basically, MyISAM uses table-level locking and InnoDB uses row-level
>> locking. So, with MyISAM, if we are running Spark queries while
>> publishing data to DAS at higher TPS, it can lead to issues because the
>> DAL layer cannot obtain the table lock to insert data into the table
>> while Spark is reading from the same table.
>>
>> On the other hand, with InnoDB the write speed is considerably slower
>> (because it is designed to support transactions), so it will affect
>> receiver performance.
>>
>> One option we have in DAS is to use two DBs to keep incoming records and
>> processed records, i.e., EVENT_STORE and PROCESSED_DATA_STORE.
>>
>> For ESB Analytics, we can configure MyISAM for EVENT_STORE and InnoDB
>> for PROCESSED_DATA_STORE.
>> This is because in ESB analytics, summarization up to the minute level
>> is done by real-time analytics, and Spark queries will read and process
>> data using the minutely (and higher) tables, which we can keep in
>> PROCESSED_DATA_STORE. Since the raw table (to which the data receiver
>> writes) is not used by Spark queries, receiver performance will not be
>> affected.
>>
>> However, in most cases, Spark queries may be written to read data
>> directly from the raw tables. As mentioned above, with MyISAM this could
>> lead to performance issues if data publishing and Spark analytics happen
>> in parallel. So, considering that, I think we should change the default
>> configuration to use InnoDB. WDYT?
>>
>> --
>> Thanks & Regards,
>>
>> Inosh Goonewardena
>> Associate Technical Lead - WSO2 Inc.
>> Mobile: +94779966317
>
> --
> *Anjana Fernando*
> Senior Technical Lead
> WSO2 Inc. | http://wso2.com
> lean . enterprise . middleware

--
Thanks & Regards,

Inosh Goonewardena
Associate Technical Lead - WSO2 Inc.
Mobile: +94779966317
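An aside on the storage-engine trade-off discussed above: the difference between the two record store modes ultimately shows up in the ENGINE clause of the generated DDL. A sketch follows; the table and column names are made up for illustration and are not the actual DAL template output.

```sql
-- Write-optimized raw event table: MyISAM gives fast inserts, but its
-- table-level locking blocks concurrent Spark reads on the same table.
CREATE TABLE EVENTS_RAW (
  record_id  VARCHAR(50) NOT NULL PRIMARY KEY,
  event_time BIGINT      NOT NULL,
  data       BLOB
) ENGINE = MyISAM;

-- Read/write-optimized processed table: InnoDB's row-level locking lets
-- Spark queries read while the receiver writes, at some insert-speed cost.
CREATE TABLE EVENTS_PROCESSED (
  record_id  VARCHAR(50) NOT NULL PRIMARY KEY,
  event_time BIGINT      NOT NULL,
  data       BLOB
) ENGINE = InnoDB;
```

With the two-store scheme, which of these shapes a stream's table takes is decided purely by the record store its persistence artifact names, which is what makes the artifacts portable across distributions.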
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
