Hi,

At the moment DAS supports both MyISAM and InnoDB, but it is configured to use MyISAM by default.
There are several differences between MyISAM and InnoDB, but the one most relevant to DAS is the difference in concurrency. Basically, MyISAM uses table-level locking and InnoDB uses row-level locking. So, with MyISAM, if we run Spark queries while publishing data to DAS, at higher TPS it can lead to issues: the DAL layer is unable to obtain the table lock to insert data into a table while Spark is reading from the same table.

On the other hand, with InnoDB the write speed is considerably slower (because it is designed to support transactions), so it will affect the receiver performance.

One option we have in DAS is to use two DBs to keep incoming records and processed records separately, i.e., EVENT_STORE and PROCESSED_DATA_STORE. For ESB Analytics, we can configure MyISAM for EVENT_STORE and InnoDB for PROCESSED_DATA_STORE. This works because in ESB Analytics, summarization up to the minute level is done by real-time analytics, and Spark queries read and process data from the per-minute (and higher) tables, which we can keep in PROCESSED_DATA_STORE. Since the raw table (to which the data receiver writes) is not used by Spark queries, the receiver performance will not be affected.

However, in most cases, Spark queries may be written to read data directly from raw tables. As mentioned above, with MyISAM this could lead to performance issues if data publishing and Spark analytics happen in parallel.

Considering that, I think we should change the default configuration to use InnoDB.

WDYT?

-- 
Thanks & Regards,
Inosh Goonewardena
Associate Technical Lead - WSO2 Inc.
Mobile: +94779966317
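
P.S. To make the lock-granularity point above concrete, here is a minimal sketch in Python. It is only a toy model, not how MySQL or the DAS DAL layer is implemented: the class and function names (TableLockStore, RowLockStore, writer_blocked) are hypothetical, and it deliberately ignores details of the real engines such as shared read locks and non-locking reads. It only shows why a writer contends with a reader of a different row when there is a single lock per table, and does not when there is a lock per row.

```python
import threading


class TableLockStore:
    """Toy MyISAM-like model (assumption): one lock guards the whole table."""

    def __init__(self):
        self._table_lock = threading.Lock()

    def lock_for(self, row_id):
        # Every row maps to the same table-wide lock.
        return self._table_lock


class RowLockStore:
    """Toy InnoDB-like model (assumption): one lock per row."""

    def __init__(self):
        self._row_locks = {}
        self._guard = threading.Lock()  # protects the lock dictionary itself

    def lock_for(self, row_id):
        with self._guard:
            return self._row_locks.setdefault(row_id, threading.Lock())


def writer_blocked(store, read_row, write_row):
    """Return True if a write to write_row cannot proceed while
    read_row is locked by a reader."""
    with store.lock_for(read_row):  # the "Spark" reader holds its lock
        write_lock = store.lock_for(write_row)
        acquired = write_lock.acquire(blocking=False)  # the "receiver" tries to write
        if acquired:
            write_lock.release()
        return not acquired


if __name__ == "__main__":
    print("table-level, different rows:", writer_blocked(TableLockStore(), 1, 2))
    print("row-level, different rows:  ", writer_blocked(RowLockStore(), 1, 2))
    print("row-level, same row:        ", writer_blocked(RowLockStore(), 1, 1))
```

In this model the table-level store blocks the writer even when it touches a different row, while the row-level store blocks it only when both operations target the same row.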
_______________________________________________ Architecture mailing list [email protected] https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
