Please also update the Docs to reflect this. --Srinath
On Thu, May 26, 2016 at 12:29 PM, Inosh Goonewardena <[email protected]> wrote:

> Hi Srinath,
>
> On Thu, May 26, 2016 at 12:09 PM, Srinath Perera <[email protected]> wrote:
>
>> Hi Inosh,
>>
>> Good catch! I am +1. Can we do this just by configs or do we need a
>> patch? If so, can we patch before we release?
>
> We can do this by a configuration change.
>
>> Anjana, can't we use HDFS for the EVENT_STORE and use MySQL only for the
>> processed data store? (long term)
>
> We can. That is the best approach for running Spark jobs in parallel
> without affecting receiver performance.
>
>> --Srinath
>>
>> On Wed, May 25, 2016 at 8:10 PM, Inosh Goonewardena <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> At the moment, DAS supports both MyISAM and InnoDB, but it is configured
>>> to use MyISAM by default.
>>>
>>> There are several differences between MyISAM and InnoDB, but the most
>>> relevant one for DAS is concurrency: MyISAM uses table-level locking,
>>> while InnoDB uses row-level locking. So with MyISAM, if we run Spark
>>> queries while publishing data to DAS at a high TPS, the DAL layer can
>>> fail to obtain the table lock it needs to insert data while Spark is
>>> reading from the same table.
>>>
>>> On the other hand, InnoDB write speed is considerably slower (because
>>> it is designed to support transactions), so it affects receiver
>>> performance.
>>>
>>> One option we have in DAS is to use two databases to keep incoming
>>> records and processed records separately, i.e., EVENT_STORE and
>>> PROCESSED_DATA_STORE.
>>>
>>> For ESB Analytics, we can configure MyISAM for the EVENT_STORE and
>>> InnoDB for the PROCESSED_DATA_STORE. This works because, in ESB
>>> Analytics, summarization up to the minute level is done by real-time
>>> analytics, and Spark queries read and process data using the minutely
>>> (and higher) tables, which we can keep in the PROCESSED_DATA_STORE.
>>> Since the raw table (which the data receiver writes to) is not used by
>>> Spark queries, receiver performance is not affected.
>>>
>>> However, in most cases, Spark queries may be written to read data
>>> directly from raw tables. As mentioned above, with MyISAM this can lead
>>> to performance issues when data publishing and Spark analytics happen
>>> in parallel. Considering that, I think we should change the default
>>> configuration to use InnoDB. WDYT?
>>>
>>> --
>>> Thanks & Regards,
>>>
>>> Inosh Goonewardena
>>> Associate Technical Lead - WSO2 Inc.
>>> Mobile: +94779966317
>>
>> --
>> ============================
>> Blog: http://srinathsview.blogspot.com  twitter: @srinath_perera
>> Site: http://home.apache.org/~hemapani/
>> Photos: http://www.flickr.com/photos/hemapani/
>> Phone: 0772360902
>
> --
> Thanks & Regards,
>
> Inosh Goonewardena
> Associate Technical Lead - WSO2 Inc.
> Mobile: +94779966317

--
============================
Srinath Perera, Ph.D.
http://people.apache.org/~hemapani/
http://srinathsview.blogspot.com/
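[Editor's note: the engine choice discussed above is a per-table setting in MySQL. A minimal sketch of what the change amounts to at the SQL level; the table name and columns here are purely illustrative, not the actual DAS schema, and in DAS itself the engine is set via configuration rather than hand-written DDL.]

```sql
-- Illustrative only: not the real DAS table definition.
-- Current default: MyISAM (table-level locking; fast writes, but a
-- long-running Spark read blocks the receiver's inserts on this table):
CREATE TABLE ANALYTICS_EVENT_DATA (
  record_id  VARCHAR(50) PRIMARY KEY,
  event_time BIGINT,
  payload    BLOB
) ENGINE=MyISAM;

-- Proposed default: InnoDB (row-level locking; inserts and Spark reads
-- on the same table can proceed in parallel, at some write-speed cost):
ALTER TABLE ANALYTICS_EVENT_DATA ENGINE=InnoDB;
```

The split-store option from the thread maps to the same knob applied per datasource: MyISAM-backed tables for the EVENT_STORE (write-heavy, receiver path) and InnoDB-backed tables for the PROCESSED_DATA_STORE (read by Spark).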
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
