On Thu, May 26, 2016 at 1:16 PM, Srinath Perera <[email protected]> wrote:

> Please also update the Docs to reflect this.
>

Noted.

>
> --Srinath
>
> On Thu, May 26, 2016 at 12:29 PM, Inosh Goonewardena <[email protected]>
> wrote:
>
>> Hi Srinath,
>>
>> On Thu, May 26, 2016 at 12:09 PM, Srinath Perera <[email protected]>
>> wrote:
>>
>>> Hi Inosh,
>>>
>>> Good catch!! I am +1. Can we do this just by configs or do we need a
>>> patch? If so can we patch before we release?
>>>
>>
>> We can do this by configuration change.
>>
>>
>>>
>>> Anjana, cannot we use HDFS for EVENT_STORE and use MySQL only for
>>> processed data store? (long term)
>>>
>>
>> We can. This is the best approach to use without affecting receiver
>> performance while running Spark jobs in parallel.
>>
>>
>>>
>>> --Srinath
>>>
>>> On Wed, May 25, 2016 at 8:10 PM, Inosh Goonewardena <[email protected]>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> At the moment DAS supports both MyISAM and InnoDB, but is configured to
>>>> use MyISAM by default.
>>>>
>>>> There are several differences between MyISAM and InnoDB, but what is
>>>> most relevant with regard to DAS is the difference in concurrency.
>>>> Basically, MyISAM uses table-level locking and InnoDB uses row-level
>>>> locking. So, with MyISAM, if we are running Spark queries while
>>>> publishing data to DAS, at higher TPS it can lead to issues because the
>>>> DAL layer cannot obtain the table lock to insert data into the table
>>>> while Spark is reading from the same table.
>>>>
>>>> On the other hand, with InnoDB write speed is considerably slower
>>>> (because it is designed to support transactions), so it will affect
>>>> the receiver performance.
>>>>
>>>> One option we have in DAS is to use two DBs to keep incoming records
>>>> and processed records, i.e., EVENT_STORE and PROCESSED_DATA_STORE.
>>>>
>>>> For ESB Analytics, we can configure MyISAM for EVENT_STORE and InnoDB
>>>> for PROCESSED_DATA_STORE. This is because in ESB Analytics, summarizing
>>>> up to minute level is done by real-time analytics, and Spark queries
>>>> read and process data from the minutely (and higher) tables, which we
>>>> can keep in PROCESSED_DATA_STORE. Since the raw table (to which the
>>>> data receiver writes) is not used by Spark queries, the receiver
>>>> performance will not be affected.
>>>>
>>>> However, in most cases, Spark queries may be written to read data
>>>> directly from raw tables. As mentioned above, with MyISAM this could
>>>> lead to performance issues if data publishing and Spark analytics
>>>> happen in parallel. So, considering that, I think we should change the
>>>> default configuration to use InnoDB. WDYT?
>>>>
>>>> --
>>>> Thanks & Regards,
>>>>
>>>> Inosh Goonewardena
>>>> Associate Technical Lead- WSO2 Inc.
>>>> Mobile: +94779966317
>>>>
>>>
>>>
>>>
>>> --
>>> ============================
>>> Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
>>> Site: http://home.apache.org/~hemapani/
>>> Photos: http://www.flickr.com/photos/hemapani/
>>> Phone: 0772360902
>>>
>>
>>
>>
>> --
>> Thanks & Regards,
>>
>> Inosh Goonewardena
>> Associate Technical Lead- WSO2 Inc.
>> Mobile: +94779966317
>>
>
>
>
> --
> ============================
> Srinath Perera, Ph.D.
> http://people.apache.org/~hemapani/
> http://srinathsview.blogspot.com/
>

--
Thanks & Regards,

Inosh Goonewardena
Associate Technical Lead- WSO2 Inc.
Mobile: +94779966317
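
The concurrency argument in the quoted thread (table-level versus row-level locking while a Spark read overlaps with receiver inserts) can be sketched with a toy model. This is only an illustration under loud assumptions: `ToyTable` and `blocked_inserts` are hypothetical names, not DAS or MySQL APIs, and the model is deliberately simplified. Real MyISAM can allow concurrent inserts in some situations, and real InnoDB serves plain reads through multi-versioning rather than by locking rows, so the numbers below show only the difference in lock granularity, not measured behaviour.

```python
import threading
from collections import defaultdict


class ToyTable:
    """Toy lock-granularity model (hypothetical; not MySQL's implementation)."""

    def __init__(self, row_level):
        self.row_level = row_level
        self._table_lock = threading.Lock()
        # One lock per row id, created lazily on first use.
        self._row_locks = defaultdict(threading.Lock)

    def _lock_for(self, row_id):
        # Row-level mode: each row has its own lock.
        # Table-level mode: every row maps to the single table lock.
        return self._row_locks[row_id] if self.row_level else self._table_lock

    def try_lock(self, row_id):
        """Try to take the lock guarding row_id without waiting."""
        return self._lock_for(row_id).acquire(blocking=False)

    def unlock(self, row_id):
        self._lock_for(row_id).release()


def blocked_inserts(row_level, inserts=5):
    """Count inserts that cannot proceed while a reader holds a lock on row 0."""
    table = ToyTable(row_level)
    reader_ok = table.try_lock(0)  # the "Spark" reader starts reading row 0
    if not reader_ok:
        raise RuntimeError("reader could not acquire its lock")
    blocked = 0
    for new_row in range(1, inserts + 1):  # the "receiver" inserts new rows
        if table.try_lock(new_row):
            table.unlock(new_row)
        else:
            blocked += 1
    table.unlock(0)
    return blocked


if __name__ == "__main__":
    print("table-level:", blocked_inserts(row_level=False))  # table-level: 5
    print("row-level:", blocked_inserts(row_level=True))     # row-level: 0
```

The model does not capture the trade-off raised in the thread that InnoDB writes are slower, which is why splitting EVENT_STORE and PROCESSED_DATA_STORE remains an option.
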
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
