On Thu, May 26, 2016 at 1:16 PM, Srinath Perera <[email protected]> wrote:

> Please also update the Docs to reflect this.
>

Noted.


>
> --Srinath
>
> On Thu, May 26, 2016 at 12:29 PM, Inosh Goonewardena <[email protected]>
> wrote:
>
>> Hi Srinath,
>>
>> On Thu, May 26, 2016 at 12:09 PM, Srinath Perera <[email protected]>
>> wrote:
>>
>>> Hi Inosh,
>>>
>>> Good catch!! I am +1. Can we do this just by configs or do we need a
>>> patch? If so can we patch before we release?
>>>
>>
>> We can do this by configuration change.
>>
>>
>>>
>>> Anjana, can't we use HDFS for the EVENT_STORE and use MySQL only for
>>> the processed data store? (long term)
>>>
>>
>> We can. That is the best approach to avoid affecting receiver
>> performance while running Spark jobs in parallel.
>>
>>
>>>
>>> --Srinath
>>>
>>> On Wed, May 25, 2016 at 8:10 PM, Inosh Goonewardena <[email protected]>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> At the moment DAS supports both MyISAM and InnoDB, but it is configured
>>>> to use MyISAM by default.
>>>>
>>>> There are several differences between MyISAM and InnoDB, but the one
>>>> most relevant to DAS is concurrency: MyISAM uses table-level locking,
>>>> while InnoDB uses row-level locking. So, with MyISAM, if we run Spark
>>>> queries while publishing data to DAS at a high TPS, it can lead to
>>>> issues because the DAL layer cannot obtain the table lock to insert
>>>> data while Spark is reading from the same table.
>>>>
>>>> On the other hand, InnoDB write speed is considerably slower (because
>>>> it is designed to support transactions), so it will affect receiver
>>>> performance.
>>>>
>>>> One option we have in DAS is to use two databases to keep incoming
>>>> records and processed records separately, i.e., EVENT_STORE and
>>>> PROCESSED_DATA_STORE.
>>>>
>>>> For ESB Analytics, we can configure MyISAM for the EVENT_STORE and
>>>> InnoDB for the PROCESSED_DATA_STORE. This is because, in ESB analytics,
>>>> summarization up to the minute level is done by real-time analytics, and
>>>> Spark queries read and process data using the per-minute (and higher)
>>>> tables, which we can keep in the PROCESSED_DATA_STORE. Since the raw
>>>> table (to which the data receiver writes) is not used by Spark queries,
>>>> receiver performance will not be affected.
>>>>
>>>> However, in most cases, Spark queries may be written to read data
>>>> directly from the raw tables. As mentioned above, with MyISAM this
>>>> could lead to performance issues if data publishing and Spark analytics
>>>> happen in parallel. Considering that, I think we should change the
>>>> default configuration to use InnoDB. WDYT?
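>>>> If we do change the default, a quick way to check and convert the
>>>> engine of existing tables would be something like the following
>>>> (database and table names here are hypothetical):
>>>>
>>>> ```sql
>>>> -- List the storage engine of each table in the event-store database.
>>>> SELECT TABLE_NAME, ENGINE
>>>>   FROM information_schema.TABLES
>>>>  WHERE TABLE_SCHEMA = 'das_event_store';
>>>>
>>>> -- Convert an existing MyISAM table to InnoDB in place.
>>>> ALTER TABLE events_raw ENGINE = InnoDB;
>>>> ```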
>>>>
>>>> --
>>>> Thanks & Regards,
>>>>
>>>> Inosh Goonewardena
>>>> Associate Technical Lead- WSO2 Inc.
>>>> Mobile: +94779966317
>>>>
>>>
>>>
>>>
>>> --
>>> ============================
>>> Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
>>> Site: http://home.apache.org/~hemapani/
>>> Photos: http://www.flickr.com/photos/hemapani/
>>> Phone: 0772360902
>>>
>>
>>
>>
>> --
>> Thanks & Regards,
>>
>> Inosh Goonewardena
>> Associate Technical Lead- WSO2 Inc.
>> Mobile: +94779966317
>>
>
>
>
> --
> ============================
> Srinath Perera, Ph.D.
>    http://people.apache.org/~hemapani/
>    http://srinathsview.blogspot.com/
>



-- 
Thanks & Regards,

Inosh Goonewardena
Associate Technical Lead- WSO2 Inc.
Mobile: +94779966317
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
