Please also update the Docs to reflect this.
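
For the docs, the locking difference Inosh describes below could be
illustrated with something like the following sketch. It is only a toy
model in Python, not DAS or MySQL code: the class and function names are
made up for illustration, and it deliberately ignores details such as
MyISAM's concurrent-insert optimization and InnoDB's non-locking
consistent reads.

```python
import threading


class TableLockStore:
    """Toy model of table-level locking (MyISAM-like): one lock per table."""

    def __init__(self):
        self._table_lock = threading.Lock()
        self.rows = {}

    def begin_scan(self):
        # A long-running read (e.g. a Spark query) holds the table lock.
        self._table_lock.acquire()

    def end_scan(self):
        self._table_lock.release()

    def try_insert(self, key, value):
        # The writer needs the same table lock; give up at once if it is held.
        if not self._table_lock.acquire(blocking=False):
            return False
        try:
            self.rows[key] = value
            return True
        finally:
            self._table_lock.release()


class RowLockStore:
    """Toy model of row-level locking (InnoDB-like): one lock per row."""

    def __init__(self):
        self._row_locks = {}
        self._held = []
        self.rows = {}

    def begin_scan(self):
        # Simplification: the scan locks only rows that already exist.
        self._held = [self._row_locks[key] for key in self.rows]
        for lock in self._held:
            lock.acquire()

    def end_scan(self):
        for lock in self._held:
            lock.release()
        self._held = []

    def try_insert(self, key, value):
        # A new row gets its own lock, so it does not collide with the scan.
        lock = self._row_locks.setdefault(key, threading.Lock())
        if not lock.acquire(blocking=False):
            return False
        try:
            self.rows[key] = value
            return True
        finally:
            lock.release()


def insert_during_scan(store):
    """Return True if a new record can be inserted while a scan is running."""
    store.try_insert("event-1", "raw")  # seed one existing row
    store.begin_scan()
    try:
        return store.try_insert("event-2", "raw")
    finally:
        store.end_scan()


if __name__ == "__main__":
    print("table-level:", insert_during_scan(TableLockStore()))
    print("row-level:  ", insert_during_scan(RowLockStore()))
```

In this model the insert is refused under table-level locking and goes
through under row-level locking, which is the receiver-vs-Spark
contention in miniature.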

--Srinath

On Thu, May 26, 2016 at 12:29 PM, Inosh Goonewardena <[email protected]> wrote:

> Hi Srinath,
>
> On Thu, May 26, 2016 at 12:09 PM, Srinath Perera <[email protected]> wrote:
>
>> Hi Inosh,
>>
>> Good catch!! I am +1. Can we do this just by configs or do we need a
>> patch? If so can we patch before we release?
>>
>
> We can do this with a configuration change.
>
>
>>
>> Anjana, can't we use HDFS for EVENT_STORE and use MySQL only for the
>> processed data store (long term)?
>>
>
> We can. This is the best approach for avoiding any impact on receiver
> performance while Spark jobs run in parallel.
>
>
>>
>> --Srinath
>>
>> On Wed, May 25, 2016 at 8:10 PM, Inosh Goonewardena <[email protected]>
>> wrote:
>>
>>> Hi,
>>>
>>> At the moment DAS supports both MyISAM and InnoDB, but is configured to
>>> use MyISAM by default.
>>>
>>> There are several differences between MyISAM and InnoDB, but the one
>>> most relevant to DAS is concurrency. Basically, MyISAM uses table-level
>>> locking and InnoDB uses row-level locking. So, with MyISAM, if we run
>>> Spark queries while publishing data to DAS, at higher TPS it can lead to
>>> issues because the DAL cannot obtain the table lock to insert data into
>>> a table while Spark is reading from the same table.
>>>
>>> On the other hand, with InnoDB the write speed is considerably slower
>>> (because it is designed to support transactions), so it will affect the
>>> receiver performance.
>>>
>>> One option we have in DAS is to use two DBs to keep incoming records and
>>> processed records separately, i.e., EVENT_STORE and PROCESSED_DATA_STORE.
>>>
>>> For ESB Analytics, we can configure MyISAM for EVENT_STORE and InnoDB
>>> for PROCESSED_DATA_STORE. This is because in ESB Analytics,
>>> summarization up to the minute level is done by real-time analytics, and
>>> Spark queries read and process data from the per-minute (and higher)
>>> tables, which we can keep in PROCESSED_DATA_STORE. Since the raw table
>>> (to which the data receiver writes) is not used by Spark queries, the
>>> receiver performance will not be affected.
>>>
>>> However, in most cases, Spark queries may be written to read data
>>> directly from raw tables. As mentioned above, with MyISAM this could lead
>>> to performance issues if data publishing and Spark analytics happen in
>>> parallel. So, considering that, I think we should change the default
>>> configuration to use InnoDB. WDYT?
>>>
>>> --
>>> Thanks & Regards,
>>>
>>> Inosh Goonewardena
>>> Associate Technical Lead- WSO2 Inc.
>>> Mobile: +94779966317
>>>
>>
>>
>>
>> --
>> ============================
>> Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
>> Site: http://home.apache.org/~hemapani/
>> Photos: http://www.flickr.com/photos/hemapani/
>> Phone: 0772360902
>>
>
>
>
> --
> Thanks & Regards,
>
> Inosh Goonewardena
> Associate Technical Lead- WSO2 Inc.
> Mobile: +94779966317
>



-- 
============================
Srinath Perera, Ph.D.
   http://people.apache.org/~hemapani/
   http://srinathsview.blogspot.com/
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
