Re: [Architecture] [Update] Incremental Processing for BAM

Srinath Perera Mon, 10 Feb 2014 06:31:07 -0800

Hi Sinthuja,

Do we now support processing data for a given time period? e.g. last 24
hours. I am -1 on doing anything more on this for now (e.g. incr_count).


IMHO, there are higher-priority items on the roadmap; please chat with
Anjana.

--Srinath


On Mon, Feb 10, 2014 at 2:42 PM, Sinthuja Ragendran <[email protected]> wrote:

> Hi all,
>
> I have been working on incremental processing support for BAM.
> This feature was implemented in BAM 2.4.0 [1], but it wasn't well tested in
> fully distributed mode during the BAM 2.4.0 release, so it was marked as
> an experimental feature. You can find the implementation details of this
> feature in [2].
>
> Last week I tested this feature with a 3-node Hadoop cluster and a
> Cassandra cluster. All use cases mentioned in [1] ran on the external
> cluster without any issues.
>
> I have also implemented an incremental average operation (incr_avg()),
> which incrementally calculates the final value based on the last Hive query
> execution and the current execution. The intermediate values needed for the
> incremental operation are stored in Cassandra and used by the current Hive
> query execution. Once the current Hive query execution completes, the
> intermediate results are replaced with the new values. I have also tested
> incr_avg() on a fully distributed setup and it works well. As a first
> step, I have implemented only incr_avg(), to check whether the adopted
> approach works on a fully distributed setup.
>
> Below is a sample Hive script that uses the incr_avg() function:
>
> CREATE EXTERNAL TABLE IF NOT EXISTS PhoneSalesTable
>   (orderID STRING, brandName STRING, userName STRING, quantity INT,
>    version STRING)
>   STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
>   WITH SERDEPROPERTIES (
>     "wso2.carbon.datasource.name" = "WSO2BAM_CASSANDRA_DATASOURCE",
>     "cassandra.cf.name" = "org_wso2_bam_phone_retail_store_kpi",
>     "cassandra.columns.mapping" =
>       ":key,payload_brand, payload_user, payload_quantity, Version" );
>
> @Incremental(name="avgAnalysis", tables="PhoneSalesTable", bufferTime="20")
> select brandName, count(DISTINCT orderID),
>        incr_avg(quantity, "average_quantity")
>   from PhoneSalesTable
>   where version = "1.0.0"
>   group by brandName;
>
> The following are the to-do items on this feature:
>
>    - Do a load test of all cases/scenarios on a distributed setup.
>    - Implement more commonly used Hive functions as incremental
>    functions (incr_count, incr_sum, etc.).
>    - Write a sample to explain how to use this feature.
>
>
> [1] http://docs.wso2.org/pages/viewpage.action?pageId=32345660
> [2]
> http://wso2-oxygen-tank.10903.n7.nabble.com/Incremental-Data-Processing-for-BAM-td77582.html
>
> Thanks,
> Sinthuja.
>
> --
> *Sinthuja Rajendran*
> Software Engineer <http://wso2.com/>
> WSO2, Inc.:http://wso2.com
>
> Blog: http://sinthu-rajan.blogspot.com/
> Mobile: +94774273955
>
>
>
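
For readers of the archive, the incremental average described in the quoted
message can be sketched roughly as follows. This is only an illustration, not
the BAM implementation: it assumes the intermediate values are a running sum
and count per group (the message does not say what is stored), a plain dict
stands in for the Cassandra store, and merge_avg_state is a hypothetical name.

```python
from collections import defaultdict


def merge_avg_state(state, new_rows):
    """Fold one batch of (key, value) rows into the stored intermediate values.

    `state` maps key -> (running_sum, running_count). It is an assumed shape
    for the intermediate values kept between query executions; a dict stands
    in for the real store. Returns the replacement state and the per-key
    averages over every row seen so far.
    """
    # Aggregate only the rows that arrived since the previous execution.
    batch = defaultdict(lambda: [0, 0])
    for key, value in new_rows:
        batch[key][0] += value
        batch[key][1] += 1

    # Combine with the previous intermediate values; the result replaces them.
    new_state = dict(state)
    for key, (batch_sum, batch_count) in batch.items():
        prev_sum, prev_count = state.get(key, (0, 0))
        new_state[key] = (prev_sum + batch_sum, prev_count + batch_count)

    averages = {key: s / c for key, (s, c) in new_state.items()}
    return new_state, averages


# First execution: no stored state yet.
state, averages = merge_avg_state(
    {}, [("brandA", 2), ("brandA", 4), ("brandB", 10)]
)

# Second execution: only the newly arrived row is processed.
state, averages = merge_avg_state(state, [("brandA", 6)])
```

Keeping a sum and a count, rather than the previous average alone, is what
lets the two executions combine correctly: averaging two averages would give
the wrong result whenever the batches differ in size.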


-- 
============================
Srinath Perera, Ph.D.
  Director, Research, WSO2 Inc.
  Visiting Faculty, University of Moratuwa
  Member, Apache Software Foundation
  Research Scientist, Lanka Software Foundation
  Blog: http://srinathsview.blogspot.com/
  Photos: http://www.flickr.com/photos/hemapani/
   Phone: 0772360902

_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
