[jira] [Commented] (HAMA-983) Hama runner for DataFlow

JongYoon Lim (JIRA) Sun, 18 Sep 2016 06:00:33 -0700

    [ 
https://issues.apache.org/jira/browse/HAMA-983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15500951#comment-15500951
 ]


JongYoon Lim commented on HAMA-983:
-----------------------------------

Hi, it took me some time to understand the Beam API and the Spark and Flink 
runners for Beam. It seems that Beam's transforms can be translated to Hama's 
API as follows, and the BSP for Dataflow could be similar to SuperstepBSP. 
(If I have misunderstood anything, please correct me.)  
BEAM -> HAMA
ParDo -> Superstep
Read.Bounded -> RecordReader
Write.Bound -> RecordWriter
Combine -> Combiner
GroupByKey -> ? 
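
For the open GroupByKey entry, one possible direction (only a sketch, not 
something confirmed for Hama) is to build it from two supersteps: each peer 
sends every key/value pair to the peer that owns the key, and after the 
barrier each peer groups what it received. The code below simulates that idea 
in plain Java. It uses no Hama or Beam classes; GroupByKeySketch, ownerOf, kv 
and groupByKey are hypothetical names, and the in-memory "inboxes" stand in 
for BSP message passing and the sync barrier.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java simulation of a two-superstep GroupByKey (assumption: no real Hama or Beam API is used). */
public class GroupByKeySketch {

    /** Hypothetical helper that builds one key/value pair. */
    public static Map.Entry<String, Integer> kv(String key, int value) {
        return new AbstractMap.SimpleEntry<>(key, value);
    }

    /** Hash partitioning: picks the index of the peer that owns a key. */
    public static int ownerOf(String key, int numPeers) {
        return Math.floorMod(key.hashCode(), numPeers);
    }

    /**
     * inputs.get(i) is the local data of simulated peer i.
     * The result holds, for each peer, the values grouped by the keys it owns.
     */
    public static List<Map<String, List<Integer>>> groupByKey(
            List<List<Map.Entry<String, Integer>>> inputs) {
        int numPeers = inputs.size();

        // Superstep 1: every peer "sends" each pair to the inbox of the owning peer.
        List<List<Map.Entry<String, Integer>>> inboxes = new ArrayList<>();
        for (int i = 0; i < numPeers; i++) {
            inboxes.add(new ArrayList<>());
        }
        for (List<Map.Entry<String, Integer>> local : inputs) {
            for (Map.Entry<String, Integer> pair : local) {
                inboxes.get(ownerOf(pair.getKey(), numPeers)).add(pair);
            }
        }

        // The end of the loop above plays the role of the barrier synchronization:
        // all messages are delivered before any peer starts grouping.

        // Superstep 2: each peer groups the pairs it received by key.
        List<Map<String, List<Integer>>> result = new ArrayList<>();
        for (List<Map.Entry<String, Integer>> inbox : inboxes) {
            Map<String, List<Integer>> grouped = new TreeMap<>();
            for (Map.Entry<String, Integer> pair : inbox) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
            result.add(grouped);
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<Map.Entry<String, Integer>>> inputs = new ArrayList<>();
        inputs.add(Arrays.asList(kv("a", 1), kv("b", 2))); // data on peer 0
        inputs.add(Arrays.asList(kv("a", 3)));             // data on peer 1
        System.out.println(groupByKey(inputs));
    }
}
```

In a real runner the inboxes would be replaced by the BSP messaging and sync 
calls, and a Combiner could reduce the number of messages before the barrier.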

I'm going to start with batch mode, since Hama's streaming is not ready yet, 
and I'll add sub-tasks for this soon. 


> Hama runner for DataFlow
> ------------------------
>
>                 Key: HAMA-983
>                 URL: https://issues.apache.org/jira/browse/HAMA-983
>             Project: Hama
>          Issue Type: Bug
>            Reporter: Edward J. Yoon
>              Labels: gsoc2016
>
> As you already know, Apache Beam provides a unified programming model for both 
> batch and streaming inputs.
> The APIs are generally associated with data filtering and transforming, so 
> we'll need to implement a data processing runner like 
> https://github.com/dapurv5/MapReduce-BSP-Adapter/blob/master/src/main/java/org/apache/hama/mapreduce/examples/WordCount.java
> Also, implementing a similarity join could be fun. According to 
> http://www.ruizhang.info/publications/TPDS2015-Heads_Join.pdf, Apache Hama is 
> the clear winner compared with Apache Hadoop and Apache Spark.
> Since it consists of transformation, aggregation, and partition computations, 
> I think it's possible to implement using Apache Beam APIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
