[jira] [Commented] (HAMA-983) Hama runner for DataFlow

JongYoon Lim (JIRA) Sun, 04 Sep 2016 15:10:45 -0700

    [ 
https://issues.apache.org/jira/browse/HAMA-983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15463593#comment-15463593
 ]


JongYoon Lim commented on HAMA-983:
-----------------------------------

FlinkPipelineRunner internally has translators for both the pipeline and its 
transforms. It seems that the translator maps Beam operators to their Flink 
counterparts and saves the related information in a TranslationContext, 
which is then used for Flink job processing. I think this patch can start by 
implementing a simple translator for batch jobs first.
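
To make the idea concrete, here is a rough sketch of what such a batch 
translator could look like. It is only an illustration of the pattern described 
above, not the Flink runner's code: every type here (TransformNode, 
TranslationContext, TransformTranslator) is a hypothetical stand-in rather than 
a real Beam, Flink, or Hama class, and the step labels are placeholder strings.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; none of these types are real Beam, Flink, or Hama classes. */
class BatchTranslatorSketch {

    /** Stand-in for a pipeline node: a transform kind ("Read", "ParDo", ...) and a name. */
    static final class TransformNode {
        final String kind;
        final String name;

        TransformNode(String kind, String name) {
            this.kind = kind;
            this.name = name;
        }
    }

    /** Collects the information a runner would need later to build the actual job. */
    static final class TranslationContext {
        private final List<String> steps = new ArrayList<>();

        void addStep(String step) {
            steps.add(step);
        }

        List<String> getSteps() {
            return new ArrayList<>(steps);
        }
    }

    /** One translator per supported transform kind. */
    interface TransformTranslator {
        void translate(TransformNode node, TranslationContext context);
    }

    // Registry of batch translators; the step labels are placeholders (assumptions).
    private static final Map<String, TransformTranslator> TRANSLATORS = new HashMap<>();

    static {
        TRANSLATORS.put("Read", (node, ctx) -> ctx.addStep("input:" + node.name));
        TRANSLATORS.put("ParDo", (node, ctx) -> ctx.addStep("compute:" + node.name));
        TRANSLATORS.put("GroupByKey", (node, ctx) -> ctx.addStep("shuffle:" + node.name));
    }

    /** Visits each node in order and lets the matching translator record its step. */
    static TranslationContext translatePipeline(List<TransformNode> pipeline) {
        TranslationContext context = new TranslationContext();
        for (TransformNode node : pipeline) {
            TransformTranslator translator = TRANSLATORS.get(node.kind);
            if (translator == null) {
                throw new UnsupportedOperationException(
                    "No batch translator for transform kind: " + node.kind);
            }
            translator.translate(node, context);
        }
        return context;
    }

    public static void main(String[] args) {
        List<TransformNode> pipeline = new ArrayList<>();
        pipeline.add(new TransformNode("Read", "lines"));
        pipeline.add(new TransformNode("ParDo", "splitWords"));
        pipeline.add(new TransformNode("GroupByKey", "groupByWord"));
        System.out.println(translatePipeline(pipeline).getSteps());
    }
}
```

Failing fast on an unknown transform kind would let a first version of the 
patch support only a handful of batch transforms and grow from there.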


> Hama runner for DataFlow
> ------------------------
>
>                 Key: HAMA-983
>                 URL: https://issues.apache.org/jira/browse/HAMA-983
>             Project: Hama
>          Issue Type: Bug
>            Reporter: Edward J. Yoon
>              Labels: gsoc2016
>
> As you already know, Apache Beam provides a unified programming model for both 
> batch and streaming inputs.
> The APIs are generally associated with data filtering and transforming, so 
> we'll need to implement a data processing runner like 
> https://github.com/dapurv5/MapReduce-BSP-Adapter/blob/master/src/main/java/org/apache/hama/mapreduce/examples/WordCount.java
> Also, implementing a similarity join could be fun. According to 
> http://www.ruizhang.info/publications/TPDS2015-Heads_Join.pdf, Apache Hama is 
> the clear winner compared with Apache Hadoop and Apache Spark.
> Since the join consists of transformation, aggregation, and partition 
> computations, I think it's possible to implement it using Apache Beam APIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
