[ 
https://issues.apache.org/jira/browse/BEAM-8258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17060366#comment-17060366
 ] 

Sheriffo Ceesay commented on BEAM-8258:
---------------------------------------

[~iemejia], I am interested in this. I did similar work for Apache Gora 
during last year's GSoC; the final report is available at 
https://cwiki.apache.org/confluence/display/GORA/Final+Report%3A+%5BGORA-532%5D+Benchmark+Module+For+Apache+Gora

 

I am more of a Java developer; however, I am sure doing this in Python would be 
feasible.

> Implement Nexmark (benchmark suite) in Python and integrate it with Spark and 
> Flink runners
> -------------------------------------------------------------------------------------------
>
>                 Key: BEAM-8258
>                 URL: https://issues.apache.org/jira/browse/BEAM-8258
>             Project: Beam
>          Issue Type: Bug
>          Components: testing-nexmark
>            Reporter: Ismaël Mejía
>            Priority: Minor
>              Labels: gsoc, gsoc2020, mentor
>
> Apache Beam [1] is a unified and portable programming model for data 
> processing jobs (pipelines). The Beam model [2, 3, 4] has rich mechanisms to 
> process endless streams of events.
> Nexmark [5] is a benchmark for streaming jobs: a set of jobs 
> (queries) that test different use cases of the execution system. Beam 
> implemented Nexmark for Java [6, 7], and it has been successfully used to 
> improve the features of multiple Beam runners and to discover performance 
> regressions.
> Thanks to the work on portability [8] we can now run Beam pipelines on top of 
> open source systems like Apache Spark [9] and Apache Flink [10]. The goal of 
> this issue/project is to implement the Nexmark queries in Python and 
> configure them to run on our CI on top of open source systems like Apache 
> Spark and Apache Flink. This should help the project track and 
> improve the evolution of portable open source runners and our Python 
> implementation, as we do for Java.
> Because of the time constraints of GSoC we will adjust the goals (sub-tasks) 
> depending on progress.
> [1] https://beam.apache.org/
> [2] https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101
> [3] https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102
> [4] https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43864.pdf
> [5] https://web.archive.org/web/20100620010601/http://datalab.cs.pdx.edu/niagaraST/NEXMark/
> [6] https://beam.apache.org/documentation/sdks/java/testing/nexmark/
> [7] https://github.com/apache/beam/tree/master/sdks/java/testing/nexmark
> [8] https://beam.apache.org/roadmap/portability/
> [9] https://spark.apache.org/
> [10] https://flink.apache.org/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
