[
https://issues.apache.org/jira/browse/BEAM-8258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17060366#comment-17060366
]
Sheriffo Ceesay commented on BEAM-8258:
---------------------------------------
[~iemejia], I am interested in this. I did similar work for Apache Gora
in last year's GSoC, and you can see the final report at
https://cwiki.apache.org/confluence/display/GORA/Final+Report%3A+%5BGORA-532%5D+Benchmark+Module+For+Apache+Gora
I am more of a Java developer; however, I am sure doing this in Python would be
feasible.
> Implement Nexmark (benchmark suite) in Python and integrate it with Spark and
> Flink runners
> -------------------------------------------------------------------------------------------
>
> Key: BEAM-8258
> URL: https://issues.apache.org/jira/browse/BEAM-8258
> Project: Beam
> Issue Type: Bug
> Components: testing-nexmark
> Reporter: Ismaël Mejía
> Priority: Minor
> Labels: gsoc, gsoc2020, mentor
>
> Apache Beam [1] is a unified and portable programming model for data
> processing jobs (pipelines). The Beam model [2, 3, 4] has rich mechanisms to
> process endless streams of events.
> Nexmark [5] is a benchmark for streaming jobs, basically a set of jobs
> (queries) to test different use cases of the execution system. Beam
> implemented Nexmark for Java [6, 7] and it has been successfully used to
> improve the features of multiple Beam runners and discover performance
> regressions.
> Thanks to the work on portability [8] we can now run Beam pipelines on top of
> open source systems like Apache Spark [9] and Apache Flink [10]. The goal of
> this issue/project is to implement the Nexmark queries in Python and
> configure them to run on our CI on top of open source systems like Apache
> Spark and Apache Flink. This should help the project track and improve the
> evolution of portable open source runners and our Python implementation, as
> we already do for Java.
> Because of the time constraints of GSoC we will adjust the goals (sub-tasks)
> depending on progress.
> [1] https://beam.apache.org/
> [2] https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101
> [3] https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102
> [4] https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43864.pdf
> [5] https://web.archive.org/web/20100620010601/http://datalab.cs.pdx.edu/niagaraST/NEXMark/
> [6] https://beam.apache.org/documentation/sdks/java/testing/nexmark/
> [7] https://github.com/apache/beam/tree/master/sdks/java/testing/nexmark
> [8] https://beam.apache.org/roadmap/portability/
> [9] https://spark.apache.org/
> [10] https://flink.apache.org/