[ https://issues.apache.org/jira/browse/BEAM-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15879034#comment-15879034 ]
Pablo Estrada commented on BEAM-1442:
-------------------------------------
A proposal should include:
(1) Introduction - introduce the project.
(2) Goals.
(3) Implementation - of a benchmark and of the runner improvements. Be as
specific and detailed as possible. This project is not easy, and we need to see
that you have a good grasp of the different components.
(4) Timeline.
(5) Self-introduction - introduce yourself too.
> Performance improvement of the Python DirectRunner
> --------------------------------------------------
>
> Key: BEAM-1442
> URL: https://issues.apache.org/jira/browse/BEAM-1442
> Project: Beam
> Issue Type: Improvement
> Components: sdk-py
> Reporter: Pablo Estrada
> Assignee: Ahmet Altay
> Labels: gsoc2017, mentor, python
>
> The DirectRunners for Python and Java are intended to act as policy enforcers
> and correctness checkers for Beam pipelines, but some users also run
> data processing tasks on them.
> Currently, the Python DirectRunner has less-than-great performance, although
> some work has gone into improving it. There are more opportunities for
> improvement.
> Skills for this project:
> * Python
> * Cython (nice to have)
> * Working through the Beam getting started materials (nice to have)
> To get started on this problem, run a simple pipeline and study the
> `Pipeline.run` and `DirectRunner.run` methods. Ask questions
> directly on JIRA.
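
One possible way to start on the suggestion above is to run a tiny pipeline under the standard-library profiler and see where the time goes. This is only a sketch, not part of the issue: the helper names `profile_call` and `run_simple_pipeline` are made up for illustration, the pipeline contents are arbitrary, and `apache_beam` is assumed to be installed for the second function.

```python
import cProfile
import io
import pstats
import time


def profile_call(fn, top=15):
    """Run fn() under cProfile; return (result, seconds, report text)."""
    profiler = cProfile.Profile()
    start = time.perf_counter()
    result = profiler.runcall(fn)
    elapsed = time.perf_counter() - start

    # Render the `top` most expensive entries, sorted by cumulative time.
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("cumulative").print_stats(top)
    return result, elapsed, buf.getvalue()


def run_simple_pipeline(n=10000):
    """Build and run a small pipeline on the DirectRunner (needs apache_beam)."""
    # Imported lazily so profile_call stays usable without Beam installed.
    import apache_beam as beam

    p = beam.Pipeline(runner="DirectRunner")
    _ = (
        p
        | "Create" >> beam.Create(range(n))
        | "Square" >> beam.Map(lambda x: x * x)
        | "SumAll" >> beam.CombineGlobally(sum)
    )
    # Pipeline.run hands the graph to the runner; block until it finishes.
    return p.run().wait_until_finish()
```

Calling `_, seconds, report = profile_call(run_simple_pipeline)` and printing `report` should list the frames beneath `Pipeline.run` and `DirectRunner.run` by cumulative time, which may be a reasonable starting point for the benchmark a proposal is expected to describe.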