[ 
https://issues.apache.org/jira/browse/BEAM-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15879034#comment-15879034
 ] 

Pablo Estrada edited comment on BEAM-1442 at 2/22/17 9:43 PM:
--------------------------------------------------------------

Hi Haoxiang,
It's great that you find the project interesting. It is a challenging, and 
exciting, project. We would like a detailed proposal because, as you may 
guess, the project is not easy, and we want to help you (or any student) 
understand the DirectRunner well before you are selected.

With this in mind, we suggest you include the following items in the proposal:
(1) Introduction: introduce the project.
(2) Goals.
(3) Implementation: describe the benchmark and the runner improvements. Be as 
specific and detailed as possible.
(4) Timeline.
(5) Self-introduction: introduce yourself too.

Feel free to ask questions or share your train of thought here. We can help 
you polish the proposal to make it robust, and help you familiarize yourself 
with the DirectRunner.


was (Author: pabloem):
Before we can accept your proposal, we have to be confident that you understand 
the existing code and can extend it safely.
A proposal should include:
(1) Introduction: introduce the project.
(2) Goals.
(3) Implementation: describe the benchmark and the runner improvements. Be as 
specific and detailed as possible. This project is not easy, and we need to see 
that you have a good grasp of the different components.
(4) Timeline.
(5) Self-introduction: introduce yourself too.

> Performance improvement of the Python DirectRunner
> --------------------------------------------------
>
>                 Key: BEAM-1442
>                 URL: https://issues.apache.org/jira/browse/BEAM-1442
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-py
>            Reporter: Pablo Estrada
>            Assignee: Ahmet Altay
>              Labels: gsoc2017, mentor, python
>
> The DirectRunners for Python and Java are intended to act as policy enforcers 
> and correctness checkers for Beam pipelines, but some users run data 
> processing tasks on them.
> Currently, the Python DirectRunner has less-than-great performance, although 
> some work has gone into improving it. There are more opportunities for 
> improvement.
> Skills for this project:
> * Python
> * Cython (nice to have)
> * Working through the Beam getting started materials (nice to have)
> To start figuring out this problem, it is advisable to run a simple pipeline, 
> and study the `Pipeline.run` and `DirectRunner.run` methods. Ask questions 
> directly on JIRA.
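
As a starting point for studying `Pipeline.run` and `DirectRunner.run`, a 
profiler can show where a run spends its time. Below is a minimal sketch that 
uses only Python's standard `cProfile` and `pstats` modules. The names 
`profile_callable` and `stand_in_pipeline` are hypothetical, and the stand-in 
workload is plain Python, not Beam code; it would need to be replaced with a 
function that builds and runs a small pipeline on the DirectRunner.

```python
import cProfile
import io
import pstats


def profile_callable(func, *args, top=10):
    """Run func(*args) under cProfile.

    Returns (result, report), where report is a text table of the
    `top` entries sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


def stand_in_pipeline(words):
    # Hypothetical stand-in workload (assumption, not Beam code).
    # Replace with a function that builds and runs a small pipeline.
    return sorted(len(w) for w in words)


if __name__ == "__main__":
    result, report = profile_callable(
        stand_in_pipeline, ["beam", "direct", "runner"])
    print(result)
    print(report)
```

Sorting by cumulative time puts the outermost calls first, which is a 
convenient way to see which methods called from the run dominate the total.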



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
