[ 
https://issues.apache.org/jira/browse/FLINK-3626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15198087#comment-15198087
 ] 

Shannon Quinn commented on FLINK-3626:
--------------------------------------

It looks like the first step (thanks to Chesney Schepler for the guidance) is 
to provide a handle to the Python environment for obtaining a subtask ID.

  1: Modify org.apache.flink.python.api.streaming.data.PythonStreamer.java's 
startPython() to send the output of getIndexOfThisSubtask() through the process 
output stream.

  2: Modify flink-libraries / flink-python / src / main / python / org / apache 
/ flink / python / api / flink / plan / Environment.py's execute() method to 
receive the new input and pass it through to the operator's _configure() method.

  3: Modify Function.py's _configure() method to pass the subtask ID to the 
runtime context constructor.

  4: Modify RuntimeContext.py to include a simple getIndexOfThisSubtask() 
method that returns the ID.

Once these steps are complete, I can start on the zipWithIndex job itself.

> zipWithIndex in Python API
> --------------------------
>
>                 Key: FLINK-3626
>                 URL: https://issues.apache.org/jira/browse/FLINK-3626
>             Project: Flink
>          Issue Type: New Feature
>          Components: Python API
>    Affects Versions: 1.0.0
>         Environment: OS X 10.11.3, 16GB RAM, 500GB SSD, Core i7 2.5GHz.
>            Reporter: Shannon Quinn
>            Priority: Minor
>
> Implementation of a `zipWithIndex` method for the Python API on Flink. This 
> will affix each record with a sequential integer ID, consistent across the 
> distributed data structure.
> Work here: https://github.com/magsol/flink



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to