[
https://issues.apache.org/jira/browse/FLINK-3626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15198087#comment-15198087
]
Shannon Quinn commented on FLINK-3626:
--------------------------------------
It looks like the first step (thanks to Chesney Schepler for the guidance) is
to provide a handle to the Python environment for obtaining a subtask ID.
1: Modify org.apache.flink.python.api.streaming.data.PythonStreamer.java's
startPython() to send the output of getIndexOfThisSubtask() through the process
output stream.
2: Modify flink-libraries / flink-python / src / main / python / org / apache
/ flink / python / api / flink / plan / Environment.py's execute() method to
receive the new input and pass it through to the operator's _configure() method.
3: Modify Function.py's _configure() method to pass the subtask ID to the
runtime context constructor.
4: Modify RuntimeContext.py to include a simple getIndexOfThisSubtask()
method that returns the ID.
Once these steps are complete, I can start on the zipWithIndex job itself.
> zipWithIndex in Python API
> --------------------------
>
> Key: FLINK-3626
> URL: https://issues.apache.org/jira/browse/FLINK-3626
> Project: Flink
> Issue Type: New Feature
> Components: Python API
> Affects Versions: 1.0.0
> Environment: OS X 10.11.3, 16GB RAM, 500GB SSD, Core i7 2.5GHz.
> Reporter: Shannon Quinn
> Priority: Minor
>
> Implementation of a `zipWithIndex` method for the Python API on Flink. This
> will affix each record with a sequential integer ID, consistent across the
> distributed data structure.
> Work here: https://github.com/magsol/flink
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)