[ 
https://issues.apache.org/jira/browse/BEAM-3645?focusedWorklogId=261868&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-261868
 ]

ASF GitHub Bot logged work on BEAM-3645:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 17/Jun/19 22:28
            Start Date: 17/Jun/19 22:28
    Worklog Time Spent: 10m 
      Work Description: Hannah-Jiang commented on pull request #8769: [WIP] 
[BEAM-3645] support multi processes for Python FnApiRunner with 
EmbeddedGrpcWorkerHandler
URL: https://github.com/apache/beam/pull/8769#discussion_r294543591
 
 

 ##########
 File path: sdks/python/apache_beam/runners/portability/fn_api_runner.py
 ##########
 @@ -1418,6 +1449,51 @@ def process_bundle(self, inputs, expected_outputs, 
parallel_uid_counter=None):
 
     return result, split_results
 
+class ParallelBundleManager(BundleManager):
+  _uid_counter = 0
+  def process_bundle(self, inputs, expected_outputs):
+    input_value = list(inputs.values())[0]
+    if isinstance(input_value, list):
 
 Review comment:
   I changed the iterator to return a list at a new PR, which is same as before.
   Previously, we returned an iterator with one element, now it returns an 
iterator with N elements.
   We can apply the same interface when we read either from list type or 
GroupingBuffer type and write to output stream.
   ```
   for i, byte_stream in enumerate(byte_streams):
       if idx is None or i == idx:
           data_out.write(byte_stream)
           if idx is not None:
             break
   ```
   If we introduce `partition()` function to GroupingBuffer, don't we need to 
check the type every time when we write to output_stream because list type 
doesn't have this function?
   I don't have a clear idea to avoid type check here at the moment though.
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 261868)
    Time Spent: 4h 20m  (was: 4h 10m)

> Support multi-process execution on the FnApiRunner
> --------------------------------------------------
>
>                 Key: BEAM-3645
>                 URL: https://issues.apache.org/jira/browse/BEAM-3645
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-py-core
>    Affects Versions: 2.2.0, 2.3.0
>            Reporter: Charles Chen
>            Assignee: Hannah Jiang
>            Priority: Major
>          Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> https://issues.apache.org/jira/browse/BEAM-3644 gave us a 15x performance 
> gain over the previous DirectRunner.  We can do even better in multi-core 
> environments by supporting multi-process execution in the FnApiRunner, to 
> scale past Python GIL limitations.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to