[ 
https://issues.apache.org/jira/browse/BEAM-3645?focusedWorklogId=282581&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-282581
 ]

ASF GitHub Bot logged work on BEAM-3645:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 25/Jul/19 10:06
            Start Date: 25/Jul/19 10:06
    Worklog Time Spent: 10m 
      Work Description: robertwb commented on pull request #8979: [BEAM-3645] 
add multiplexing for python FnApiRunner
URL: https://github.com/apache/beam/pull/8979#discussion_r307196577
 
 

 ##########
 File path: sdks/python/apache_beam/runners/portability/fn_api_runner.py
 ##########
 @@ -1142,6 +1207,64 @@ def __init__(self, state, provision_info):
     self.data_server.start()
     self.control_server.start()
 
+  @classmethod
+  def get_control_conn_handler(cls):
+    return GrpcServer._cached_grpc_server.control_conn_handler
+
+  @classmethod
+  def get_data_conn_handler(cls):
+    return GrpcServer._cached_grpc_server.data_conn_handler
+
+  @classmethod
+  def get_instance(cls, state, provision_info):
 
 Review comment:
   Here we have a problem that wasn't as evident before: state and 
provision_info are not global constants. (They are the same for a single 
pipeline, but not across all pipelines in a process.)
   
   There are two possible solutions: 
   
   (1) Rather than having the GRPC server be global, let it be local to the 
pipeline being run. This means it'd probably be a member of the 
WorkerHandlerManager, and it'd be an argument to WorkerHandler.create. The 
tricky bit is that we don't need it (and want to avoid the overhead of starting 
it up) for the embedded environment. It's a bit ugly, but as this is a special 
case we could check for that directly before calling create. (Alternatively, we 
could add a boolean indicating whether a GRPC server is required to 
WorkerHandler.register_environment and pass a lazy getter to create, but that 
might be overkill.)
   
   (2) Keep it as a global, but add the ability to multiplex on state and 
provision info per worker id. 
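   
   A minimal sketch of what (2) could look like, assuming a process-wide 
registry keyed by worker id. The class and method names here are hypothetical 
and are not part of fn_api_runner.py:

```python
import threading


class PerWorkerRegistry(object):
    """Hypothetical process-wide table of (state, provision_info) per worker id."""

    def __init__(self):
        self._lock = threading.Lock()
        self._by_worker = {}

    def register(self, worker_id, state, provision_info):
        # Each worker id belongs to exactly one pipeline, so refuse duplicates.
        with self._lock:
            if worker_id in self._by_worker:
                raise ValueError('worker id already registered: %s' % worker_id)
            self._by_worker[worker_id] = (state, provision_info)

    def lookup(self, worker_id):
        # A shared server would call this to route a request to the right pipeline.
        with self._lock:
            return self._by_worker[worker_id]

    def unregister(self, worker_id):
        # Called when a pipeline finishes so stale entries do not accumulate.
        with self._lock:
            self._by_worker.pop(worker_id, None)
```

   The shared server would then look up the pair for the worker id attached to 
each incoming request, instead of holding a single state and provision_info.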
   
   Of the two options, I think (1) is the cleaner. 
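   
   A rough sketch of (1), assuming the manager owns the server and starts it 
lazily. FakeGrpcServer, WorkerHandlerManagerSketch, create_handler and the 
'embedded' marker are stand-ins invented for illustration; they are not the 
real classes or signatures in fn_api_runner.py:

```python
class FakeGrpcServer(object):
    """Stand-in for a per-pipeline gRPC server (hypothetical)."""

    started = 0  # counts constructed servers, for illustration only

    def __init__(self, state, provision_info):
        self.state = state
        self.provision_info = provision_info
        FakeGrpcServer.started += 1

    def close(self):
        pass


class WorkerHandlerManagerSketch(object):
    """Owns at most one server per pipeline and creates it on first need."""

    EMBEDDED = 'embedded'  # hypothetical marker for the in-process environment

    def __init__(self, state, provision_info):
        self._state = state
        self._provision_info = provision_info
        self._grpc_server = None

    def _get_grpc_server(self):
        # Lazy creation: the startup cost is paid only if a handler needs it.
        if self._grpc_server is None:
            self._grpc_server = FakeGrpcServer(
                self._state, self._provision_info)
        return self._grpc_server

    def create_handler(self, environment):
        # Special case checked before creating: embedded needs no server.
        if environment == self.EMBEDDED:
            grpc_server = None
        else:
            grpc_server = self._get_grpc_server()
        return {'environment': environment, 'grpc_server': grpc_server}

    def close(self):
        if self._grpc_server is not None:
            self._grpc_server.close()
            self._grpc_server = None
```

   With this shape, two pipelines in one process each get their own server 
bound to their own state and provision_info, and a pipeline that only uses the 
embedded environment never starts one.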
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 282581)
    Time Spent: 30h  (was: 29h 50m)

> Support multi-process execution on the FnApiRunner
> --------------------------------------------------
>
>                 Key: BEAM-3645
>                 URL: https://issues.apache.org/jira/browse/BEAM-3645
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-py-core
>    Affects Versions: 2.2.0, 2.3.0
>            Reporter: Charles Chen
>            Assignee: Hannah Jiang
>            Priority: Major
>             Fix For: 2.15.0
>
>          Time Spent: 30h
>  Remaining Estimate: 0h
>
> https://issues.apache.org/jira/browse/BEAM-3644 gave us a 15x performance 
> gain over the previous DirectRunner.  We can do even better in multi-core 
> environments by supporting multi-process execution in the FnApiRunner, to 
> scale past Python GIL limitations.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
