[jira] [Work logged] (BEAM-3645) Support multi-process execution on the FnApiRunner

ASF GitHub Bot (JIRA) Thu, 06 Jun 2019 16:31:15 -0700


     [ 
https://issues.apache.org/jira/browse/BEAM-3645?focusedWorklogId=255540&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-255540
 ]


ASF GitHub Bot logged work on BEAM-3645:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 06/Jun/19 23:30
            Start Date: 06/Jun/19 23:30
    Worklog Time Spent: 10m 
      Work Description: Hannah-Jiang commented on issue #8769: [WIP] 
[BEAM-3645] support multi processes for Python FnApiRunner with 
EmbeddedGrpcWorkerHandler
URL: https://github.com/apache/beam/pull/8769#issuecomment-499705111
 
 
   Here I add some explanation to make it easier to understand what I am doing.
   
   **Assumption/Precondition**:
   1. Since it is a local runner, Workers are not added or removed dynamically 
during a job and we have fixed number of workers. Instead of reading signals 
from workers to dynamically change worker list, we create a fixed list of 
workers. A worker is added to the list when it is started and its life status 
is not monitored. 
   
   2. Since it is a local runner, number of workers are not big (would be 
<=32). I used a dict to count current work load for each worker. 
   
   **Load balancing algorithm**:
   A <worker_id : task_count> map is used to record current work load for each 
worker. When a new task is added to a worker's queue, increase load by 1 and 
when receive response 
   of a task, decrease load by 1. A new process_bundle request will be assigned 
to a worker who has minimum work load. At the end, task_count of all workers 
should be 0.
   
   **Task multiplexing**:
   A GRPC server with 1 control server, 1 data server, 1 state server and 1 
logging server is used for communication between a runner and workers.  A 
control handler is multiplexing tasks to workers. A <worker_id, task_queue> is 
maintained to store tasks.  Runner identifies work_id from GRPC context and 
yield from the worker's task queue. A <instruction_id : worker_id> mapping is 
maintained
   for multiplexing data and other tasks (process_bundle_progress, 
process_bundle_split).
   
   **Data multiplexing**:
   A server side round robin data channel is implemented for multiplexing. Data 
is stored for each worker and sent to the worker when it asks for data.  It's a 
quite similar implementation as task multiplexing. No change with client side 
data channel, because we have one runner as it is.
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 255540)
    Time Spent: 50m  (was: 40m)

> Support multi-process execution on the FnApiRunner
> --------------------------------------------------
>
>                 Key: BEAM-3645
>                 URL: https://issues.apache.org/jira/browse/BEAM-3645
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-py-core
>    Affects Versions: 2.2.0, 2.3.0
>            Reporter: Charles Chen
>            Assignee: Hannah Jiang
>            Priority: Major
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> https://issues.apache.org/jira/browse/BEAM-3644 gave us a 15x performance 
> gain over the previous DirectRunner.  We can do even better in multi-core 
> environments by supporting multi-process execution in the FnApiRunner, to 
> scale past Python GIL limitations.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Work logged] (BEAM-3645) Support multi-process execution on the FnApiRunner

Reply via email to