[ 
https://issues.apache.org/jira/browse/BEAM-7861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013105#comment-17013105
 ] 

Hannah Jiang edited comment on BEAM-7861 at 1/10/20 6:22 PM:
-------------------------------------------------------------

We can use --direct_running_mode to switch between multi_threading and 
multi_processing.
 We can set direct_running_mode to one of ['in_memory',  'multi_threading', 
'multi_processing']. Default mode is in_memory.

*in_memory*: it is multi threading mode, worker and runners' communication 
happens in the memory (not through gRPC).
 *multi_threading*: it is multi threading mode, worker and runners communicate 
through gRPC.
 *multi_processing*: it is multi processing, worker and runners communicate 
through gRPC.

Here is how to set the direct_running_mode.
 *Option 1*: set it with PipelineOptions().
{code:java}
 pipeline_options = PipelineOptions(direct_num_workers=2, 
direct_running_mode='multi_threading')
 p = beam.Pipeline(
         runner=fn_api_runner.FnApiRunner(),
         options=pipeline_options)
{code}

*Option 2*: pass it with CLI.
{code:java}
 python xxx --direct_num_workers 2 - -direct_running_mode 'multi_threading'

known_args, pipeline_args = parser.parse_known_args(argv)
 p = beam.Pipeline(
         runner=fn_api_runner.FnApiRunner(),
         options=pipeline_options)
{code}


was (Author: hannahjiang):
We can use --direct_running_mode to switch between multi_threading and 
multi_processing.
 We can set direct_running_mode to one of ['in_memory',  'multi_threading', 
'multi_processing']. Default mode is in_memory.

*in_memory*: it is multi threading mode, worker and runners' communication 
happens in the memory (not through gRPC).
 *multi_threading*: it is multi threading mode, worker and runners communicate 
through gRPC.
 *multi_processing*: it is multi processing, worker and runners communicate 
through gRPC.

Here is how to set the direct_running_mode.
 *Option 1*: set it with pipeline options.
{code:java}
 pipeline_options = PipelineOptions(direct_num_workers=2, 
direct_running_mode='multi_threading')
 p = beam.Pipeline(
         runner=fn_api_runner.FnApiRunner(),
         options=pipeline_options)
{code}

*Option 2*: pass it with CLI.
{code:java}
 python xxx --direct_num_workers 2 - -direct_running_mode 'multi_threading'
 p = beam.Pipeline(runner=fn_api_runner.FnApiRunner())
{code}

> Make it easy to change between multi-process and multi-thread mode for Python 
> Direct runners
> --------------------------------------------------------------------------------------------
>
>                 Key: BEAM-7861
>                 URL: https://issues.apache.org/jira/browse/BEAM-7861
>             Project: Beam
>          Issue Type: Sub-task
>          Components: sdk-py-core
>            Reporter: Hannah Jiang
>            Assignee: Hannah Jiang
>            Priority: Major
>             Fix For: 2.19.0
>
>          Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> BEAM-3645 makes it possible to run a map task parallel.
> However, users need to change runner when switch between multithreading and 
> multiprocessing mode.
> We want to add a flag (ex: --use-multiprocess) to make the switch easy 
> without changing the runner each time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to