[ 
https://issues.apache.org/jira/browse/BEAM-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Chen resolved BEAM-3644.
--------------------------------
       Resolution: Fixed
    Fix Version/s: 2.4.0

> Speed up Python DirectRunner execution by using the FnApiRunner when possible
> -----------------------------------------------------------------------------
>
>                 Key: BEAM-3644
>                 URL: https://issues.apache.org/jira/browse/BEAM-3644
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-py-core
>    Affects Versions: 2.2.0, 2.3.0
>            Reporter: Charles Chen
>            Assignee: Charles Chen
>            Priority: Major
>             Fix For: 2.4.0
>
>          Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Local execution of Beam pipelines on the Python DirectRunner currently
> suffers from performance issues, which makes it hard for pipeline authors
> to iterate, especially on medium to large datasets. We would like to
> optimize this and make it a better experience for Beam users.
> The FnApiRunner was written to exercise the portability framework's
> execution code path locally, as an aid to portability development. We've
> found that it also offers large speedups in batch execution, so we propose
> switching the DirectRunner to use it for batch pipelines. For example,
> WordCount on the Shakespeare dataset with a single CPU core now takes
> 50 seconds to run, compared to 12 minutes before: roughly a 15x
> performance improvement that users get for free, with no pipeline changes.
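
As a quick sanity check on the figures quoted above, the speedup can be
derived from the two wall-clock times. This is only an illustrative sketch:
the function name `speedup` is hypothetical and not part of Beam, and the
inputs are the approximate timings given in the description.

```python
def speedup(before_seconds: float, after_seconds: float) -> float:
    """Return how many times faster the new run is than the old one."""
    if after_seconds <= 0:
        raise ValueError("after_seconds must be positive")
    return before_seconds / after_seconds


# Timings quoted in the issue description (approximate).
before = 12 * 60  # 12 minutes on the old DirectRunner code path, in seconds
after = 50        # 50 seconds when the FnApiRunner is used

ratio = speedup(before, after)
print(f"{ratio:.1f}x")  # 14.4x, which the description rounds to about 15x
```

The exact ratio depends on the machine and dataset, so the result is best
read as an order-of-magnitude estimate rather than a benchmark.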



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
