[
https://issues.apache.org/jira/browse/BEAM-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Charles Chen resolved BEAM-3644.
--------------------------------
Resolution: Fixed
Fix Version/s: 2.4.0
> Speed up Python DirectRunner execution by using the FnApiRunner when possible
> -----------------------------------------------------------------------------
>
> Key: BEAM-3644
> URL: https://issues.apache.org/jira/browse/BEAM-3644
> Project: Beam
> Issue Type: Improvement
> Components: sdk-py-core
> Affects Versions: 2.2.0, 2.3.0
> Reporter: Charles Chen
> Assignee: Charles Chen
> Priority: Major
> Fix For: 2.4.0
>
> Time Spent: 2h 40m
> Remaining Estimate: 0h
>
> Local execution of Beam pipelines on the Python DirectRunner currently
> suffers from performance issues, which make it hard for pipeline authors to
> iterate, especially on medium to large datasets. We would like to optimize
> this and make local execution a better experience for Beam users.
>
> The FnApiRunner was written to exercise the portability framework's
> execution code path locally, for portability development. We've found it
> also offers large speedups in batch execution, so we propose switching to
> this runner for batch pipelines. For example, WordCount on the Shakespeare
> dataset with a single CPU core now takes 50 seconds to run, compared to
> 12 minutes before, a roughly 14x performance improvement that users get
> for free, with no pipeline changes.
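
The speedup factor follows directly from the two run times quoted above. A minimal sketch of that arithmetic, where the helper name `speedup` is hypothetical and not part of Beam:

```python
def speedup(before_seconds: float, after_seconds: float) -> float:
    """Return how many times faster the new run is than the old one."""
    if after_seconds <= 0:
        raise ValueError("after_seconds must be positive")
    return before_seconds / after_seconds


# Run times quoted in the issue: 12 minutes before, 50 seconds after.
before = 12 * 60  # 720 seconds
after = 50

factor = speedup(before, after)
print(f"{factor:.1f}x")  # 14.4x
```

These are single-run wall-clock figures from the issue description, so the ratio is best read as an order-of-magnitude indication rather than a benchmark result.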