[
https://issues.apache.org/jira/browse/BEAM-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Charles Chen updated BEAM-3644:
-------------------------------
Summary: Speed up Python DirectRunner execution by using the FnApiRunner
when possible (was: Speeding up Python DirectRunner execution by using the
FnApiRunner when possible)
> Speed up Python DirectRunner execution by using the FnApiRunner when possible
> -----------------------------------------------------------------------------
>
> Key: BEAM-3644
> URL: https://issues.apache.org/jira/browse/BEAM-3644
> Project: Beam
> Issue Type: Improvement
> Components: sdk-py-core
> Affects Versions: 2.2.0, 2.3.0
> Reporter: Charles Chen
> Assignee: Charles Chen
> Priority: Major
>
> Local execution of Beam pipelines on the current Python DirectRunner
> is slow, which makes it hard for pipeline authors to iterate, especially
> on medium to large datasets. We would like to optimize local execution
> and make it a better experience for Beam users.
> In the past few months, Robert implemented the FnApiRunner as a way of
> leveraging the portability framework execution code path for local execution.
> We've found large speedups in batch execution, so we propose switching
> batch pipelines to this runner. For example, WordCount on the Shakespeare
> dataset with a single CPU core now takes 50 seconds to run, compared to 12
> minutes before: a roughly 15x performance improvement that users get for
> free, with no pipeline changes.
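
As a quick check on the figure quoted in the description, the speedup can be worked out from the two reported wall-clock times. This is only a sketch of the arithmetic: the helper name `speedup` is hypothetical, and the inputs are the rounded numbers from the description (12 minutes and 50 seconds), not fresh measurements.

```python
def speedup(before_seconds: float, after_seconds: float) -> float:
    """Return how many times faster the new run is than the old one."""
    if before_seconds <= 0 or after_seconds <= 0:
        raise ValueError("durations must be positive")
    return before_seconds / after_seconds


# Rounded timings taken from the issue description (assumed, not re-measured).
before = 12 * 60  # old DirectRunner path: 12 minutes, in seconds
after = 50        # FnApiRunner path: 50 seconds

factor = speedup(before, after)
print(f"{factor:.1f}x")  # 14.4x
```

With these rounded inputs the ratio comes out to 14.4, which is consistent with the "15x" quoted above once rounding of the original timings is taken into account.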