Local execution of Beam pipelines on the Python DirectRunner currently suffers from performance issues, which makes it hard for pipeline authors to iterate, especially on medium to large datasets. We would like to optimize this and make it a better experience for Beam users.

The FnApiRunner was written as a way of leveraging the portability framework execution code path for local portability development. We've found that it also provides large speedups in batch execution with no user changes required, so we propose switching to this runner by default for batch pipelines. For example, WordCount on the Shakespeare dataset with a single CPU core now takes 50 seconds to run, compared to 12 minutes before; this is a roughly 15x performance improvement that users get for free.

The JIRA for this change is here (https://issues.apache.org/jira/browse/BEAM-3644), and a candidate patch is available here (https://github.com/apache/beam/pull/4634). I have been working over the last month on making this an automatic drop-in replacement for the current DirectRunner when applicable. Before it becomes the default, you can try this runner now by manually specifying apache_beam.runners.portability.fn_api_runner.FnApiRunner as the runner.

Even with this change, local Python pipeline execution can only effectively use one core because of the Python GIL. A natural next step to further improve performance will be to refactor the FnApiRunner to allow for multi-process execution. This is being tracked here (https://issues.apache.org/jira/browse/BEAM-3645).

Best,
Charles
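P.S. For anyone who wants to try the runner by hand before the default changes, here is a minimal sketch of one way to do it. It assumes that beam.Pipeline accepts the fully qualified class name quoted above as its runner argument (please check this against your installed Beam version). The names FN_API_RUNNER, split_words, format_count and run_wordcount are made up for illustration and are not part of Beam or of the candidate patch.

```python
# Fully qualified runner name quoted in the message above.
FN_API_RUNNER = "apache_beam.runners.portability.fn_api_runner.FnApiRunner"


def split_words(line):
    """Split one line of text into lowercase, whitespace-separated words."""
    return line.lower().split()


def format_count(word_count):
    """Render a (word, count) pair as a 'word: count' row."""
    word, count = word_count
    return "%s: %d" % (word, count)


def run_wordcount(lines, output_prefix, runner=FN_API_RUNNER):
    """Count the words in `lines` and write the rows to files under output_prefix.

    Assumption: the runner is selected by passing its fully qualified class
    name as the `runner` argument of beam.Pipeline.
    """
    # Imported lazily so the helpers above can be used without Beam installed.
    import apache_beam as beam

    # Leaving the `with` block runs the pipeline and waits for it to finish.
    with beam.Pipeline(runner=runner) as p:
        (
            p
            | "Create" >> beam.Create(lines)
            | "Split" >> beam.FlatMap(split_words)
            | "PairWithOne" >> beam.Map(lambda word: (word, 1))
            | "Count" >> beam.CombinePerKey(sum)
            | "Format" >> beam.Map(format_count)
            | "Write" >> beam.io.WriteToText(output_prefix)
        )
```

Calling run_wordcount(["to be or not to be"], "/tmp/counts") should leave one or more output files whose names start with /tmp/counts. For a pipeline that already parses its options from the command line, passing the same class name via --runner should have the same effect, under the same assumption.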