This is now checked into master. You can use it by setting --runner=SwitchingDirectRunner. Please let us know if you run into any issues.
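For anyone who wants to try it quickly, here is a minimal sketch of passing that flag from Python code rather than from the command line. It assumes a Beam Python SDK built from a master that includes this change; `runner_flags` and `run` are hypothetical helper names used only for illustration, and the word-count steps are a toy stand-in rather than the WordCount example mentioned in the quoted thread.

```python
def runner_flags(runner="SwitchingDirectRunner"):
    """Return the command-line flags that select a Beam runner by name."""
    return ["--runner=%s" % runner]


def run(lines, argv=None):
    """Count the words in `lines` on the runner selected by --runner."""
    # Imported lazily so that this module still loads where Beam is not installed.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # Default to the flag from this announcement when no argv is supplied.
    options = PipelineOptions(argv if argv is not None else runner_flags())

    # The pipeline runs when the `with` block exits.
    with beam.Pipeline(options=options) as p:
        (p
         | "Create" >> beam.Create(lines)
         | "Split" >> beam.FlatMap(lambda line: line.split())
         | "PairWithOne" >> beam.Map(lambda word: (word, 1))
         | "Count" >> beam.CombinePerKey(sum)
         | "Print" >> beam.Map(print))
```

Calling `run(["to be or not to be"])` should print one `(word, count)` pair per distinct word. Passing `runner_flags("DirectRunner")` as `argv` selects the plain DirectRunner instead, which is a simple way to compare the two on the same pipeline.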

On Thu, Feb 8, 2018 at 10:30 AM Romain Manni-Bucau <rmannibu...@gmail.com> wrote:

> Very interesting! Sounds like a sane way for Beam's future and I'm very
> happy it is consistent with the current Java experience: no need to
> interlace runners at the end, it makes design, code and user experience way
> better than trying to put everything in the direct runner :).
>
> On Feb 8, 2018 at 19:20, "María García Herrero" <mari...@google.com>
> wrote:
>
>> Amazing improvement, Charles.
>> Thanks for the effort!
>>
>>
>> On Thu, Feb 8, 2018 at 10:14 AM Eugene Kirpichov <kirpic...@google.com>
>> wrote:
>>
>>> Sounds awesome, congratulations and thanks for making this happen!
>>>
>>> On Thu, Feb 8, 2018 at 10:07 AM Raghu Angadi <rang...@google.com> wrote:
>>>
>>>> This is terrific news! Thanks Charles.
>>>>
>>>> On Wed, Feb 7, 2018 at 5:55 PM, Charles Chen <c...@google.com> wrote:
>>>>
>>>>> Local execution of Beam pipelines on the Python DirectRunner currently
>>>>> suffers from performance issues, which makes it hard for pipeline authors
>>>>> to iterate, especially on medium to large size datasets. We would like to
>>>>> optimize and make this a better experience for Beam users.
>>>>>
>>>>> The FnApiRunner was written as a way of leveraging the portability
>>>>> framework execution code path for local portability development. We've
>>>>> found it also provides great speedups in batch execution with no user
>>>>> changes required, so we propose to switch to use this runner by default in
>>>>> batch pipelines. For example, WordCount on the Shakespeare dataset with a
>>>>> single CPU core now takes 50 seconds to run, compared to 12 minutes
>>>>> before; this is a 15x performance improvement that users can get for free,
>>>>> with no user pipeline changes.
>>>>>
>>>>> The JIRA for this change is here (
>>>>> https://issues.apache.org/jira/browse/BEAM-3644), and a candidate
>>>>> patch is available here (https://github.com/apache/beam/pull/4634). I
>>>>> have been working over the last month on making this an automatic drop-in
>>>>> replacement for the current DirectRunner when applicable. Before it
>>>>> becomes the default, you can try this runner now by manually specifying
>>>>> apache_beam.runners.portability.fn_api_runner.FnApiRunner as the
>>>>> runner.
>>>>>
>>>>> Even with this change, local Python pipeline execution can only
>>>>> effectively use one core because of the Python GIL. A natural next step
>>>>> to further improve performance will be to refactor the FnApiRunner to
>>>>> allow for multi-process execution. This is being tracked here (
>>>>> https://issues.apache.org/jira/browse/BEAM-3645).
>>>>>
>>>>> Best,
>>>>>
>>>>> Charles
>>>>>
>>>>
>>
>> --
>>
>> Impact is the effect that wouldn’t have happened if you hadn’t done what you
>> did.