This is now checked into master.  You can use it by setting
--runner=SwitchingDirectRunner.  Please let us know if you run into any
issues.


On Thu, Feb 8, 2018 at 10:30 AM Romain Manni-Bucau <rmannibu...@gmail.com>
wrote:

> Very interesting! This sounds like a sane direction for Beam's future, and I'm
> very happy it is consistent with the current Java experience: there is no need
> to interlace runners in the end, which makes the design, code, and user
> experience way better than trying to put everything in the direct runner :).
>
> On Feb 8, 2018 at 19:20, "María García Herrero" <mari...@google.com>
> wrote:
>
>> Amazing improvement, Charles.
>> Thanks for the effort!
>>
>>
>> On Thu, Feb 8, 2018 at 10:14 AM Eugene Kirpichov <kirpic...@google.com>
>> wrote:
>>
>>> Sounds awesome, congratulations and thanks for making this happen!
>>>
>>> On Thu, Feb 8, 2018 at 10:07 AM Raghu Angadi <rang...@google.com> wrote:
>>>
>>>> This is terrific news! Thanks Charles.
>>>>
>>>> On Wed, Feb 7, 2018 at 5:55 PM, Charles Chen <c...@google.com> wrote:
>>>>
>>>>> Local execution of Beam pipelines on the Python DirectRunner currently
>>>>> suffers from performance issues, which makes it hard for pipeline authors
>>>>> to iterate, especially on medium-to-large datasets.  We would like to
>>>>> optimize it and make this a better experience for Beam users.
>>>>>
>>>>> The FnApiRunner was written as a way of leveraging the portability
>>>>> framework execution code path for local portability development. We've
>>>>> found that it also provides great speedups in batch execution with no user
>>>>> changes required, so we propose switching to this runner by default for
>>>>> batch pipelines.  For example, WordCount on the Shakespeare dataset with a
>>>>> single CPU core now takes 50 seconds to run, compared to 12 minutes
>>>>> before; this is a roughly 15x performance improvement (720 s / 50 s is
>>>>> about 14.4x) that users can get for free, with no pipeline changes.
>>>>>
>>>>> The JIRA for this change is here (
>>>>> https://issues.apache.org/jira/browse/BEAM-3644), and a candidate
>>>>> patch is available here (https://github.com/apache/beam/pull/4634). I
>>>>> have been working over the last month on making this an automatic drop-in
>>>>> replacement for the current DirectRunner when applicable.  Before it
>>>>> becomes the default, you can try this runner now by manually specifying
>>>>> apache_beam.runners.portability.fn_api_runner.FnApiRunner as the
>>>>> runner.
>>>>>
>>>>> Even with this change, local Python pipeline execution can only
>>>>> effectively use one core because of the Python GIL.  A natural next step
>>>>> to further improve performance will be to refactor the FnApiRunner to
>>>>> allow for multi-process execution.  This is being tracked here (
>>>>> https://issues.apache.org/jira/browse/BEAM-3645).
>>>>>
>>>>> Best,
>>>>>
>>>>> Charles
>>>>>
>>>>
>>
>> --
>>
>> Impact is the effect that wouldn’t have happened if you hadn’t done what you
>> did.
>>
>>
