Yes, it does work for Java pipelines, modulo
https://github.com/apache/beam/pull/4211 . I'm actually not sure what
the performance characteristics are, but I'm sure the improvement (if
any) is not as dramatic as what we see in Python. It's great for
development though.

On Fri, Feb 16, 2018 at 12:06 PM, Marián Dvorský <mari...@google.com> wrote:
> Does the same runner work for Java pipelines? (I assume so, given that it
> uses portability framework.) If so, does it provide similar speedup?
>
> On Fri, Feb 16, 2018 at 7:37 PM Robert Bradshaw <rober...@google.com> wrote:
>>
>> If there are no concerns, I say let's merge this.
>>
>> On Fri, Feb 16, 2018 at 9:39 AM, Charles Chen <c...@google.com> wrote:
>> > I hope those interested have had time to test this out.  I have sent out
>> > https://github.com/apache/beam/pull/4696 to switch to using this fast
>> > runner as the default DirectRunner for local execution.  Let me know if
>> > there are any concerns.
>> >
>> > On Tue, Feb 13, 2018 at 12:17 PM Charles Chen <c...@google.com> wrote:
>> >>
>> >> This is now checked into master.  You can use it by setting
>> >> --runner=SwitchingDirectRunner.  Please let us know if you run into any
>> >> issues.
>> >>
>> >>
>> >> On Thu, Feb 8, 2018 at 10:30 AM Romain Manni-Bucau <rmannibu...@gmail.com> wrote:
>> >>>
>> >>> Very interesting! This sounds like a sane direction for Beam's future,
>> >>> and I'm very happy it is consistent with the current Java experience:
>> >>> no need to interlace runners in the end. It makes the design, code, and
>> >>> user experience way better than trying to put everything in the direct
>> >>> runner :).
>> >>>
>> >>> On Feb 8, 2018 at 19:20, "María García Herrero" <mari...@google.com> wrote:
>> >>>>
>> >>>> Amazing improvement, Charles.
>> >>>> Thanks for the effort!
>> >>>>
>> >>>>
>> >>>> On Thu, Feb 8, 2018 at 10:14 AM Eugene Kirpichov <kirpic...@google.com> wrote:
>> >>>>>
>> >>>>> Sounds awesome, congratulations and thanks for making this happen!
>> >>>>>
>> >>>>> On Thu, Feb 8, 2018 at 10:07 AM Raghu Angadi <rang...@google.com> wrote:
>> >>>>>>
>> >>>>>> This is terrific news! Thanks Charles.
>> >>>>>>
>> >>>>>> On Wed, Feb 7, 2018 at 5:55 PM, Charles Chen <c...@google.com> wrote:
>> >>>>>>>
>> >>>>>>> Local execution of Beam pipelines on the Python DirectRunner
>> >>>>>>> currently suffers from performance issues, which makes it hard for
>> >>>>>>> pipeline authors to iterate, especially on medium to large
>> >>>>>>> datasets. We would like to optimize this and make it a better
>> >>>>>>> experience for Beam users.
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> The FnApiRunner was written as a way of leveraging the portability
>> >>>>>>> framework execution code path for local portability development.
>> >>>>>>> We've found it also provides great speedups in batch execution with
>> >>>>>>> no user changes required, so we propose switching to this runner by
>> >>>>>>> default for batch pipelines.  For example, WordCount on the
>> >>>>>>> Shakespeare dataset with a single CPU core now takes 50 seconds to
>> >>>>>>> run, compared to 12 minutes before; this is roughly a 15x
>> >>>>>>> performance improvement that users can get for free, with no
>> >>>>>>> pipeline changes.
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> The JIRA for this change is here
>> >>>>>>> (https://issues.apache.org/jira/browse/BEAM-3644), and a candidate
>> >>>>>>> patch is available here (https://github.com/apache/beam/pull/4634).
>> >>>>>>> I have been working over the last month on making this an automatic
>> >>>>>>> drop-in replacement for the current DirectRunner when applicable.
>> >>>>>>> Before it becomes the default, you can try this runner now by
>> >>>>>>> manually specifying
>> >>>>>>> apache_beam.runners.portability.fn_api_runner.FnApiRunner as the
>> >>>>>>> runner.
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> Even with this change, local Python pipeline execution can only
>> >>>>>>> effectively use one core because of the Python GIL.  A natural next
>> >>>>>>> step to further improve performance will be to refactor the
>> >>>>>>> FnApiRunner to allow for multi-process execution.  This is being
>> >>>>>>> tracked here (https://issues.apache.org/jira/browse/BEAM-3645).
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> Best,
>> >>>>>>>
>> >>>>>>> Charles
>> >>>>
>> >>>>
>> >>>>
>> >>>> --
>> >>>>
>> >>>> Impact is the effect that wouldn’t have happened if you hadn’t done
>> >>>> what
>> >>>> you did.
>> >>>>
>> >>>>
>> >
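
For readers who want to try the runner selection described in the original
announcement, here is a minimal sketch, not taken from the thread itself. It
assumes the Beam Python SDK is installed and that the fully qualified class
name quoted in the announcement is still accepted as a --runner value in your
Beam version. The helper names runner_args and run_wordcount are hypothetical
and exist only for illustration.

```python
# Fully qualified runner name quoted in the original announcement.
FN_API_RUNNER = "apache_beam.runners.portability.fn_api_runner.FnApiRunner"


def runner_args(runner=FN_API_RUNNER):
    """Build the command-line flag that selects a Beam runner (hypothetical helper)."""
    return ["--runner=" + runner]


def run_wordcount(lines, argv=None):
    """Count words in `lines` on the selected runner and print each (word, count)."""
    # Imported lazily so runner_args() stays usable without Beam installed.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # Assumption: the runner is chosen purely through the --runner flag.
    options = PipelineOptions(runner_args() if argv is None else argv)
    with beam.Pipeline(options=options) as p:
        (
            p
            | "Create" >> beam.Create(lines)
            | "Split" >> beam.FlatMap(lambda line: line.split())
            | "Count" >> beam.combiners.Count.PerElement()
            | "Print" >> beam.Map(print)
        )
```

Calling run_wordcount(["to be or not to be"]) would run a tiny word count on
the FnApiRunner; passing argv=runner_args("SwitchingDirectRunner") would
instead select the runner name mentioned later in the thread.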
