Thanks a lot for the explanation Robert, it makes sense now.

On Tue, Jun 25, 2019 at 2:58 PM Robert Bradshaw <rober...@google.com> wrote:
>
> On Tue, Jun 25, 2019 at 2:26 PM Ismaël Mejía <ieme...@gmail.com> wrote:
> >
> > I stumbled recently into BEAM-3644 [1]. This issue mentions that
> > Python direct runner saw a great performance gain because of relying
> > on portability’s FnApiRunner. This seems to me a bit contra-intuitive
> > considering the extra overhead of portability. How is this possible or
> > what is the explanation for this? My assumption is that we are talking
> > for this case about process embedding without networking services for
> > Fn API, is it the case? I am surprised it is better given the extra
> > layers and curious if there are improvements in latency too, so this
> > could benefit interactive uses like Jupyter notebooks. Any info or
> > details on this?
>
> The FnApiRunner, when used in direct mode, communicates with an
> in-process "worker" via the portability protos, but does not do so
> over an GRPC connection (and the data channel is a simple, inline one
> that avoids the protos altogether), so overall the overhead is minimal
> (except the serialization of elements over the data channel, though
> that only happens at fusion breaks). It may be possible to improve on
> this a bit (e.g. one idea is encoding references only rather than full
> values over the data channel, as everything is in memory, though this
> could have negative impact on peak memory usage).
>
> The primary performance win is that the FnApiRunner is based on the
> same code we use in the Worker, which has been much more optimized and
> has a more advanced execution flow (e.g. uses fusion).
>
> > I also saw BEAM-3645 [2] to support multi-process execution on python
> > direct runner which could eventually bring even better performance.
> > Will this work in a 1 to 1 mapping between client and service or
> > somehow the FnApiRunner would deal with concurrency? I suppose this
> > change will benefit the performance of Python for portability too.
> > Just curious to understand if it is the case and how it works.
>
> Here the change is that multiple workers would be instantiated talking
> to the single FnApiRunner control service(s), though likely the
> workers would be other processes (to get around the GIL) in which case
> GRPC would be used. Work would be sharded among the workers (e.g. a
> single PCollection split into N pieces and processed concurrently). I
> don't think this would impact other portable runners, as they're
> already free to start up as many SDK harnesses, and have as many
> workers within each SDK harness, as they want already, but it should
> be a big win for running on a single multi-core machine.
>
> > [1] https://issues.apache.org/jira/browse/BEAM-3644
> > [2] https://issues.apache.org/jira/browse/BEAM-3645

Reply via email to