I recently stumbled upon BEAM-3644 [1]. The issue mentions that the Python direct runner saw a large performance gain from relying on portability's FnApiRunner. This seems a bit counterintuitive to me considering the extra overhead of portability. How is this possible, and what is the explanation? My assumption is that in this case we are talking about process embedding without networking services for the Fn API. Is that the case? I am surprised it is faster given the extra layers, and I am curious whether latency improves too, since that could benefit interactive uses like Jupyter notebooks. Any info or details on this?
I also saw BEAM-3645 [2], which is about supporting multi-process execution on the Python direct runner and could eventually bring even better performance. Will this work as a one-to-one mapping between client and service, or will the FnApiRunner somehow deal with concurrency? I suppose this change will also benefit the performance of Python on portability. I am just curious to understand whether that is the case and how it works.

[1] https://issues.apache.org/jira/browse/BEAM-3644
[2] https://issues.apache.org/jira/browse/BEAM-3645