Yes, the portability framework is designed to support this, and possibly even more efficient data transfers than element-by-element encoding, as determined by the wire coder specified in the IO port operators. I left some comments on the doc as well, and I would also prefer approach 2.
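To make the trade-off concrete, here is a minimal sketch of the "constant" window coder idea from Approach 2 in the proposal quoted below. It does not use Beam's actual wire format or API: the class names (FullCoderSketch, ConstantWindowCoderSketch), the toy byte layout, and the single-byte pane are all hypothetical, and it assumes the timestamp, windows, and pane can all be held constant inside the coder. The point is only that the harness still receives a complete windowed value while the per-element bytes carry the value alone.

```python
import struct
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class WindowedValue:
    """Toy windowed value; not Beam's WindowedValue class."""
    value: bytes
    timestamp_micros: int
    windows: Tuple[str, ...]
    pane: int  # one byte standing in for pane info (assumption)


class FullCoderSketch:
    """Hypothetical full coder: writes the metadata with every element."""

    def encode(self, wv: WindowedValue) -> bytes:
        out = struct.pack(">qI", wv.timestamp_micros, len(wv.windows))
        for window in wv.windows:
            raw = window.encode("utf-8")
            out += struct.pack(">I", len(raw)) + raw
        return out + struct.pack(">B", wv.pane) + wv.value

    def decode(self, data: bytes) -> WindowedValue:
        timestamp, count = struct.unpack_from(">qI", data, 0)
        pos = struct.calcsize(">qI")
        windows = []
        for _ in range(count):
            (size,) = struct.unpack_from(">I", data, pos)
            pos += 4
            windows.append(data[pos:pos + size].decode("utf-8"))
            pos += size
        pane = data[pos]
        return WindowedValue(data[pos + 1:], timestamp, tuple(windows), pane)


class ConstantWindowCoderSketch:
    """Hypothetical constant coder: the metadata lives in the coder itself."""

    def __init__(self, timestamp_micros: int, windows: Tuple[str, ...], pane: int):
        self._timestamp_micros = timestamp_micros
        self._windows = tuple(windows)
        self._pane = pane

    def encode(self, wv: WindowedValue) -> bytes:
        # Only the value crosses the wire; the metadata is dropped here.
        return wv.value

    def decode(self, data: bytes) -> WindowedValue:
        # Re-wrap the value with the constants configured on the coder.
        return WindowedValue(data, self._timestamp_micros, self._windows, self._pane)


full = FullCoderSketch()
const = ConstantWindowCoderSketch(0, ("global",), 0)
element = WindowedValue(b"hello", 0, ("global",), 0)

full_bytes = full.encode(element)
const_bytes = const.encode(element)
print(len(full_bytes), len(const_bytes))
```

Both coders decode back to the same windowed value in this sketch, so code on the receiving side that expects windowed values would not need to change; only the bytes per element differ.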
On Wed, Nov 6, 2019 at 11:03 AM Kenneth Knowles <k...@apache.org> wrote:
>
> I think the portability framework is designed for this. The runner controls
> the coder on the gRPC ports and the runner controls the process bundle
> descriptor.
>
> I commented on the doc. I think what is missing is an analysis of the scope
> of SDK harness changes and the risk to model consistency.
>
> Approach 2: probably no SDK harness work / compatible with existing Beam
> model so no risk of introducing inconsistency
>
> Approach 1: what are all the details?
>     option a: if the SDK harness has to understand "values without
> windows" then very large changes and high risk of introducing inconsistency
> (I eliminated many of these inconsistencies)
>     option b: if the coder just puts default window/timestamp/pane info
> on elements, then it is the same as approach 2, no work / no risk
>
> Kenn
>
> On Wed, Nov 6, 2019 at 1:09 AM jincheng sun <sunjincheng...@gmail.com> wrote:
>>
>> Hi all,
>>
>> I am trying to make some improvements to the portability framework to make
>> it usable in other projects. However, we find that the coder between runner
>> and harness can only be FullWindowedValueCoder. This means that each time we
>> send a WindowedValue, we have to encode/decode the timestamp, windows and
>> pane info. In some circumstances (such as using the portability framework in
>> Flink), only values are needed between runner and harness. So, it would be
>> nice if we could configure the coder and avoid redundant encoding and
>> decoding between runner and harness to improve the performance.
>>
>> There are two approaches to solve this issue:
>>
>> Approach 1: Support ValueOnlyWindowedValueCoder between runner and
>> harness.
>> Approach 2: Add a "constant" window coder that embeds all the windowing
>> information as part of the coder that should be used to wrap the value
>> during decoding.
>>
>> More details can be found here [1].
>>
>> Because of the shortcomings of "Approach 2", which still needs to
>> encode/decode the timestamp and pane info, we tend to choose "Approach 1",
>> which brings better performance and is more thorough.
>>
>> Any feedback is welcome :)
>>
>> Best,
>> Jincheng
>>
>> [1] https://docs.google.com/document/d/1TTKZC6ppVozG5zV5RiRKXse6qnJl-EsHGb_LkUfoLxY/edit?usp=sharing
>>