I think there is some misunderstanding about what is meant by option 2. What Kenn (I think) and I are proposing is not a WindowedValueCoder whose window/timestamp/paneinfo coders are parameterized to be constant coders, but a WindowedValueCoder whose window/timestamp/paneinfo values are specified as constants in the coder.
Let's call this NewValueOnlyWindowedValueCoder, and is parameterized by a window, timestamp, and pane info instance The existing ValueOnlyWindowedValueCoder is literally NewValueOnlyWindowedValueCoder(GlobalWindow, MIN_TIMESTAMP, PaneInfo.NO_FIRING). Note in particular that using the existing ValueOnlyWindowedValueCoder would give the wrong timestamp and pane info if it is use for the result of a GBK, which I think is the loss of consistency referred to here. On Thu, Nov 7, 2019 at 1:03 AM jincheng sun <sunjincheng...@gmail.com> wrote: > > Thanks for your feedback and the valuable comments, Kenn & Robert! > > I think your comments are more comprehensive and enlighten me a lot. The two > proposals which I mentioned above are to reuse the existing coder > (FullWindowedValueCoder and ValueOnlyWindowedValueCoder). Now, with your > comments, I think we can further abstract 'FullWindowedValueCoder' and > 'ValueOnlyWindowedValueCoder', that is, we can rename > 'FullWindowedValueCoder' to 'WindowedValueCoder', and make the coders of > window/timestamp/pane configurable. Then we can remove > 'ValueOnlyWindowedValueCoder' and need to add a mount of constant coders for > window/timestamp/pane. > > I have replied your comments on the doc, and quick feedback as following: > > Regarding to "Approach 2: probably no SDK harness work / compatible with > existing Beam model so no risk of introducing inconsistency",if we "just puts > default window/timestamp/pane info on elements" and don't change the original > coder, then the performance is not optimized. If we want to get the best > performance, then the default coder of Window/timestamp/pane should be > constant coder. In this case the SDK harnesses need to be aware of the > constant coder and there will be some development work in the SDK harness. > Besides, the SDK harness also needs to make the coders for > window/timestamp/pane configurable and this will introduce some related > changes, such as updating WindowedValueCoder._get_component_coders, etc. > > Regarding to "Approach 1: option a: if the SDK harness has to understand > 'values without windows' then very large changes and high risk of introducing > inconsistency (I eliminated many of these inconsistencies)", we only need to > add ValueOnlyWindowedValueCoder to the StandardCoders and all the SDK harness > should be aware of this coder. There is no much changes actually. > > Please feel free to correct me if there is anyting incorrect. :) > > Besides, I'm not quite clear about the consistency issues you meant here. > Could you please give me some hints about this? > > Best, > Jincheng > > Robert Bradshaw <rober...@google.com> 于2019年11月7日周四 上午3:38写道: >> >> Yes, the portability framework is designed to support this, and >> possibly even more efficient transfers of data than element-by-element >> as per the wire coder specified in the IO port operators. I left some >> comments on the doc as well, and would also prefer approach 2. >> >> On Wed, Nov 6, 2019 at 11:03 AM Kenneth Knowles <k...@apache.org> wrote: >> > >> > I think the portability framework is designed for this. The runner >> > controls the coder on the grpc ports and the runner controls the process >> > bundle descriptor. >> > >> > I commented on the doc. I think what is missing is analysis of scope of >> > SDK harness changes and risk to model consistency >> > >> > Approach 2: probably no SDK harness work / compatible with existing >> > Beam model so no risk of introducing inconsistency >> > >> > Approach 1: what are all the details? >> > option a: if the SDK harness has to understand "values without >> > windows" then very large changes and high risk of introducing >> > inconsistency (I eliminated many of these inconsistencies) >> > option b: if the coder just puts default window/timestamp/pane >> > info on elements, then it is the same as approach 2, no work / no risk >> > >> > Kenn >> > >> > On Wed, Nov 6, 2019 at 1:09 AM jincheng sun <sunjincheng...@gmail.com> >> > wrote: >> >> >> >> Hi all, >> >> >> >> I am trying to make some improvements of portability framework to make it >> >> usable in other projects. However, we find that the coder between runner >> >> and harness can only be FullWindowedValueCoder. This means each time when >> >> sending a WindowedValue, we have to encode/decode timestamp, windows and >> >> pan infos. In some circumstances(such as using the portability framework >> >> in Flink), only values are needed between runner and harness. So, it >> >> would be nice if we can configure the coder and avoid redundant encoding >> >> and decoding between runner and harness to improve the performance. >> >> >> >> There are two approaches to solve this issue: >> >> >> >> Approach 1: Support ValueOnlyWindowedValueCoder between runner and >> >> harness. >> >> Approach 2: Add a "constant" window coder that embeds all the >> >> windowing information as part of the coder that should be used to wrap >> >> the value during decoding. >> >> >> >> More details can be found here [1]. >> >> >> >> As of the shortcomings of “Approach 2” which still need to encode/decode >> >> timestamp and pane infos, we tend to choose “Approach 1” which brings >> >> better performance and is more thorough. >> >> >> >> Welcome any feedback :) >> >> >> >> Best, >> >> Jincheng >> >> >> >> [1] >> >> https://docs.google.com/document/d/1TTKZC6ppVozG5zV5RiRKXse6qnJl-EsHGb_LkUfoLxY/edit?usp=sharing >> >>