I think there is some misunderstanding about what is meant by option
2. What Kenn (I think) and I are proposing is not a WindowedValueCoder
whose window/timestamp/paneinfo coders are parameterized to be
constant coders, but a WindowedValueCoder whose
window/timestamp/paneinfo values are specified as constants in the
coder.

Let's call this NewValueOnlyWindowedValueCoder, and is parameterized
by a window, timestamp, and pane info instance

The existing ValueOnlyWindowedValueCoder is literally
NewValueOnlyWindowedValueCoder(GlobalWindow, MIN_TIMESTAMP,
PaneInfo.NO_FIRING). Note in particular that using the existing
ValueOnlyWindowedValueCoder would give the wrong timestamp and pane
info if it is use for the result of a GBK, which I think is the loss
of consistency referred to here.

On Thu, Nov 7, 2019 at 1:03 AM jincheng sun <sunjincheng...@gmail.com> wrote:
>
> Thanks for your feedback and the valuable comments, Kenn & Robert!
>
> I think your comments are more comprehensive and enlighten me a lot. The two 
> proposals which I mentioned above are to reuse the existing coder 
> (FullWindowedValueCoder and ValueOnlyWindowedValueCoder). Now, with your 
> comments, I think we can further abstract 'FullWindowedValueCoder' and 
> 'ValueOnlyWindowedValueCoder', that is, we can rename 
> 'FullWindowedValueCoder' to 'WindowedValueCoder', and make the coders of 
> window/timestamp/pane configurable. Then we can remove 
> 'ValueOnlyWindowedValueCoder' and need to add a mount of constant coders for 
> window/timestamp/pane.
>
> I have replied your comments on the doc, and quick feedback as following:
>
> Regarding to "Approach 2: probably no SDK harness work / compatible with 
> existing Beam model so no risk of introducing inconsistency",if we "just puts 
> default window/timestamp/pane info on elements" and don't change the original 
> coder, then the performance is not optimized. If we want to get the best 
> performance, then the default coder of Window/timestamp/pane should be 
> constant coder. In this case the SDK harnesses need to be aware of the 
> constant coder and there will be some development work in the SDK harness. 
> Besides, the SDK harness also needs to make the coders for 
> window/timestamp/pane configurable and this will introduce some related 
> changes, such as updating WindowedValueCoder._get_component_coders, etc.
>
> Regarding to "Approach 1: option a: if the SDK harness has to understand 
> 'values without windows' then very large changes and high risk of introducing 
> inconsistency (I eliminated many of these inconsistencies)", we only need to 
> add ValueOnlyWindowedValueCoder to the StandardCoders and all the SDK harness 
> should be aware of this coder. There is no much changes actually.
>
> Please feel free to correct me if there is anyting incorrect. :)
>
> Besides, I'm not quite clear about the consistency issues you meant here. 
> Could you please give me some hints about this?
>
> Best,
> Jincheng
>
> Robert Bradshaw <rober...@google.com> 于2019年11月7日周四 上午3:38写道:
>>
>> Yes, the portability framework is designed to support this, and
>> possibly even more efficient transfers of data than element-by-element
>> as per the wire coder specified in the IO port operators. I left some
>> comments on the doc as well, and would also prefer approach 2.
>>
>> On Wed, Nov 6, 2019 at 11:03 AM Kenneth Knowles <k...@apache.org> wrote:
>> >
>> > I think the portability framework is designed for this. The runner 
>> > controls the coder on the grpc ports and the runner controls the process 
>> > bundle descriptor.
>> >
>> > I commented on the doc. I think what is missing is analysis of scope of 
>> > SDK harness changes and risk to model consistency
>> >
>> >     Approach 2: probably no SDK harness work / compatible with existing 
>> > Beam model so no risk of introducing inconsistency
>> >
>> >     Approach 1: what are all the details?
>> >         option a: if the SDK harness has to understand "values without 
>> > windows" then very large changes and high risk of introducing 
>> > inconsistency (I eliminated many of these inconsistencies)
>> >         option b: if the coder just puts default window/timestamp/pane 
>> > info on elements, then it is the same as approach 2, no work / no risk
>> >
>> > Kenn
>> >
>> > On Wed, Nov 6, 2019 at 1:09 AM jincheng sun <sunjincheng...@gmail.com> 
>> > wrote:
>> >>
>> >> Hi all,
>> >>
>> >> I am trying to make some improvements of portability framework to make it 
>> >> usable in other projects. However, we find that the coder between runner 
>> >> and harness can only be FullWindowedValueCoder. This means each time when 
>> >> sending a WindowedValue, we have to encode/decode timestamp, windows and 
>> >> pan infos. In some circumstances(such as using the portability framework 
>> >> in Flink), only values are needed between runner and harness. So, it 
>> >> would be nice if we can configure the coder and avoid redundant encoding 
>> >> and decoding between runner and harness to improve the performance.
>> >>
>> >> There are two approaches to solve this issue:
>> >>
>> >>     Approach 1:  Support ValueOnlyWindowedValueCoder between runner and 
>> >> harness.
>> >>     Approach 2:  Add a "constant" window coder that embeds all the 
>> >> windowing information as part of the coder that should be used to wrap 
>> >> the value during decoding.
>> >>
>> >> More details can be found here [1].
>> >>
>> >> As of the shortcomings of “Approach 2” which still need to encode/decode 
>> >> timestamp and pane infos, we tend to choose “Approach 1” which brings 
>> >> better performance and is more thorough.
>> >>
>> >> Welcome any feedback :)
>> >>
>> >> Best,
>> >> Jincheng
>> >>
>> >> [1] 
>> >> https://docs.google.com/document/d/1TTKZC6ppVozG5zV5RiRKXse6qnJl-EsHGb_LkUfoLxY/edit?usp=sharing
>> >>

Reply via email to