On Thu, Nov 7, 2019 at 9:19 AM Luke Cwik <lc...@google.com> wrote:

> I did suggest one other alternative on Jincheng's PR[1] which was to allow
> windowless values to be sent across the gRPC port. The SDK would then be
> responsible for ensuring that the execution didn't access any properties
> that required knowledge of the timestamp, pane or window. This is different
> then adding the ValueOnlyWindowedValueCoder as a model coder because it
> allows SDKs to pass around raw values to functions without any windowing
> overhead which could be useful for things like the side input window
> mapping or window merging functions we have.
>

When you say "pass around" what does it mean? If it is over the wire, there
is already no overhead to ValueOnlyWindowedValueCoder. So do you mean the
overhead of having the layer of boxing of WindowedValue? I would assume all
non-value components of the WindowedValue from ValueOnlyWindowedValueCoder
are pointers to a single shared immutable instance carried by the coder
instance.

I think raw values can already be passed to functions, no? The main thing
is that elements in a PCollection always have a window, timestamp, and
paneinfo. Not all values are elements. Is there a specific case you have in
mind? I would not expect WindowMappingFn or window merging fn to be passing
"elements" but just values of the appropriate type for the function.


> On Thu, Nov 7, 2019 at 8:48 AM Robert Bradshaw <rober...@google.com>
> wrote:
>
>> I think there is some misunderstanding about what is meant by option
>> 2. What Kenn (I think) and I are proposing is not a WindowedValueCoder
>> whose window/timestamp/paneinfo coders are parameterized to be
>> constant coders, but a WindowedValueCoder whose
>> window/timestamp/paneinfo values are specified as constants in the
>> coder.
>>
>> Let's call this NewValueOnlyWindowedValueCoder, and is parameterized
>> by a window, timestamp, and pane info instance
>>
>> The existing ValueOnlyWindowedValueCoder is literally
>> NewValueOnlyWindowedValueCoder(GlobalWindow, MIN_TIMESTAMP,
>> PaneInfo.NO_FIRING). Note in particular that using the existing
>> ValueOnlyWindowedValueCoder would give the wrong timestamp and pane
>> info if it is use for the result of a GBK, which I think is the loss
>> of consistency referred to here.
>>
>
Yes, this is exactly what I am proposing and sounds like what "approach 2"
is. I think this approach "just works". It is simpler and more efficient
than "WindowedValueCoder.of(ValueCoder, WindowCoder, TimestampCoder,
PaneInfoCoder)" which would require some overhead even if WindowCoder,
TimestampCoder and PaneInfoCoder were constant coders consuming zero bytes.

Kenn



> On Thu, Nov 7, 2019 at 1:03 AM jincheng sun <sunjincheng...@gmail.com>
>> wrote:
>> >
>> > Thanks for your feedback and the valuable comments, Kenn & Robert!
>> >
>> > I think your comments are more comprehensive and enlighten me a lot.
>> The two proposals which I mentioned above are to reuse the existing coder
>> (FullWindowedValueCoder and ValueOnlyWindowedValueCoder). Now, with your
>> comments, I think we can further abstract 'FullWindowedValueCoder' and
>> 'ValueOnlyWindowedValueCoder', that is, we can rename
>> 'FullWindowedValueCoder' to 'WindowedValueCoder', and make the coders of
>> window/timestamp/pane configurable. Then we can remove
>> 'ValueOnlyWindowedValueCoder' and need to add a mount of constant coders
>> for window/timestamp/pane.
>> >
>> > I have replied your comments on the doc, and quick feedback as
>> following:
>> >
>> > Regarding to "Approach 2: probably no SDK harness work / compatible
>> with existing Beam model so no risk of introducing inconsistency",if we
>> "just puts default window/timestamp/pane info on elements" and don't change
>> the original coder, then the performance is not optimized. If we want to
>> get the best performance, then the default coder of Window/timestamp/pane
>> should be constant coder. In this case the SDK harnesses need to be aware
>> of the constant coder and there will be some development work in the SDK
>> harness. Besides, the SDK harness also needs to make the coders for
>> window/timestamp/pane configurable and this will introduce some related
>> changes, such as updating WindowedValueCoder._get_component_coders, etc.
>> >
>> > Regarding to "Approach 1: option a: if the SDK harness has to
>> understand 'values without windows' then very large changes and high risk
>> of introducing inconsistency (I eliminated many of these inconsistencies)",
>> we only need to add ValueOnlyWindowedValueCoder to the StandardCoders and
>> all the SDK harness should be aware of this coder. There is no much changes
>> actually.
>> >
>> > Please feel free to correct me if there is anyting incorrect. :)
>> >
>> > Besides, I'm not quite clear about the consistency issues you meant
>> here. Could you please give me some hints about this?
>> >
>> > Best,
>> > Jincheng
>> >
>> > Robert Bradshaw <rober...@google.com> 于2019年11月7日周四 上午3:38写道:
>> >>
>> >> Yes, the portability framework is designed to support this, and
>> >> possibly even more efficient transfers of data than element-by-element
>> >> as per the wire coder specified in the IO port operators. I left some
>> >> comments on the doc as well, and would also prefer approach 2.
>> >>
>> >> On Wed, Nov 6, 2019 at 11:03 AM Kenneth Knowles <k...@apache.org>
>> wrote:
>> >> >
>> >> > I think the portability framework is designed for this. The runner
>> controls the coder on the grpc ports and the runner controls the process
>> bundle descriptor.
>> >> >
>> >> > I commented on the doc. I think what is missing is analysis of scope
>> of SDK harness changes and risk to model consistency
>> >> >
>> >> >     Approach 2: probably no SDK harness work / compatible with
>> existing Beam model so no risk of introducing inconsistency
>> >> >
>> >> >     Approach 1: what are all the details?
>> >> >         option a: if the SDK harness has to understand "values
>> without windows" then very large changes and high risk of introducing
>> inconsistency (I eliminated many of these inconsistencies)
>> >> >         option b: if the coder just puts default
>> window/timestamp/pane info on elements, then it is the same as approach 2,
>> no work / no risk
>> >> >
>> >> > Kenn
>> >> >
>> >> > On Wed, Nov 6, 2019 at 1:09 AM jincheng sun <
>> sunjincheng...@gmail.com> wrote:
>> >> >>
>> >> >> Hi all,
>> >> >>
>> >> >> I am trying to make some improvements of portability framework to
>> make it usable in other projects. However, we find that the coder between
>> runner and harness can only be FullWindowedValueCoder. This means each time
>> when sending a WindowedValue, we have to encode/decode timestamp, windows
>> and pan infos. In some circumstances(such as using the portability
>> framework in Flink), only values are needed between runner and harness. So,
>> it would be nice if we can configure the coder and avoid redundant encoding
>> and decoding between runner and harness to improve the performance.
>> >> >>
>> >> >> There are two approaches to solve this issue:
>> >> >>
>> >> >>     Approach 1:  Support ValueOnlyWindowedValueCoder between runner
>> and harness.
>> >> >>     Approach 2:  Add a "constant" window coder that embeds all the
>> windowing information as part of the coder that should be used to wrap the
>> value during decoding.
>> >> >>
>> >> >> More details can be found here [1].
>> >> >>
>> >> >> As of the shortcomings of “Approach 2” which still need to
>> encode/decode timestamp and pane infos, we tend to choose “Approach 1”
>> which brings better performance and is more thorough.
>> >> >>
>> >> >> Welcome any feedback :)
>> >> >>
>> >> >> Best,
>> >> >> Jincheng
>> >> >>
>> >> >> [1]
>> https://docs.google.com/document/d/1TTKZC6ppVozG5zV5RiRKXse6qnJl-EsHGb_LkUfoLxY/edit?usp=sharing
>> >> >>
>>
>

Reply via email to