I did suggest one other alternative on Jincheng's PR[1] which was to allow
windowless values to be sent across the gRPC port. The SDK would then be
responsible for ensuring that the execution didn't access any properties
that required knowledge of the timestamp, pane or window. This is different
then adding the ValueOnlyWindowedValueCoder as a model coder because it
allows SDKs to pass around raw values to functions without any windowing
overhead which could be useful for things like the side input window
mapping or window merging functions we have.

1: https://github.com/apache/beam/pull/9979

On Thu, Nov 7, 2019 at 8:48 AM Robert Bradshaw <rober...@google.com> wrote:

> I think there is some misunderstanding about what is meant by option
> 2. What Kenn (I think) and I are proposing is not a WindowedValueCoder
> whose window/timestamp/paneinfo coders are parameterized to be
> constant coders, but a WindowedValueCoder whose
> window/timestamp/paneinfo values are specified as constants in the
> coder.
>
> Let's call this NewValueOnlyWindowedValueCoder, and is parameterized
> by a window, timestamp, and pane info instance
>
> The existing ValueOnlyWindowedValueCoder is literally
> NewValueOnlyWindowedValueCoder(GlobalWindow, MIN_TIMESTAMP,
> PaneInfo.NO_FIRING). Note in particular that using the existing
> ValueOnlyWindowedValueCoder would give the wrong timestamp and pane
> info if it is use for the result of a GBK, which I think is the loss
> of consistency referred to here.
>
> On Thu, Nov 7, 2019 at 1:03 AM jincheng sun <sunjincheng...@gmail.com>
> wrote:
> >
> > Thanks for your feedback and the valuable comments, Kenn & Robert!
> >
> > I think your comments are more comprehensive and enlighten me a lot. The
> two proposals which I mentioned above are to reuse the existing coder
> (FullWindowedValueCoder and ValueOnlyWindowedValueCoder). Now, with your
> comments, I think we can further abstract 'FullWindowedValueCoder' and
> 'ValueOnlyWindowedValueCoder', that is, we can rename
> 'FullWindowedValueCoder' to 'WindowedValueCoder', and make the coders of
> window/timestamp/pane configurable. Then we can remove
> 'ValueOnlyWindowedValueCoder' and need to add a mount of constant coders
> for window/timestamp/pane.
> >
> > I have replied your comments on the doc, and quick feedback as following:
> >
> > Regarding to "Approach 2: probably no SDK harness work / compatible with
> existing Beam model so no risk of introducing inconsistency",if we "just
> puts default window/timestamp/pane info on elements" and don't change the
> original coder, then the performance is not optimized. If we want to get
> the best performance, then the default coder of Window/timestamp/pane
> should be constant coder. In this case the SDK harnesses need to be aware
> of the constant coder and there will be some development work in the SDK
> harness. Besides, the SDK harness also needs to make the coders for
> window/timestamp/pane configurable and this will introduce some related
> changes, such as updating WindowedValueCoder._get_component_coders, etc.
> >
> > Regarding to "Approach 1: option a: if the SDK harness has to understand
> 'values without windows' then very large changes and high risk of
> introducing inconsistency (I eliminated many of these inconsistencies)", we
> only need to add ValueOnlyWindowedValueCoder to the StandardCoders and all
> the SDK harness should be aware of this coder. There is no much changes
> actually.
> >
> > Please feel free to correct me if there is anyting incorrect. :)
> >
> > Besides, I'm not quite clear about the consistency issues you meant
> here. Could you please give me some hints about this?
> >
> > Best,
> > Jincheng
> >
> > Robert Bradshaw <rober...@google.com> 于2019年11月7日周四 上午3:38写道:
> >>
> >> Yes, the portability framework is designed to support this, and
> >> possibly even more efficient transfers of data than element-by-element
> >> as per the wire coder specified in the IO port operators. I left some
> >> comments on the doc as well, and would also prefer approach 2.
> >>
> >> On Wed, Nov 6, 2019 at 11:03 AM Kenneth Knowles <k...@apache.org>
> wrote:
> >> >
> >> > I think the portability framework is designed for this. The runner
> controls the coder on the grpc ports and the runner controls the process
> bundle descriptor.
> >> >
> >> > I commented on the doc. I think what is missing is analysis of scope
> of SDK harness changes and risk to model consistency
> >> >
> >> >     Approach 2: probably no SDK harness work / compatible with
> existing Beam model so no risk of introducing inconsistency
> >> >
> >> >     Approach 1: what are all the details?
> >> >         option a: if the SDK harness has to understand "values
> without windows" then very large changes and high risk of introducing
> inconsistency (I eliminated many of these inconsistencies)
> >> >         option b: if the coder just puts default
> window/timestamp/pane info on elements, then it is the same as approach 2,
> no work / no risk
> >> >
> >> > Kenn
> >> >
> >> > On Wed, Nov 6, 2019 at 1:09 AM jincheng sun <sunjincheng...@gmail.com>
> wrote:
> >> >>
> >> >> Hi all,
> >> >>
> >> >> I am trying to make some improvements of portability framework to
> make it usable in other projects. However, we find that the coder between
> runner and harness can only be FullWindowedValueCoder. This means each time
> when sending a WindowedValue, we have to encode/decode timestamp, windows
> and pan infos. In some circumstances(such as using the portability
> framework in Flink), only values are needed between runner and harness. So,
> it would be nice if we can configure the coder and avoid redundant encoding
> and decoding between runner and harness to improve the performance.
> >> >>
> >> >> There are two approaches to solve this issue:
> >> >>
> >> >>     Approach 1:  Support ValueOnlyWindowedValueCoder between runner
> and harness.
> >> >>     Approach 2:  Add a "constant" window coder that embeds all the
> windowing information as part of the coder that should be used to wrap the
> value during decoding.
> >> >>
> >> >> More details can be found here [1].
> >> >>
> >> >> As of the shortcomings of “Approach 2” which still need to
> encode/decode timestamp and pane infos, we tend to choose “Approach 1”
> which brings better performance and is more thorough.
> >> >>
> >> >> Welcome any feedback :)
> >> >>
> >> >> Best,
> >> >> Jincheng
> >> >>
> >> >> [1]
> https://docs.google.com/document/d/1TTKZC6ppVozG5zV5RiRKXse6qnJl-EsHGb_LkUfoLxY/edit?usp=sharing
> >> >>
>

Reply via email to