Thanks for your feedback and the valuable comments, Kenn & Robert!

I think your comments are more comprehensive and enlighten me a lot. The
two proposals which I mentioned above are to reuse the existing coder
(FullWindowedValueCoder and ValueOnlyWindowedValueCoder). Now, with your
comments, I think we can further abstract 'FullWindowedValueCoder' and
'ValueOnlyWindowedValueCoder', that is, we can rename
'FullWindowedValueCoder' to 'WindowedValueCoder', and make the coders of
window/timestamp/pane configurable. Then we can remove
'ValueOnlyWindowedValueCoder' and need to add a mount of constant coders
for window/timestamp/pane.

I have replied your comments on the doc, and quick feedback as following:

Regarding to "Approach 2: probably no SDK harness work / compatible with
existing Beam model so no risk of introducing inconsistency",if we "just
puts default window/timestamp/pane info on elements" and don't change the
original coder, then the performance is not optimized. If we want to get
the best performance, then the default coder of Window/timestamp/pane
should be constant coder. In this case the SDK harnesses need to be aware
of the constant coder and there will be some development work in the SDK
harness. Besides, the SDK harness also needs to make the coders for
window/timestamp/pane configurable and this will introduce some related
changes, such as updating WindowedValueCoder._get_component_coders, etc.

Regarding to "Approach 1: option a: if the SDK harness has to understand
'values without windows' then very large changes and high risk of
introducing inconsistency (I eliminated many of these inconsistencies)", we
only need to add ValueOnlyWindowedValueCoder to the StandardCoders and all
the SDK harness should be aware of this coder. There is no much changes
actually.

Please feel free to correct me if there is anyting incorrect. :)

Besides, I'm not quite clear about the consistency issues you meant here.
Could you please give me some hints about this?

Best,
Jincheng

Robert Bradshaw <rober...@google.com> 于2019年11月7日周四 上午3:38写道:

> Yes, the portability framework is designed to support this, and
> possibly even more efficient transfers of data than element-by-element
> as per the wire coder specified in the IO port operators. I left some
> comments on the doc as well, and would also prefer approach 2.
>
> On Wed, Nov 6, 2019 at 11:03 AM Kenneth Knowles <k...@apache.org> wrote:
> >
> > I think the portability framework is designed for this. The runner
> controls the coder on the grpc ports and the runner controls the process
> bundle descriptor.
> >
> > I commented on the doc. I think what is missing is analysis of scope of
> SDK harness changes and risk to model consistency
> >
> >     Approach 2: probably no SDK harness work / compatible with existing
> Beam model so no risk of introducing inconsistency
> >
> >     Approach 1: what are all the details?
> >         option a: if the SDK harness has to understand "values without
> windows" then very large changes and high risk of introducing inconsistency
> (I eliminated many of these inconsistencies)
> >         option b: if the coder just puts default window/timestamp/pane
> info on elements, then it is the same as approach 2, no work / no risk
> >
> > Kenn
> >
> > On Wed, Nov 6, 2019 at 1:09 AM jincheng sun <sunjincheng...@gmail.com>
> wrote:
> >>
> >> Hi all,
> >>
> >> I am trying to make some improvements of portability framework to make
> it usable in other projects. However, we find that the coder between runner
> and harness can only be FullWindowedValueCoder. This means each time when
> sending a WindowedValue, we have to encode/decode timestamp, windows and
> pan infos. In some circumstances(such as using the portability framework in
> Flink), only values are needed between runner and harness. So, it would be
> nice if we can configure the coder and avoid redundant encoding and
> decoding between runner and harness to improve the performance.
> >>
> >> There are two approaches to solve this issue:
> >>
> >>     Approach 1:  Support ValueOnlyWindowedValueCoder between runner and
> harness.
> >>     Approach 2:  Add a "constant" window coder that embeds all the
> windowing information as part of the coder that should be used to wrap the
> value during decoding.
> >>
> >> More details can be found here [1].
> >>
> >> As of the shortcomings of “Approach 2” which still need to
> encode/decode timestamp and pane infos, we tend to choose “Approach 1”
> which brings better performance and is more thorough.
> >>
> >> Welcome any feedback :)
> >>
> >> Best,
> >> Jincheng
> >>
> >> [1]
> https://docs.google.com/document/d/1TTKZC6ppVozG5zV5RiRKXse6qnJl-EsHGb_LkUfoLxY/edit?usp=sharing
> >>
>

Reply via email to