Re: [DISCUSS] Avoid redundant encoding and decoding between runner and harness

Robert Bradshaw Wed, 06 Nov 2019 11:39:14 -0800

Yes, the portability framework is designed to support this, and
possibly even more efficient transfers of data than element-by-element
as per the wire coder specified in the IO port operators. I left some
comments on the doc as well, and would also prefer approach 2.


On Wed, Nov 6, 2019 at 11:03 AM Kenneth Knowles <k...@apache.org> wrote:
>
> I think the portability framework is designed for this. The runner controls 
> the coder on the grpc ports and the runner controls the process bundle 
> descriptor.
>
> I commented on the doc. I think what is missing is an analysis of the scope 
> of SDK harness changes and the risk to model consistency.
>
>     Approach 2: probably no SDK harness work / compatible with existing Beam 
> model so no risk of introducing inconsistency
>
>     Approach 1: what are all the details?
>         option a: if the SDK harness has to understand "values without 
> windows" then very large changes and high risk of introducing inconsistency 
> (I eliminated many of these inconsistencies)
>         option b: if the coder just puts default window/timestamp/pane info 
> on elements, then it is the same as approach 2, no work / no risk
>
> Kenn
>
> On Wed, Nov 6, 2019 at 1:09 AM jincheng sun <sunjincheng...@gmail.com> wrote:
>>
>> Hi all,
>>
>> I am trying to improve the portability framework to make it usable in other 
>> projects. However, we find that the coder between runner and harness can 
>> only be FullWindowedValueCoder. This means that each time we send a 
>> WindowedValue, we have to encode/decode the timestamp, windows and pane 
>> info. In some circumstances (such as using the portability framework in 
>> Flink), only values are needed between runner and harness. So it would be 
>> nice if we could configure the coder and avoid redundant encoding and 
>> decoding between runner and harness to improve performance.
>>
>> There are two approaches to solve this issue:
>>
>>     Approach 1:  Support ValueOnlyWindowedValueCoder between runner and 
>> harness.
>>     Approach 2:  Add a "constant" window coder that embeds all the windowing 
>> information as part of the coder that should be used to wrap the value 
>> during decoding.
>>
>> More details can be found here [1].
>>
>> Because “Approach 2” still needs to encode/decode the timestamp and pane 
>> info, we tend to choose “Approach 1”, which brings better performance and 
>> is more thorough.
>>
>> Any feedback is welcome :)
>>
>> Best,
>> Jincheng
>>
>> [1] 
>> https://docs.google.com/document/d/1TTKZC6ppVozG5zV5RiRKXse6qnJl-EsHGb_LkUfoLxY/edit?usp=sharing
>>
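
A minimal sketch of the overhead discussed in this thread, assuming a toy wire
format rather than Beam's actual coders. The names `WindowedValue`,
`encode_full`, `decode_full` and `ConstantWindowCoder` are hypothetical
stand-ins defined here, not Beam APIs. The sketch contrasts writing timestamp,
windows and pane info with every element against a "constant" coder that holds
the windowing information itself and uses it to wrap the value during decoding.
Which fields remain on the wire in the real proposal is covered by the linked
design doc, not by this toy.

```python
import struct
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class WindowedValue:
    value: bytes
    timestamp: int            # toy timestamp (e.g. microseconds)
    windows: Tuple[str, ...]  # toy window identifiers
    pane: int                 # toy one-byte stand-in for pane info


def encode_full(wv: WindowedValue) -> bytes:
    """Toy 'full' encoding: timestamp, windows, pane, then the value."""
    parts = [struct.pack(">q", wv.timestamp), struct.pack(">i", len(wv.windows))]
    for window in wv.windows:
        raw = window.encode("utf-8")
        parts.append(struct.pack(">i", len(raw)) + raw)
    parts.append(struct.pack(">B", wv.pane))
    parts.append(wv.value)
    return b"".join(parts)


def decode_full(data: bytes) -> WindowedValue:
    """Inverse of encode_full."""
    timestamp, count = struct.unpack_from(">qi", data, 0)
    pos = 12  # 8-byte timestamp + 4-byte window count
    windows = []
    for _ in range(count):
        (length,) = struct.unpack_from(">i", data, pos)
        pos += 4
        windows.append(data[pos:pos + length].decode("utf-8"))
        pos += length
    (pane,) = struct.unpack_from(">B", data, pos)
    return WindowedValue(data[pos + 1:], timestamp, tuple(windows), pane)


class ConstantWindowCoder:
    """Toy 'constant' coder: the windowing info lives in the coder, not on the wire."""

    def __init__(self, timestamp: int, windows: Tuple[str, ...], pane: int):
        self.timestamp = timestamp
        self.windows = windows
        self.pane = pane

    def encode(self, wv: WindowedValue) -> bytes:
        # Only the value is written; the windowing fields are assumed constant.
        return wv.value

    def decode(self, data: bytes) -> WindowedValue:
        # Wrap the decoded value with the constants held by the coder.
        return WindowedValue(data, self.timestamp, self.windows, self.pane)


if __name__ == "__main__":
    element = WindowedValue(b"hello", 0, ("global",), 0)
    coder = ConstantWindowCoder(0, ("global",), 0)
    print(len(encode_full(element)), len(coder.encode(element)))
```

In this toy, both paths hand the harness a complete `WindowedValue`, so the
receiving side never has to understand "values without windows"; only the
number of bytes per element differs.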
