Hi Xuannan, Hi Dong

Thanks for the Proposal! After reading the FLIP, I'd like to ask some
questions:

1. Naming convention for boolean variables. It is recommended to follow
JavaBean [1], i.e. objectReuseCompliant as the variable name with
isObjectReuseCompliant() and setObjectReuseCompliant() as the methods' name.


2.

   -

   *If pipeline.object-reuse is set to true, records emitted by this
   operator will be re-used.*
   -

   *Otherwise, if getIsObjectReuseCompliant() returns true, records emitted
   by this operator will be re-used.*
   -

   *Otherwise, records emitted by this operator will be deep-copied before
   being given to the next operator in the chain.*


If I understand you correctly,  the hard coding objectReusedCompliant
should have higher priority over the configuration, the checking logic
should be:

   -

   *If getIsObjectReuseCompliant() returns true, records emitted by this
   operator will be re-used.*
   -

   *Otherwise, if pipeline.object-reuse is set to true, records emitted by
   this operator will be re-used.*
   -

   *Otherwise, records emitted by this operator will be deep-copied before
   being given to the next operator in the chain.*


The results are the same but the checking logics are different, but there
are some additional thoughts, which lead us to the next question.

3. Current design lets specific operators enable object reuse and ignore
the global config. There could be another thought, on the contrary: if an
operator has hard coded the objectReuseCompliant as false, i.e. disable
object reuse on purpose, records should not be reused even if the global
config pipeline.object-reused is set to true, which turns out that the
objectReuseCompliant could be a triple value logic: ture: force object
reusing; false: force deep-copying; unknown: depends on
pipeline.object-reuse config.


Best regards,
Jing


[1] https://en.wikipedia.org/wiki/JavaBeans

On Mon, Jul 3, 2023 at 4:25 AM Xuannan Su <suxuanna...@gmail.com> wrote:

> Hi all,
>
> Dong(cc'ed) and I are opening this thread to discuss our proposal to
> add operator attribute to allow operator to specify support for
> object-reuse [1].
>
> Currently, the default configuration for pipeline.object-reuse is set
> to false to avoid data corruption, which can result in suboptimal
> performance. We propose adding APIs that operators can utilize to
> inform the Flink runtime whether it is safe to reuse the emitted
> records. This enhancement would enable Flink to maximize its
> performance using the default configuration.
>
> Please refer to the FLIP document for more details about the proposed
> design and implementation. We welcome any feedback and opinions on
> this proposal.
>
> Best regards,
>
> Dong and Xuannan
>
> [1]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=255073749
>

Reply via email to