Hi Xuannan, Hi Dong Thanks for the Proposal! After reading the FLIP, I'd like to ask some questions:
1. Naming convention for boolean variables. It is recommended to follow JavaBean [1], i.e. objectReuseCompliant as the variable name with isObjectReuseCompliant() and setObjectReuseCompliant() as the methods' name. 2. - *If pipeline.object-reuse is set to true, records emitted by this operator will be re-used.* - *Otherwise, if getIsObjectReuseCompliant() returns true, records emitted by this operator will be re-used.* - *Otherwise, records emitted by this operator will be deep-copied before being given to the next operator in the chain.* If I understand you correctly, the hard coding objectReusedCompliant should have higher priority over the configuration, the checking logic should be: - *If getIsObjectReuseCompliant() returns true, records emitted by this operator will be re-used.* - *Otherwise, if pipeline.object-reuse is set to true, records emitted by this operator will be re-used.* - *Otherwise, records emitted by this operator will be deep-copied before being given to the next operator in the chain.* The results are the same but the checking logics are different, but there are some additional thoughts, which lead us to the next question. 3. Current design lets specific operators enable object reuse and ignore the global config. There could be another thought, on the contrary: if an operator has hard coded the objectReuseCompliant as false, i.e. disable object reuse on purpose, records should not be reused even if the global config pipeline.object-reused is set to true, which turns out that the objectReuseCompliant could be a triple value logic: ture: force object reusing; false: force deep-copying; unknown: depends on pipeline.object-reuse config. Best regards, Jing [1] https://en.wikipedia.org/wiki/JavaBeans On Mon, Jul 3, 2023 at 4:25 AM Xuannan Su <suxuanna...@gmail.com> wrote: > Hi all, > > Dong(cc'ed) and I are opening this thread to discuss our proposal to > add operator attribute to allow operator to specify support for > object-reuse [1]. > > Currently, the default configuration for pipeline.object-reuse is set > to false to avoid data corruption, which can result in suboptimal > performance. We propose adding APIs that operators can utilize to > inform the Flink runtime whether it is safe to reuse the emitted > records. This enhancement would enable Flink to maximize its > performance using the default configuration. > > Please refer to the FLIP document for more details about the proposed > design and implementation. We welcome any feedback and opinions on > this proposal. > > Best regards, > > Dong and Xuannan > > [1] > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=255073749 >