lincoln-lil commented on PR #20745:
URL: https://github.com/apache/flink/pull/20745#issuecomment-1324799135

   > @lincoln-lil I see, I think it is a dilemma between correctness and 
performance. With the filter push down that corrupt the versioned table, the 
correctness of the query result might be affected. However, the expensive 
Changelog Normalization does great harm to the performance. At my point of 
view, we should firstly assure the correctness.
   
   @shuiqiangchen This is not a trade-off between correctness and performance 
(since preventing filter push down to the right table already solves the 
correctness problem)
   Rather, this deterministic optimization (only pushing down the filter to the 
left table only) will have a performance badcase (upsert source adds an 
additional high-overhead materialization node ChangelogNormalize)
   
   Let's look at a query(Orders table is a upsert-kafka):
   ```sql
   SELECT *      
   FROM Orders AS o JOIN
   rates_last_row_rowtime FOR SYSTEM_TIME AS OF o.rowtime AS r
   ON o.currency = r.currency
   WHERE o.amount > 10 AND r.rate > 1
   ```
   the filter conditions followed by the join operation related to both left 
and right side, with this pr,  only the left predicate 'o.amount > 10' will be 
pushed down and this may cause the optimal upsert mode degrade to retract mode.
   So the benefit of this optimization is uncertain, and furthermore, if we 
want to identify the patterns with performance degradation based on this 
deterministic optimization and then do de-optimization(e.g., pull up filters 
and remove ChangelogNormalize...), it will bring much more complexity, so after 
discussed all of these issues, @godfreyhe and I both tend to go back to the 
simple solution that all the filters does not push down to eventtime temporal 
join. WDYT?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to