lincoln-lil commented on PR #20745: URL: https://github.com/apache/flink/pull/20745#issuecomment-1324799135
> @lincoln-lil I see, I think it is a dilemma between correctness and performance. With the filter push down that corrupt the versioned table, the correctness of the query result might be affected. However, the expensive Changelog Normalization does great harm to the performance. At my point of view, we should firstly assure the correctness.

@shuiqiangchen This is not a trade-off between correctness and performance, since preventing filter push down to the right table already solves the correctness problem. Rather, this deterministic optimization (pushing the filter down only to the left table) has a performance bad case: for an upsert source it adds an additional high-overhead materialization node, ChangelogNormalize.

Let's look at a query (the Orders table is an upsert-kafka source):

```sql
SELECT *
FROM Orders AS o
JOIN rates_last_row_rowtime FOR SYSTEM_TIME AS OF o.rowtime AS r
  ON o.currency = r.currency
WHERE o.amount > 10 AND r.rate > 1
```

The filter conditions after the join reference both the left and right sides. With this PR, only the left predicate `o.amount > 10` will be pushed down, which may cause the optimal upsert mode to degrade to retract mode. So the benefit of this optimization is uncertain. Furthermore, if we want to identify the patterns that suffer performance degradation under this deterministic optimization and then de-optimize them (e.g., pull up filters and remove ChangelogNormalize), it will bring much more complexity. After discussing all of these issues, @godfreyhe and I both tend to go back to the simple solution: do not push any filters down into an event-time temporal join. WDYT?
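To make the "simple solution" concrete, here is a hypothetical rewrite sketching the intended plan shape: the temporal join runs on the unfiltered inputs, and both predicates are evaluated above the join afterwards (table and column names are taken from the example query; this is an illustration, not the planner's actual output):

```sql
-- Sketch of the "no pushdown" shape: neither o.amount > 10 nor
-- r.rate > 1 reaches a source, so the upsert source keeps its
-- changelog intact and no extra ChangelogNormalize is forced.
SELECT *
FROM (
  SELECT o.*, r.rate
  FROM Orders AS o
  JOIN rates_last_row_rowtime FOR SYSTEM_TIME AS OF o.rowtime AS r
    ON o.currency = r.currency
) t
WHERE t.amount > 10 AND t.rate > 1
```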
