xishuaidelin commented on code in PR #24240:
URL: https://github.com/apache/flink/pull/24240#discussion_r1477643284

##########
docs/content/docs/dev/table/tuning.md:
##########
@@ -266,5 +266,23 @@ GROUP BY day
 Flink SQL optimizer can recognize the different filter arguments on the same distinct key. For example, in the above example, all the three COUNT DISTINCT are on `user_id` column. Then Flink can use just one shared state instance instead of three state instances to reduce state access and state size. In some workloads, this can get significant performance improvements.
 
+## MiniBatch Join
+
+By default, regular join operator processes input records one by one, i.e., (1) look up records from state according to joinKey, (2) write or retract input in state, (3) process the input and joined records. This processing pattern may increase the overhead of StateBackend (especially for RocksDB StateBackend).
+
+The core idea of mini-batch join is to cache a bundle of inputs in a buffer inside of the mini-batch join operator. Reduce data in the cache, and then when the cache is triggered for processing, perform specific optimizations based on certain scenarios. Some of input records would be folded according to specified rule illustrated below:

Review Comment:
   A new graph to clarify the principle is introduced.
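
For readers of this thread, the contrast drawn in the quoted section (per-record state access versus buffering and folding a bundle) can be sketched roughly as follows. This is a minimal Python illustration, not Flink's implementation. The function names, the record layout `(kind, join_key, row)`, the "one lookup per distinct key per bundle" cost model, and the folding rule (a `+I` later retracted by a `-D` of the same row cancels out) are all assumptions made for the example.

```python
from collections import defaultdict


def per_record_lookups(records):
    """Default pattern: every input record triggers its own state lookup."""
    return len(records)


def fold_bundle(records):
    """Buffer records by join key and fold them before touching state.

    Assumed folding rule (illustrative only): an insert ('+I') followed by a
    delete ('-D') of the same row inside one bundle cancels out.
    """
    bundle = defaultdict(list)
    for kind, key, row in records:
        pending = bundle[key]
        if kind == "-D" and ("+I", row) in pending:
            pending.remove(("+I", row))  # the pair annihilates in the buffer
        else:
            pending.append((kind, row))
    # Keys whose records were all folded away need no state access at all.
    return {key: rows for key, rows in bundle.items() if rows}


def mini_batch_lookups(records):
    """Assumed cost model: one state lookup per surviving key per bundle."""
    return len(fold_bundle(records))


records = [
    ("+I", "a", 1),
    ("-D", "a", 1),  # cancels the insert above
    ("+I", "b", 2),
    ("+I", "b", 3),
    ("+I", "c", 4),
]
print(per_record_lookups(records), mini_batch_lookups(records))
```

Under these assumptions the five inputs cost five lookups when processed one by one, but only two once the bundle is folded and grouped by key. The actual folding rules are the ones defined in the PR's documentation and diagram.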
