Hi everyone, I would like to open a discussion on adding a new query hint for async lookup join. Since the changes were relatively minor, no new flip was created, and if needed, I will create one.
FLINK-27623 adds a global parameter 'table.exec.async-lookup.output-mode' for table users so that all three control parameters related to async I/O can be configured at the same job level. As planned in the issue, we‘d like to go a step further to offer more precise control for async join operation more than job level config, to introduce a new join hint: ‘ASYNC_LOOKUP’. For the hint option, for intuitive and user-friendly reasons, we want to support both simple and kv forms, with all options except table name being optional (use job level configuration if not set) # 1. simple form: (ordered hint option list) ``` ASYNC_LOOKUP('tableName'[, 'output-mode', 'buffer-capacity', 'timeout']) optional: output-mode buffer-capacity timeout ``` Note: since Calcite currently does not support the mixed type hint options, the table name here needs to be a string instead of an identifier. (For `SqlHint`: The option format can not be mixed in, they should either be all simple identifiers or all literals or all key value pairs.) We can improve this after Calcite support. # 2. kv form: (support unordered hint option list) ``` ASYNC_LOOKUP('table'='tableName'[, 'output-mode'='ordered|allow-unordered', 'capacity'='int', 'timeout'='duration']) optional kvs: 'output-mode'='ordered|allow-unordered' 'capacity'='int' 'timeout'='duration' ``` e.g., if the job level configuration is: ``` table.exec.async-lookup.output-mode: ORDERED table.exec.async-lookup.buffer-capacity: 100 table.exec.async-lookup.timeout: 180s ``` then the following hints: ``` 1. ASYNC_LOOKUP('dim1', 'allow-unordered', '200', '300s') 2. ASYNC_LOOKUP('dim1', 'allow-unordered', '200') 3. ASYNC_LOOKUP('table'='dim1', 'output-mode'='allow-unordered') 4. ASYNC_LOOKUP('table'='dim1', 'timeout'='300s') 5. ASYNC_LOOKUP('table'='dim1', 'capacity'='300') ``` are equivalent to: ``` 1. ASYNC_LOOKUP('dim1', 'allow-unordered', '200', '300s') 2. ASYNC_LOOKUP('dim1', 'allow-unordered', '200', '180s') 3. ASYNC_LOOKUP('table'='dim1', 'output-mode'='allow-unordered', 'capacity'='100', 'timeout'='180s') 4. ASYNC_LOOKUP('table'='dim1', 'output-mode'='ordered', 'capacity'='100', 'timeout'='300s') 5. ASYNC_LOOKUP('table'='dim1', 'output-mode'='ordered', 'capacity'='300', 'timeout'='180s') ``` In addition, if the lookup source implements both sync and async table function, the planner prefers to choose the async function when the 'ASYNC_LOOKUP' hint is specified. Looking forward to your feedback! Best, Lincoln Lee