zhangjun0x01 opened a new pull request #1936:
URL: https://github.com/apache/iceberg/pull/1936
When Flink queries an Iceberg table, the job runs with Flink's default
parallelism, even though the number of data files varies from table to
table. Users do not know what parallelism to choose: setting it too high
wastes resources, and setting it too low makes the query slow. This PR
therefore adds parallelism inference.
Inference is enabled by default, and the inferred parallelism equals the
number of data files. Users can turn inference off manually. To keep a table
with many data files from producing excessive parallelism, there is also a
maximum inferred parallelism: when the inferred value exceeds that maximum,
the maximum is used instead.
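The rule described above can be sketched as follows. This is only an
illustration in Python, not the code in this PR: the function name, the
parameter names, and the default maximum of 100 are all hypothetical, and
returning at least 1 for an empty table is an added assumption.

```python
def infer_parallelism(num_data_files, default_parallelism,
                      infer_enabled=True, max_infer_parallelism=100):
    """Sketch of the inference rule (all names and defaults are hypothetical)."""
    # Inference turned off by the user: keep Flink's default parallelism.
    if not infer_enabled:
        return default_parallelism
    # One parallel reader per data file, capped at the configured maximum.
    inferred = min(num_data_files, max_infer_parallelism)
    # Assumption: never return less than 1, even for a table with no files.
    return max(1, inferred)
```

With these hypothetical defaults, a table with 10 data files would run with a
parallelism of 10, while a table with 500 data files would be capped at 100.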
In addition, when the limit is pushed down, the inferred parallelism is
compared with the limit in the `select` statement to get a more appropriate
value. For example, for the SQL `select * from table limit 1`, the inferred
parallelism might be 10, but a parallelism of 1 is enough because only one
row is needed.
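The limit comparison can be sketched the same way. Again, this is a Python
illustration under assumptions rather than the PR's implementation: the
function name is hypothetical, and `None` is used here to mean that the query
has no limit.

```python
def apply_limit(inferred_parallelism, limit=None):
    """Sketch: reduce the inferred parallelism when a limit is pushed down."""
    # Assumption: None means the query has no limit clause.
    if limit is None:
        return inferred_parallelism
    # No more parallel readers than rows requested, and never fewer than 1.
    return max(1, min(inferred_parallelism, limit))
```

For the example above, an inferred parallelism of 10 combined with `limit 1`
gives a parallelism of 1, while a limit larger than the inferred value leaves
it unchanged.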