Thanks Qingsheng for the proposal. +1 for the change. On Fri, Jan 13, 2023 at 11:32 AM Qingsheng Ren <re...@apache.org> wrote:
> Hi devs, > > I’d like to start a discussion about enabling the dynamic partition > discovery feature by default in Kafka source. Dynamic partition discovery > [1] is a useful feature in Kafka source especially under the scenario when > the consuming Kafka topic scales out, or the source subscribes to multiple > Kafka topics with a pattern. Users don’t have to restart the Flink job to > consume messages in the new partition with this feature enabled. Currently, > dynamic partition discovery is disabled by default and users have to > explicitly specify the interval of discovery in order to turn it on. > > # Breaking changes > > For Kafka table source: > > - “scan.topic-partition-discovery.interval” will be set to 30 seconds by > default. > - As we need to provide a way for users to disable the feature, > “scan.topic-partition-discovery.interval” = “0” will be used to turn off > the discovery. Before this proposal, “0” means to enable partition > discovery with interval = 0, which is a bit senseless in practice. > Unfortunately we can't use negative values as the type of this option is > Duration. > > For KafkaSource (DataStream API) > > - Dynamic partition discovery in Kafka source will be enabled by default, > with discovery interval set to 30 seconds. > - To align with table source, only a positive value for option “ > partition.discovery.interval.ms” could be used to specify the discovery > interval. Both negative and zero will be interpreted as disabling the > feature. > > # Overhead of partition discovery > > Partition discovery is made on KafkaSourceEnumerator, which asynchronously > fetches topic metadata from Kafka cluster and checks if there’s any new > topic and partition. This shouldn’t introduce performance issues on the > Flink side. > > On the Kafka broker side, partition discovery makes MetadataRequest to > Kafka broker for fetching topic infos. Considering Kafka broker has its > metadata cache and the default request frequency is relatively low (per 30 > seconds), this is not a heavy operation and the performance of the broker > won’t be affected a lot. It'll also be great to get some inputs from Kafka > experts. > > Looking forward to your feedback! > > [1] > > https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/connectors/datastream/kafka/#dynamic-partition-discovery > > Best regards, > Qingsheng >