Thanks Qingsheng for the proposal. +1 for the change.

On Fri, Jan 13, 2023 at 11:32 AM Qingsheng Ren <re...@apache.org> wrote:

> Hi devs,
>
> I’d like to start a discussion about enabling the dynamic partition
> discovery feature by default in Kafka source. Dynamic partition discovery
> [1] is a useful feature in Kafka source especially under the scenario when
> the consuming Kafka topic scales out, or the source subscribes to multiple
> Kafka topics with a pattern. Users don’t have to restart the Flink job to
> consume messages in the new partition with this feature enabled. Currently,
> dynamic partition discovery is disabled by default and users have to
> explicitly specify the interval of discovery in order to turn it on.
>
> # Breaking changes
>
> For Kafka table source:
>
> - “scan.topic-partition-discovery.interval” will be set to 30 seconds by
> default.
> - As we need to provide a way for users to disable the feature,
> “scan.topic-partition-discovery.interval” = “0” will be used to turn off
> the discovery. Before this proposal, “0” means to enable partition
> discovery with interval = 0, which is a bit senseless in practice.
> Unfortunately we can't use negative values as the type of this option is
> Duration.
>
> For KafkaSource (DataStream API)
>
> - Dynamic partition discovery in Kafka source will be enabled by default,
> with discovery interval set to 30 seconds.
> - To align with table source, only a positive value for option “
> partition.discovery.interval.ms” could be used to specify the discovery
> interval. Both negative and zero will be interpreted as disabling the
> feature.
>
> # Overhead of partition discovery
>
> Partition discovery is made on KafkaSourceEnumerator, which asynchronously
> fetches topic metadata from Kafka cluster and checks if there’s any new
> topic and partition. This shouldn’t introduce performance issues on the
> Flink side.
>
> On the Kafka broker side, partition discovery makes MetadataRequest to
> Kafka broker for fetching topic infos. Considering Kafka broker has its
> metadata cache and the default request frequency is relatively low (per 30
> seconds), this is not a heavy operation and the performance of the broker
> won’t be affected a lot. It'll also be great to get some inputs from Kafka
> experts.
>
> Looking forward to your feedback!
>
> [1]
>
> https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/connectors/datastream/kafka/#dynamic-partition-discovery
>
> Best regards,
> Qingsheng
>

Reply via email to