Hi all,

I would like to propose flipping the default value of the Kafka offset
fetching config. The context is as follows:

Before Spark 3.1, there was only one approach to fetching offsets: using
consumer.poll(0). This has been pointed out as a root cause of hangs, since
there is no timeout for the metadata fetch.

In Spark 3.1, we addressed this by introducing a new approach to fetching
offsets, via SPARK-32032 <https://issues.apache.org/jira/browse/SPARK-32032>.
Since the new approach leverages AdminClient and no longer needs a consumer
group to fetch offsets, the required security ACLs are looser.

Reference:
https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html#offset-fetching
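For anyone who wants to try the new approach before any default change, a
minimal sketch of opting in, assuming the config key described in the
offset-fetching section linked above,
spark.sql.streaming.kafka.useDeprecatedOffsetFetching (which currently
defaults to true, i.e. the old consumer-based approach). In
spark-defaults.conf:

```properties
# Opt in to the AdminClient-based offset fetching (SPARK-32032).
# true (the current default) keeps the old consumer.poll(0) approach.
spark.sql.streaming.kafka.useDeprecatedOffsetFetching  false
```

The proposal below would effectively make false the default value of this
key.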

There was some concern about the behavioral change in the security model,
hence we couldn't make the new approach the default.

Since then, we have observed various Kafka connector issues which came from
the old offset fetching (e.g. hangs, issues with rebalancing of the consumer
group, etc.), and we fixed many of them simply by flipping the config.

Based on this, I would consider the current default value "incorrect". The
security-related behavioral change would inevitably be introduced (affected
users can set topic-based ACL rules), but most people will benefit. IMHO
this is something we can deal with via a release/migration note.

I would like to hear your opinions on this.

Thanks,
Jungtaek Lim (HeartSaVioR)
