[ https://issues.apache.org/jira/browse/KAFKA-15832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lianet Magrans updated KAFKA-15832:
-----------------------------------
Description:
Currently the reconciliation logic on the client is triggered when a new target
assignment is received and resolved, or when new unresolved target assignments
are discovered in metadata.
This could be improved by triggering the reconciliation logic on each poll
iteration, to reconcile whatever is ready to be reconciled. This would require
changes to support poll on the MembershipManager, and to integrate it with the
current polling logic in the background thread. Receiving a new target
assignment from the broker, or resolving new topic names via a metadata update,
would then only ensure that #assignmentReadyToReconcile is properly updated
(currently done), but would not trigger the #reconcile() logic, leaving that to
the #poll() operation.
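The poll-driven flow described above could be sketched roughly as follows. This
is a simplified illustration, not the actual MembershipManager code: the class
PollDrivenManagerSketch, the method onTargetAssignmentResolved, and the
synchronous reconcile() are all assumptions made for the example (the real
reconciliation is asynchronous).

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical, simplified stand-in for the client's membership manager. */
class PollDrivenManagerSketch {
    // Partitions with resolved topic names that are ready to be reconciled.
    private final Set<String> assignmentReadyToReconcile = new HashSet<>();
    // Partitions the member currently owns.
    private final Set<String> currentAssignment = new HashSet<>();
    private boolean reconciliationInProgress = false;
    private int reconcileCalls = 0;

    /**
     * Called when a target assignment is received or topic names are resolved
     * via metadata. It only records state and never triggers reconcile().
     */
    void onTargetAssignmentResolved(Set<String> resolved) {
        assignmentReadyToReconcile.clear();
        assignmentReadyToReconcile.addAll(resolved);
    }

    /** Called on every iteration of the background thread's poll loop. */
    void poll() {
        boolean somethingPending = !assignmentReadyToReconcile.equals(currentAssignment);
        if (!reconciliationInProgress && somethingPending) {
            reconcile();
        }
    }

    private void reconcile() {
        reconcileCalls++;
        reconciliationInProgress = true;
        // Assumption: completes immediately; the real logic runs callbacks and commits.
        currentAssignment.clear();
        currentAssignment.addAll(assignmentReadyToReconcile);
        reconciliationInProgress = false;
    }

    Set<String> currentAssignment() {
        return Set.copyOf(currentAssignment);
    }

    int reconcileCalls() {
        return reconcileCalls;
    }
}
```

In this sketch, receiving an assignment changes nothing until the next poll()
call, and repeated poll() calls are no-ops once nothing is pending.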
As a result of this task, we should validate that the client always reconciles
whatever it has pending that has not been removed by the coordinator. This
should address edge cases where a client might get stuck JOINING/RECONCILING,
with a pending reconciliation, where null assignments are exchanged between the
client and the coordinator, while the long-running reconciliation completes.
Note that currently, the MembershipManager relies on assignment != null to
trigger the reconciliation of pending assignments. With the current logic, the
following sequence would leave the client stuck JOINING:
- client joins, epoch 0
- client receives assignment tp1, stuck RECONCILING, epoch 1
- member gets FENCED on the coordinator, coordinator bumps epoch to 2
- client tries to rejoin (JOINING), epoch 0 provided by the client
- client added to the group (group epoch bumped to 2), client receives the same
assignment that it is currently trying to reconcile (tp1)
- reconciliation completes; if it completes after the fencing, its result is
discarded, because the client notices that the memberHasRejoined
(memberEpochOnReconciliationStart != memberEpoch). Since only null assignments
are exchanged from then on, nothing triggers another reconciliation.
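The sequence above can be replayed with a small model. This is only a sketch
under stated assumptions, not the real client: StuckJoiningSketch and all of
its methods and fields (including the reconcileOnNonNullAssignment flag, used
here to mimic the current assignment != null trigger) are hypothetical names
introduced for illustration.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of the client-side state involved in the sequence. */
class StuckJoiningSketch {
    int memberEpoch = 0;
    final Set<String> readyToReconcile = new HashSet<>();
    final Set<String> owned = new HashSet<>();
    private boolean inFlight = false;
    private int memberEpochOnReconciliationStart;
    private Set<String> inFlightTarget = Set.of();

    /** A null assignment means the coordinator had nothing new to send. */
    void onHeartbeatResponse(int epoch, Set<String> assignment,
                             boolean reconcileOnNonNullAssignment) {
        memberEpoch = epoch;
        if (assignment != null) {
            readyToReconcile.clear();
            readyToReconcile.addAll(assignment);
            if (reconcileOnNonNullAssignment) {
                maybeStartReconciliation();
            }
        }
    }

    /** The member is fenced: it drops what it owns and rejoins with epoch 0. */
    void onFenced() {
        memberEpoch = 0;
        owned.clear();
    }

    /** Proposed behavior: every poll reconciles whatever is pending. */
    void poll() {
        maybeStartReconciliation();
    }

    private void maybeStartReconciliation() {
        if (inFlight || readyToReconcile.equals(owned)) {
            return;
        }
        inFlight = true;
        memberEpochOnReconciliationStart = memberEpoch;
        inFlightTarget = Set.copyOf(readyToReconcile);
    }

    /** Simulates the end of a long-running reconciliation. */
    void completeReconciliation() {
        if (!inFlight) {
            return;
        }
        inFlight = false;
        boolean memberHasRejoined = memberEpochOnReconciliationStart != memberEpoch;
        if (memberHasRejoined) {
            return; // result discarded: the epoch changed while reconciling
        }
        owned.clear();
        owned.addAll(inFlightTarget);
    }

    public static void main(String[] args) {
        StuckJoiningSketch client = new StuckJoiningSketch();
        client.onHeartbeatResponse(1, Set.of("tp1"), true); // reconciliation starts
        client.onFenced();                                  // rejoin with epoch 0
        client.onHeartbeatResponse(2, Set.of("tp1"), true); // same assignment, still in flight
        client.completeReconciliation();                    // discarded
        client.onHeartbeatResponse(2, null, true);          // null assignment, no trigger
        System.out.println("owned before poll: " + client.owned);
        client.poll();                                      // poll-based trigger
        client.completeReconciliation();
        System.out.println("owned after poll: " + client.owned);
    }
}
```

In this model the member owns nothing after the discarded reconciliation, and
further null assignments never change that; a poll() call is what picks up the
still-pending tp1.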
was:
Currently the reconciliation logic on the client is triggered when a new target
assignment is received and resolved, or when new unresolved target assignments
are discovered in metadata.
This could be improved by triggering the reconciliation logic on each poll
iteration, to reconcile whatever is ready to be reconciled. This would required
changes to support poll on the MembershipManager, and integrate it with the
current polling logic in the background thread.
As a result of this task, it should be ensured that the client always
reconciles whatever it has pending that has not been removed by the
coordinator. (This should address edge cases where a client might get stuck
JOINING/RECONCILING, with a pending reconciliation, where null assignments are
exchanged between the client and the coordinator, while the long-running
reconciliation completes. Note that currently, the MembershipManager relies on
assignment != null to trigger the reconciliation of pending assignments)
> Trigger client reconciliation based on manager poll
> ---------------------------------------------------
>
> Key: KAFKA-15832
> URL: https://issues.apache.org/jira/browse/KAFKA-15832
> Project: Kafka
> Issue Type: Sub-task
> Components: clients, consumer
> Reporter: Lianet Magrans
> Assignee: Lianet Magrans
> Priority: Major
> Labels: kip-848, kip-848-client-support, kip-848-e2e,
> kip-848-preview
> Fix For: 3.8.0
>
>