[ https://issues.apache.org/jira/browse/KAFKA-15832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lianet Magrans updated KAFKA-15832:
-----------------------------------
    Description: 
Currently the reconciliation logic on the client is triggered when a new target 
assignment is received and resolved, or when topic names for previously 
unresolved target assignments are discovered in metadata.

This could be improved by triggering the reconciliation logic on each poll 
iteration, to reconcile whatever is ready to be reconciled. This would require 
changes to support poll on the MembershipManager, and integrate it with the 
current polling logic in the background thread. Receiving a new target 
assignment from the broker, or resolving new topic names via a metadata update, 
would then only need to ensure that #assignmentReadyToReconcile is properly 
updated (as is currently done), without triggering the #reconcile() logic, 
which would be left to the #poll() operation.
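
A minimal sketch of the split proposed above, in Java. The class and method 
names (PollDrivenMembershipSketch, onTargetAssignmentResolved) are hypothetical 
and are not the actual MembershipManager API. The only point illustrated is 
that receiving or resolving an assignment just records state, while poll() is 
the single place that decides whether to reconcile.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical, simplified stand-in for the MembershipManager; not the Kafka class. */
class PollDrivenMembershipSketch {
    private final Set<String> currentAssignment = new TreeSet<>();
    private final Set<String> assignmentReadyToReconcile = new TreeSet<>();
    private int reconcileCount = 0;

    /** Assumed hook: a target assignment was received, or its topic names were resolved. */
    void onTargetAssignmentResolved(Set<String> resolvedPartitions) {
        // Only record what is ready; do not reconcile here.
        assignmentReadyToReconcile.clear();
        assignmentReadyToReconcile.addAll(resolvedPartitions);
    }

    /** Called on every iteration of the background thread's poll loop. */
    void poll() {
        // Reconcile whatever is ready and differs from what the member currently owns.
        if (!assignmentReadyToReconcile.equals(currentAssignment)) {
            reconcile();
        }
    }

    private void reconcile() {
        // The real reconciliation is asynchronous (commits, callbacks); here it is instant.
        currentAssignment.clear();
        currentAssignment.addAll(assignmentReadyToReconcile);
        reconcileCount++;
    }

    Set<String> currentAssignment() {
        return new TreeSet<>(currentAssignment);
    }

    int reconcileCount() {
        return reconcileCount;
    }

    public static void main(String[] args) {
        PollDrivenMembershipSketch manager = new PollDrivenMembershipSketch();
        manager.onTargetAssignmentResolved(Set.of("tp1"));
        System.out.println(manager.currentAssignment()); // prints "[]"
        manager.poll();
        manager.poll(); // nothing new to reconcile, so this is a no-op
        System.out.println(manager.currentAssignment() + " after "
                + manager.reconcileCount() + " reconciliation(s)"); // prints "[tp1] after 1 reconciliation(s)"
    }
}
```

With this shape, a poll iteration that finds nothing new is a cheap no-op, and 
a pending assignment no longer depends on another non-null assignment arriving 
to get reconciled.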

As a result of this task, we should validate that the client always reconciles 
whatever it has pending that has not been removed by the coordinator. This 
should address edge cases where a client might get stuck in JOINING/RECONCILING 
with a pending reconciliation, because null assignments are exchanged between 
the client and the coordinator while the long-running reconciliation completes. 
Note that the MembershipManager currently relies on assignment != null to 
trigger the reconciliation of pending assignments. With the current logic, the 
following sequence would leave the client stuck in JOINING:
- client joins, epoch 0
- client receives assignment tp1, stuck RECONCILING, epoch 1
- member gets FENCED on the coord, coord bumps epoch to 2
- client tries to rejoin (JOINING), epoch 0 provided by the client
- new member added to the group (group epoch bumped to 3), client receives the 
same assignment that it is currently trying to reconcile (tp1), but with epoch 3
- previous reconciliation completes, but the client discards the result because 
it notices that the memberHasRejoined (memberEpochOnReconciliationStart != 
memberEpoch). The client is stuck in JOINING, with the server sending a null 
target assignment because it hasn't changed since the last one sent (tp1)
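
The sequence above can be replayed with a small model, sketched below in Java. 
Everything here is a simplification made up for illustration: the class 
StuckJoiningSketch, its methods, and the reconcileOnPoll flag are hypothetical, 
and the asynchronous reconciliation is reduced to an explicit "completed" call. 
Only the names memberEpoch, memberEpochOnReconciliationStart, 
memberHasRejoined and assignmentReadyToReconcile are taken from the 
description.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical model of the member state machine; not the Kafka implementation. */
class StuckJoiningSketch {
    enum State { JOINING, RECONCILING, STABLE }

    State state = State.JOINING;            // client joins, epoch 0
    int memberEpoch = 0;
    int memberEpochOnReconciliationStart = -1;
    boolean reconciliationInProgress = false;
    final Set<String> currentAssignment = new TreeSet<>();
    final Set<String> assignmentReadyToReconcile = new TreeSet<>();
    final boolean reconcileOnPoll;          // false: current trigger, true: proposed trigger

    StuckJoiningSketch(boolean reconcileOnPoll) {
        this.reconcileOnPoll = reconcileOnPoll;
    }

    void onHeartbeatResponse(int epoch, Set<String> assignment) {
        memberEpoch = epoch;
        if (assignment != null) {
            assignmentReadyToReconcile.clear();
            assignmentReadyToReconcile.addAll(assignment);
            // Current logic: reconciliation is only triggered by a non-null assignment.
            if (!reconcileOnPoll) maybeReconcile();
        }
    }

    void poll() {
        // Proposed logic: every poll reconciles whatever is pending.
        if (reconcileOnPoll) maybeReconcile();
    }

    void onFenced() {
        memberEpoch = 0;                    // the client rejoins with epoch 0
        state = State.JOINING;
    }

    private void maybeReconcile() {
        if (reconciliationInProgress || assignmentReadyToReconcile.equals(currentAssignment)) return;
        reconciliationInProgress = true;
        memberEpochOnReconciliationStart = memberEpoch;
        state = State.RECONCILING;
    }

    void onReconciliationCompleted() {
        reconciliationInProgress = false;
        boolean memberHasRejoined = memberEpochOnReconciliationStart != memberEpoch;
        if (memberHasRejoined) return;      // result discarded, state left untouched
        currentAssignment.clear();
        currentAssignment.addAll(assignmentReadyToReconcile);
        state = State.STABLE;
    }

    /** Replays the sequence from the description and returns the final state. */
    static State runScenario(boolean reconcileOnPoll) {
        StuckJoiningSketch client = new StuckJoiningSketch(reconcileOnPoll);
        client.onHeartbeatResponse(1, Set.of("tp1")); client.poll(); // RECONCILING, epoch 1
        client.onFenced(); client.poll();                            // fenced, back to JOINING
        client.onHeartbeatResponse(3, Set.of("tp1")); client.poll(); // same assignment, epoch 3
        client.onReconciliationCompleted();                          // started at epoch 1: discarded
        client.onHeartbeatResponse(3, null); client.poll();          // server now sends null
        if (client.reconciliationInProgress) client.onReconciliationCompleted();
        return client.state;
    }

    public static void main(String[] args) {
        System.out.println("reconcile on assignment received: " + runScenario(false)); // JOINING
        System.out.println("reconcile on poll: " + runScenario(true));                 // STABLE
    }
}
```

In this model the assignment-triggered variant ends in JOINING because nothing 
restarts the reconciliation once the server stops sending tp1, while the 
poll-triggered variant picks up the pending tp1 on the next poll and reaches 
STABLE.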

  was:
Currently the reconciliation logic on the client is triggered when a new target 
assignment is received and resolved, or when topic names for previously 
unresolved target assignments are discovered in metadata.

This could be improved by triggering the reconciliation logic on each poll 
iteration, to reconcile whatever is ready to be reconciled. This would require 
changes to support poll on the MembershipManager, and integrate it with the 
current polling logic in the background thread. Receiving a new target 
assignment from the broker, or resolving new topic names via a metadata update, 
would then only need to ensure that #assignmentReadyToReconcile is properly 
updated (as is currently done), without triggering the #reconcile() logic, 
which would be left to the #poll() operation.

As a result of this task, we should validate that the client always reconciles 
whatever it has pending that has not been removed by the coordinator. This 
should address edge cases where a client might get stuck in JOINING/RECONCILING 
with a pending reconciliation, because null assignments are exchanged between 
the client and the coordinator while the long-running reconciliation completes. 
Note that the MembershipManager currently relies on assignment != null to 
trigger the reconciliation of pending assignments. With the current logic, the 
following sequence would leave the client stuck in JOINING:
- client joins, epoch 0
- client receives assignment tp1, stuck RECONCILING, epoch 1
- member gets FENCED on the coord, coord bumps epoch to 2
- client tries to rejoin (JOINING), epoch 0 provided by the client
- client added to the group (group epoch bumped to 2), client receives the same 
assignment that it is currently trying to reconcile (tp1)
- reconciliation completes, but the client will discard the result if it 
completes after the fencing, because it will notice that the memberHasRejoined 
(memberEpochOnReconciliationStart != memberEpoch).


> Trigger client reconciliation based on manager poll
> ---------------------------------------------------
>
>                 Key: KAFKA-15832
>                 URL: https://issues.apache.org/jira/browse/KAFKA-15832
>             Project: Kafka
>          Issue Type: Sub-task
>          Components: clients, consumer
>            Reporter: Lianet Magrans
>            Assignee: Lianet Magrans
>            Priority: Major
>              Labels: kip-848, kip-848-client-support, kip-848-e2e, 
> kip-848-preview
>             Fix For: 3.8.0



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
