dajac commented on PR #12897:
URL: https://github.com/apache/kafka/pull/12897#issuecomment-1330712535

   > A specific example is when a partition is reassigned. the consumer will 
get NOT_LEADER_OR_FOLLOWER which triggers a metadata update but the preferred 
read replica will not be refreshed as the follower is still online. it will 
continue to reach out to the old follower until the preferred read replica 
expires.
   
   Is this really true? Replica selection is done as follows when the fetch 
request is constructed.
   
   ```
       Node selectReadReplica(TopicPartition partition, Node leaderReplica, long currentTimeMs) {
           Optional<Integer> nodeId = subscriptions.preferredReadReplica(partition, currentTimeMs);
           if (nodeId.isPresent()) {
               Optional<Node> node = nodeId.flatMap(id -> metadata.fetch().nodeIfOnline(partition, id));
               if (node.isPresent()) {
                   return node.get();
               } else {
                   log.trace("Not fetching from {} for partition {} since it is marked offline or is missing from our metadata," +
                             " using the leader instead.", nodeId, partition);
                   subscriptions.clearPreferredReadReplica(partition);
                   return leaderReplica;
               }
           } else {
               return leaderReplica;
           }
       }
   ```
   
   And this selection process resets the preferred read replica if 
`metadata.fetch().nodeIfOnline(partition, id)` returns an empty optional.
   
   ```
       public Optional<Node> nodeIfOnline(TopicPartition partition, int id) {
           Node node = nodeById(id);
           PartitionInfo partitionInfo = partition(partition);
           if (node != null && partitionInfo != null &&
                   !Arrays.asList(partitionInfo.offlineReplicas()).contains(node)) {
               return Optional.of(node);
           } else {
               return Optional.empty();
           }
       }
   ```
   
   I guess that the issue is in `nodeIfOnline`, which only looks at the offline 
replicas. When a replica is moved to another broker, the old broker is not 
listed in the offline replicas, so the check still passes and the preferred 
read replica is not cleared. `nodeIfOnline` should also consider whether the 
node is a valid replica at all. Should we also fix this?
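
To make the suggestion concrete, here is a minimal, self-contained sketch of the extra check. It is not the actual `Cluster` code: `ReplicaCheckSketch` and `PartitionView` are hypothetical stand-ins that track broker ids only, and the real change would presumably compare against `partitionInfo.replicas()` in the same way.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

public class ReplicaCheckSketch {

    // Hypothetical stand-in for PartitionInfo, reduced to broker ids.
    static final class PartitionView {
        final List<Integer> replicas;
        final List<Integer> offlineReplicas;

        PartitionView(List<Integer> replicas, List<Integer> offlineReplicas) {
            this.replicas = replicas;
            this.offlineReplicas = offlineReplicas;
        }
    }

    // Returns the node id only if it is a current replica of the partition
    // AND is not marked offline. The replicas check is the proposed addition.
    static Optional<Integer> nodeIfOnline(PartitionView partition, Integer nodeId) {
        if (nodeId != null && partition != null
                && partition.replicas.contains(nodeId)
                && !partition.offlineReplicas.contains(nodeId)) {
            return Optional.of(nodeId);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // The partition was reassigned from brokers {1, 2, 3} to {1, 2, 4}.
        // Broker 3 is still up, so it is not in the offline replicas.
        PartitionView afterReassignment =
                new PartitionView(Arrays.asList(1, 2, 4), Collections.<Integer>emptyList());

        System.out.println(nodeIfOnline(afterReassignment, 3)); // Optional.empty
        System.out.println(nodeIfOnline(afterReassignment, 2)); // Optional[2]
    }
}
```

With a check like this, `selectReadReplica` would take its existing `else` branch for broker 3, clear the preferred read replica, and fall back to the leader.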

