[ 
https://issues.apache.org/jira/browse/KAFKA-1850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274657#comment-14274657
 ] 

Joe Stein commented on KAFKA-1850:
----------------------------------

The reassignment cannot finish until the new replica(s) have caught up.

Are all of your brokers up? How much data is in your partitions? 

ERROR: Assigned replicas (2,1,0) don't match the list of replicas for 
reassignment (2,1) for partition [testingTopic,9]

This means that replica #1 has not yet replicated everything and caught up to 
#2 (the leader).
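
To see at a glance which brokers are left over for each failing partition, the 
--verify output can be parsed. This is only a rough sketch: the function name 
extra_replicas is made up for illustration, and the pattern assumes the ERROR 
lines look exactly like the ones in this report.

```python
import re

# Matches the ERROR lines as they appear in the --verify output of this report.
ERROR_RE = re.compile(
    r"ERROR: Assigned replicas \(([\d,]+)\) don't match the list of replicas "
    r"for reassignment \(([\d,]+)\) for partition \[([^,\]]+),(\d+)\]"
)

def extra_replicas(verify_output):
    """Map (topic, partition) -> brokers that are assigned but not in the target list."""
    # Collapse whitespace so messages wrapped across mail lines still match.
    text = " ".join(verify_output.split())
    result = {}
    for assigned, target, topic, partition in ERROR_RE.findall(text):
        assigned_ids = [int(x) for x in assigned.split(",")]
        target_ids = {int(x) for x in target.split(",")}
        result[(topic, int(partition))] = [
            b for b in assigned_ids if b not in target_ids
        ]
    return result

sample = """ERROR: Assigned replicas (2,1,0) don't match the list of replicas for
reassignment (2,1) for partition [testingTopic,9]"""
print(extra_replicas(sample))  # {('testingTopic', 9): [0]}
```

For partition 9 the leftover broker is 0, which is one of the original replicas 
(0,2) listed in the current assignment of the report.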

It is possible that the reassignment is still running but the replicas are just 
not catching up with the leader, so it never finishes. This could be because 
the brokers, as configured (threads, etc.), just can't keep up with the data 
size and volume. It could also be due to a different maximum message size on 
brokers #0 and #2 than on #1, so that there is a message that can't be fetched 
and the replica never catches up.
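
One way to check for a size mismatch is to compare the relevant settings across 
the brokers' server.properties files. A minimal sketch, assuming you have 
copied the text of each file somewhere you can read it; the helper names are 
hypothetical, and message.max.bytes and replica.fetch.max.bytes are the 
settings I would look at first. A value of None means the key is not set in 
that file, so the broker default applies.

```python
def parse_properties(text):
    """Parse the simple key=value lines of a broker properties file."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props

def size_mismatches(broker_configs,
                    keys=("message.max.bytes", "replica.fetch.max.bytes")):
    """Return {key: {broker_id: value}} for keys whose value differs across brokers."""
    mismatches = {}
    for key in keys:
        values = {
            broker_id: parse_properties(text).get(key)
            for broker_id, text in broker_configs.items()
        }
        if len(set(values.values())) > 1:
            mismatches[key] = values
    return mismatches

# Made-up file contents for three brokers, for illustration only.
configs = {
    0: "broker.id=0\nmessage.max.bytes=2000000\n",
    1: "# size settings left at their defaults\nbroker.id=1\n",
    2: "broker.id=2\nmessage.max.bytes=2000000\n",
}
print(size_mismatches(configs))
```

Any key that shows up in the result is worth a closer look before restarting 
anything.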

Can you confirm whether there is data in the partitions on the new broker? Do 
you see new data arriving (you can look on disk at the partition directories)? 
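
If it helps, the on-disk check can be scripted. A small sketch, assuming the 
usual layout where each partition lives in a directory named 
<topic>-<partition> under the broker's log directory; the function name and 
the /data/kafka-logs path in the comment are placeholders, not values from 
your cluster.

```python
import os

def partition_dir_size(log_dir, topic, partition):
    """Total size in bytes of the files under <log_dir>/<topic>-<partition>."""
    path = os.path.join(log_dir, f"{topic}-{partition}")
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                # A file may disappear between listing and stat; skip it.
                pass
    return total

# Example (placeholder path): run on the new broker a few minutes apart and
# compare the two numbers.
# print(partition_dir_size("/data/kafka-logs", "testingTopic", 9))
```

If the number keeps growing between runs, the replica is still fetching and may 
just be slow; if the directory is missing (the function returns 0) or the size 
never changes, that points more toward a stuck reassignment.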

It could be wedged/stuck and just not finishing.

One option is to restart the leader of each failing partition. I have seen 
that solve this issue before, but I don't know whether the problem you are 
having is in fact a bug or just the brokers not catching up. It could also be 
the controller, so restarting broker #2 may end up being what you have to do 
to fix this.

I would investigate first to confirm whether the issue is simply that the new 
broker is not able to catch up, and try to resolve that before restarting 
brokers that are live leaders, as restarting them could have a negative impact 
on your cluster.


> Failed reassignment leads to additional replica
> -----------------------------------------------
>
>                 Key: KAFKA-1850
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1850
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller
>    Affects Versions: 0.8.1
>         Environment: CentOS  (Linux Kernel 2.6.32-71.el6.x86_64 )
>            Reporter: Alex Tian
>            Assignee: Neha Narkhede
>            Priority: Minor
>              Labels: newbie
>         Attachments: Track on testingTopic-9's movement.txt, 
> track_on_testingTopic-9_movement_on_the_following_2_days.txt
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> When I started a topic reassignment (36 partitions in total) in my Kafka 
> cluster, 24 partitions succeeded and 12 failed. However, the 12 failed 
> partitions have extra replicas. I think the reason is that AR still consists 
> of RAR and OAR even though the reassignment for the partition failed. Could we 
> regard this problem as a bug? Apologies for any mistakes in my question, 
> since I am a beginner with Kafka.
> This is the output from the operations: 
> 1. alex-topics-to-move.json:
> {"topics": [{"topic": "testingTopic"}],
>  "version":1
> }
> 2. Generate a reassignment plan
> $./kafka-reassign-partitions.sh  --generate  --broker-list 0,1,2,3,4 
> --topics-to-move-json-file ./alex-topics-to-move.json   --zookeeper 
> 192.168.112.95:2181,192.168.112.96:2181,192.168.112.97:2181,192.168.112.98:2181,192.168.112.99:2181
> Current partition replica assignment
> {"version":1,
>  "partitions":[   {"topic":"testingTopic","partition":27,"replicas":[0,2]},
>                   {"topic":"testingTopic","partition":1,"replicas":[1,2]},
>                   {"topic":"testingTopic","partition":12,"replicas":[0,1]},
>                   {"topic":"testingTopic","partition":6,"replicas":[0,1]},
>                   {"topic":"testingTopic","partition":16,"replicas":[1,0]},
>                   {"topic":"testingTopic","partition":32,"replicas":[2,0]},
>                   {"topic":"testingTopic","partition":18,"replicas":[0,1]},
>                   {"topic":"testingTopic","partition":31,"replicas":[1,2]},
>                   {"topic":"testingTopic","partition":9,"replicas":[0,2]},
>                   {"topic":"testingTopic","partition":23,"replicas":[2,1]},
>                   {"topic":"testingTopic","partition":19,"replicas":[1,2]},
>                   {"topic":"testingTopic","partition":34,"replicas":[1,0]},
>                   {"topic":"testingTopic","partition":17,"replicas":[2,1]},
>                   {"topic":"testingTopic","partition":7,"replicas":[1,2]},
>                   {"topic":"testingTopic","partition":20,"replicas":[2,0]},
>                   {"topic":"testingTopic","partition":8,"replicas":[2,0]},
>                   {"topic":"testingTopic","partition":11,"replicas":[2,1]},
>                   {"topic":"testingTopic","partition":3,"replicas":[0,2]},
>                   {"topic":"testingTopic","partition":30,"replicas":[0,1]},
>                   {"topic":"testingTopic","partition":35,"replicas":[2,1]},
>                   {"topic":"testingTopic","partition":26,"replicas":[2,0]},
>                   {"topic":"testingTopic","partition":22,"replicas":[1,0]},
>                   {"topic":"testingTopic","partition":10,"replicas":[1,0]},
>                   {"topic":"testingTopic","partition":24,"replicas":[0,1]},
>                   {"topic":"testingTopic","partition":21,"replicas":[0,2]},
>                   {"topic":"testingTopic","partition":15,"replicas":[0,2]},
>                   {"topic":"testingTopic","partition":4,"replicas":[1,0]},
>                   {"topic":"testingTopic","partition":28,"replicas":[1,0]},
>                   {"topic":"testingTopic","partition":25,"replicas":[1,2]},
>                   {"topic":"testingTopic","partition":14,"replicas":[2,0]},
>                   {"topic":"testingTopic","partition":2,"replicas":[2,0]},
>                   {"topic":"testingTopic","partition":13,"replicas":[1,2]},
>                   {"topic":"testingTopic","partition":5,"replicas":[2,1]},
>                   {"topic":"testingTopic","partition":29,"replicas":[2,1]},
>                   {"topic":"testingTopic","partition":33,"replicas":[0,2]},
>                   {"topic":"testingTopic","partition":0,"replicas":[0,1]}]}
>  Proposed partition reassignment configuration  (alex-expand-cluster-reassignment.json)
> {"version":1,
>  "partitions":[   {"topic":"testingTopic","partition":27,"replicas":[0,4]},
>                   {"topic":"testingTopic","partition":1,"replicas":[4,2]},
>                   {"topic":"testingTopic","partition":12,"replicas":[0,1]},
>                   {"topic":"testingTopic","partition":6,"replicas":[4,3]},
>                   {"topic":"testingTopic","partition":16,"replicas":[4,1]},
>                   {"topic":"testingTopic","partition":32,"replicas":[0,1]},
>                   {"topic":"testingTopic","partition":18,"replicas":[1,3]},
>                   {"topic":"testingTopic","partition":31,"replicas":[4,0]},
>                   {"topic":"testingTopic","partition":23,"replicas":[1,4]},
>                   {"topic":"testingTopic","partition":9,"replicas":[2,1]},
>                   {"topic":"testingTopic","partition":19,"replicas":[2,4]},
>                   {"topic":"testingTopic","partition":34,"replicas":[2,3]},
>                   {"topic":"testingTopic","partition":17,"replicas":[0,2]},
>                   {"topic":"testingTopic","partition":20,"replicas":[3,1]},
>                   {"topic":"testingTopic","partition":7,"replicas":[0,4]},
>                   {"topic":"testingTopic","partition":8,"replicas":[1,0]},
>                   {"topic":"testingTopic","partition":11,"replicas":[4,0]},
>                   {"topic":"testingTopic","partition":3,"replicas":[1,4]},
>                   {"topic":"testingTopic","partition":35,"replicas":[3,0]},
>                   {"topic":"testingTopic","partition":30,"replicas":[3,4]},
>                   {"topic":"testingTopic","partition":26,"replicas":[4,3]},
>                   {"topic":"testingTopic","partition":22,"replicas":[0,3]},
>                   {"topic":"testingTopic","partition":10,"replicas":[3,4]},
>                   {"topic":"testingTopic","partition":24,"replicas":[2,0]},
>                   {"topic":"testingTopic","partition":21,"replicas":[4,2]},
>                   {"topic":"testingTopic","partition":15,"replicas":[3,0]},
>                   {"topic":"testingTopic","partition":4,"replicas":[2,0]},
>                   {"topic":"testingTopic","partition":25,"replicas":[3,2]},
>                   {"topic":"testingTopic","partition":28,"replicas":[1,0]},
>                   {"topic":"testingTopic","partition":14,"replicas":[2,3]},
>                   {"topic":"testingTopic","partition":2,"replicas":[0,3]},
>                   {"topic":"testingTopic","partition":13,"replicas":[1,2]},
>                   {"topic":"testingTopic","partition":5,"replicas":[3,2]},
>                   {"topic":"testingTopic","partition":29,"replicas":[2,1]},
>                   {"topic":"testingTopic","partition":33,"replicas":[1,2]},
>                   {"topic":"testingTopic","partition":0,"replicas":[3,1]}]}
> 3.  Start the reassignment
> $./kafka-reassign-partitions.sh  --execute  --broker-list 0,1,2,3,4 
> --reassignment-json-file ./alex-expand-cluster-reassignment.json  --zookeeper 
> 192.168.112.85:2181,192.168.112.86:2181,192.168.112.87:2181,192.168.112.88:2181,192.168.112.89:2181
> Current partition replica assignment
> {"version":1,"partitions":[{"topic":"testingTopic","partition":27,"replicas":[0,2]},{"topic":"testingTopic","partition":1,"replicas":[1,2]},{"topic":"testingTopic","partition":12,"replicas":[0,1]},{"topic":"testingTopic","partition":6,"replicas":[0,1]},{"topic":"testingTopic","partition":16,"replicas":[1,0]},{"topic":"testingTopic","partition":32,"replicas":[2,0]},{"topic":"testingTopic","partition":18,"replicas":[0,1]},{"topic":"testingTopic","partition":31,"replicas":[1,2]},{"topic":"testingTopic","partition":9,"replicas":[0,2]},{"topic":"testingTopic","partition":23,"replicas":[2,1]},{"topic":"testingTopic","partition":19,"replicas":[1,2]},{"topic":"testingTopic","partition":34,"replicas":[1,0]},{"topic":"testingTopic","partition":17,"replicas":[2,1]},{"topic":"testingTopic","partition":7,"replicas":[1,2]},{"topic":"testingTopic","partition":20,"replicas":[2,0]},{"topic":"testingTopic","partition":8,"replicas":[2,0]},{"topic":"testingTopic","partition":11,"replicas":[2,1]},{"topic":"testingTopic","partition":3,"replicas":[0,2]},{"topic":"testingTopic","partition":30,"replicas":[0,1]},{"topic":"testingTopic","partition":35,"replicas":[2,1]},{"topic":"testingTopic","partition":26,"replicas":[2,0]},{"topic":"testingTopic","partition":22,"replicas":[1,0]},{"topic":"testingTopic","partition":10,"replicas":[1,0]},{"topic":"testingTopic","partition":24,"replicas":[0,1]},{"topic":"testingTopic","partition":21,"replicas":[0,2]},{"topic":"testingTopic","partition":15,"replicas":[0,2]},{"topic":"testingTopic","partition":4,"replicas":[1,0]},{"topic":"testingTopic","partition":28,"replicas":[1,0]},{"topic":"testingTopic","partition":25,"replicas":[1,2]},{"topic":"testingTopic","partition":14,"replicas":[2,0]},{"topic":"testingTopic","partition":2,"replicas":[2,0]},{"topic":"testingTopic","partition":13,"replicas":[1,2]},{"topic":"testingTopic","partition":5,"replicas":[2,1]},{"topic":"testingTopic","partition":29,"replicas":[2,1]},{"topic":"testingTopic","partition":33,"replicas":[0,2]},{"topic":"testingTopic","partition":0,"replicas":[0,1]}]}
> Save this to use as the --reassignment-json-file option during rollback
> Successfully started reassignment of partitions 
> {"version":1,"partitions":[{"topic":"testingTopic","partition":27,"replicas":[0,4]},{"topic":"testingTopic","partition":1,"replicas":[4,2]},{"topic":"testingTopic","partition":12,"replicas":[0,1]},{"topic":"testingTopic","partition":6,"replicas":[4,3]},{"topic":"testingTopic","partition":16,"replicas":[4,1]},{"topic":"testingTopic","partition":32,"replicas":[0,1]},{"topic":"testingTopic","partition":31,"replicas":[4,0]},{"topic":"testingTopic","partition":18,"replicas":[1,3]},{"topic":"testingTopic","partition":9,"replicas":[2,1]},{"topic":"testingTopic","partition":23,"replicas":[1,4]},{"topic":"testingTopic","partition":19,"replicas":[2,4]},{"topic":"testingTopic","partition":17,"replicas":[0,2]},{"topic":"testingTopic","partition":34,"replicas":[2,3]},{"topic":"testingTopic","partition":20,"replicas":[3,1]},{"topic":"testingTopic","partition":7,"replicas":[0,4]},{"topic":"testingTopic","partition":8,"replicas":[1,0]},{"topic":"testingTopic","partition":11,"replicas":[4,0]},{"topic":"testingTopic","partition":3,"replicas":[1,4]},{"topic":"testingTopic","partition":35,"replicas":[3,0]},{"topic":"testingTopic","partition":30,"replicas":[3,4]},{"topic":"testingTopic","partition":26,"replicas":[4,3]},{"topic":"testingTopic","partition":22,"replicas":[0,3]},{"topic":"testingTopic","partition":10,"replicas":[3,4]},{"topic":"testingTopic","partition":24,"replicas":[2,0]},{"topic":"testingTopic","partition":21,"replicas":[4,2]},{"topic":"testingTopic","partition":15,"replicas":[3,0]},{"topic":"testingTopic","partition":4,"replicas":[2,0]},{"topic":"testingTopic","partition":28,"replicas":[1,0]},{"topic":"testingTopic","partition":25,"replicas":[3,2]},{"topic":"testingTopic","partition":14,"replicas":[2,3]},{"topic":"testingTopic","partition":2,"replicas":[0,3]},{"topic":"testingTopic","partition":13,"replicas":[1,2]},{"topic":"testingTopic","partition":5,"replicas":[3,2]},{"topic":"testingTopic","partition":29,"replicas":[2,1]},{"topic":"testingTopic","partition":33,"replicas":[1,2]},{"topic":"testingTopic","partition":0,"replicas":[3,1]}]}
> 4.  The result of my Topic reassignment (More than 4 days so far)
> $./kafka-reassign-partitions.sh  --verify  --reassignment-json-file 
> ./alex-expand-cluster-reassignment.json  --zookeeper 
> 192.168.112.85:2181,192.168.112.86:2181,192.168.112.87:2181,192.168.112.88:2181,192.168.112.89:2181
>   
> Status of partition reassignment:
> ERROR: Assigned replicas (4,2,1) don't match the list of replicas for 
> reassignment (4,2) for partition [testingTopic,1]
> ERROR: Assigned replicas (2,1,0) don't match the list of replicas for 
> reassignment (2,1) for partition [testingTopic,9]
> ERROR: Assigned replicas (2,4,1) don't match the list of replicas for 
> reassignment (2,4) for partition [testingTopic,19]
> ERROR: Assigned replicas (0,2,1) don't match the list of replicas for 
> reassignment (0,2) for partition [testingTopic,17]
> ERROR: Assigned replicas (2,3,1,0) don't match the list of replicas for 
> reassignment (2,3) for partition [testingTopic,34]
> ERROR: Assigned replicas (2,0,1) don't match the list of replicas for 
> reassignment (2,0) for partition [testingTopic,24]
> ERROR: Assigned replicas (4,2,0) don't match the list of replicas for 
> reassignment (4,2) for partition [testingTopic,21]
> ERROR: Assigned replicas (2,0,1) don't match the list of replicas for 
> reassignment (2,0) for partition [testingTopic,4]
> ERROR: Assigned replicas (3,2,1) don't match the list of replicas for 
> reassignment (3,2) for partition [testingTopic,25]
> ERROR: Assigned replicas (2,3,0) don't match the list of replicas for 
> reassignment (2,3) for partition [testingTopic,14]
> ERROR: Assigned replicas (3,2,1) don't match the list of replicas for 
> reassignment (3,2) for partition [testingTopic,5]
> ERROR: Assigned replicas (1,2,0) don't match the list of replicas for 
> reassignment (1,2) for partition [testingTopic,33]
> Reassignment of partition [testingTopic,10] completed successfully
> Reassignment of partition [testingTopic,27] completed successfully
> Reassignment of partition [testingTopic,13] completed successfully
> Reassignment of partition [testingTopic,34] failed
> Reassignment of partition [testingTopic,8] completed successfully
> Reassignment of partition [testingTopic,25] failed
> Reassignment of partition [testingTopic,35] completed successfully
> Reassignment of partition [testingTopic,31] completed successfully
> Reassignment of partition [testingTopic,18] completed successfully
> Reassignment of partition [testingTopic,19] failed
> Reassignment of partition [testingTopic,7] completed successfully
> Reassignment of partition [testingTopic,9] failed
> Reassignment of partition [testingTopic,0] completed successfully
> Reassignment of partition [testingTopic,3] completed successfully
> Reassignment of partition [testingTopic,2] completed successfully
> Reassignment of partition [testingTopic,26] completed successfully
> Reassignment of partition [testingTopic,30] completed successfully
> Reassignment of partition [testingTopic,11] completed successfully
> Reassignment of partition [testingTopic,4] failed
> Reassignment of partition [testingTopic,24] failed
> Reassignment of partition [testingTopic,32] completed successfully
> Reassignment of partition [testingTopic,15] completed successfully
> Reassignment of partition [testingTopic,6] completed successfully
> Reassignment of partition [testingTopic,28] completed successfully
> Reassignment of partition [testingTopic,17] failed
> Reassignment of partition [testingTopic,20] completed successfully
> Reassignment of partition [testingTopic,21] failed
> Reassignment of partition [testingTopic,16] completed successfully
> Reassignment of partition [testingTopic,22] completed successfully
> Reassignment of partition [testingTopic,23] completed successfully
> Reassignment of partition [testingTopic,1] failed
> Reassignment of partition [testingTopic,5] failed
> Reassignment of partition [testingTopic,12] completed successfully
> Reassignment of partition [testingTopic,33] failed
> Reassignment of partition [testingTopic,14] failed
> Reassignment of partition [testingTopic,29] completed successfully
> 5. Current Topic Status
> $./kafka-topics.sh --describe --topic testingTopic  --zookeeper 
> 192.168.112.95:2181,192.168.112.96:2181,192.168.112.97:2181,192.168.112.98:2181,192.168.112.99:2181
> Topic:halog   PartitionCount:36       ReplicationFactor:2     Configs:
>       Topic: halog    Partition: 0    Leader: 3       Replicas: 3,1   Isr: 3,1
>       Topic: halog    Partition: 1    Leader: 2       Replicas: 4,2,1 Isr: 2,4,1   <====
>       Topic: halog    Partition: 2    Leader: 0       Replicas: 0,3   Isr: 0,3
>       Topic: halog    Partition: 3    Leader: 4       Replicas: 1,4   Isr: 4,1
>       Topic: halog    Partition: 4    Leader: 2       Replicas: 2,0,1 Isr: 2,0,1   <====
>       Topic: halog    Partition: 5    Leader: 2       Replicas: 3,2,1 Isr: 2,3,1   <====
>       Topic: halog    Partition: 6    Leader: 4       Replicas: 4,3   Isr: 4,3
>       Topic: halog    Partition: 7    Leader: 0       Replicas: 0,4   Isr: 4,0
>       Topic: halog    Partition: 8    Leader: 0       Replicas: 1,0   Isr: 0,1
>       Topic: halog    Partition: 9    Leader: 0       Replicas: 2,1,0 Isr: 0,2,1   <====
>       Topic: halog    Partition: 10   Leader: 3       Replicas: 3,4   Isr: 4,3
>       Topic: halog    Partition: 11   Leader: 4       Replicas: 4,0   Isr: 4,0
>       Topic: halog    Partition: 12   Leader: 0       Replicas: 0,1   Isr: 0,1
>       Topic: halog    Partition: 13   Leader: 2       Replicas: 1,2   Isr: 2,1
>       Topic: halog    Partition: 14   Leader: 3       Replicas: 2,3,0 Isr: 3,0,2   <====
>       Topic: halog    Partition: 15   Leader: 3       Replicas: 3,0   Isr: 3,0
>       Topic: halog    Partition: 16   Leader: 4       Replicas: 4,1   Isr: 4,1
>       Topic: halog    Partition: 17   Leader: 2       Replicas: 0,2,1 Isr: 2,0,1   <====
>       Topic: halog    Partition: 18   Leader: 1       Replicas: 1,3   Isr: 3,1
>       Topic: halog    Partition: 19   Leader: 2       Replicas: 2,4,1 Isr: 2,4,1   <====
>       Topic: halog    Partition: 20   Leader: 3       Replicas: 3,1   Isr: 3,1
>       Topic: halog    Partition: 21   Leader: 4       Replicas: 4,2,0 Isr: 4,0,2   <====
>       Topic: halog    Partition: 22   Leader: 0       Replicas: 0,3   Isr: 0,3
>       Topic: halog    Partition: 23   Leader: 1       Replicas: 1,4   Isr: 4,1
>       Topic: halog    Partition: 24   Leader: 2       Replicas: 2,0,1 Isr: 2,0,1   <====
>       Topic: halog    Partition: 25   Leader: 2       Replicas: 3,2,1 Isr: 2,3,1   <====
>       Topic: halog    Partition: 26   Leader: 4       Replicas: 4,3   Isr: 4,3
>       Topic: halog    Partition: 27   Leader: 0       Replicas: 0,4   Isr: 0,4
>       Topic: halog    Partition: 28   Leader: 0       Replicas: 1,0   Isr: 0,1
>       Topic: halog    Partition: 29   Leader: 2       Replicas: 2,1   Isr: 2,1
>       Topic: halog    Partition: 30   Leader: 3       Replicas: 3,4   Isr: 4,3
>       Topic: halog    Partition: 31   Leader: 4       Replicas: 4,0   Isr: 4,0
>       Topic: halog    Partition: 32   Leader: 0       Replicas: 0,1   Isr: 0,1
>       Topic: halog    Partition: 33   Leader: 0       Replicas: 1,2,0 Isr: 0,2,1   <====
>       Topic: halog    Partition: 34   Leader: 2       Replicas: 2,3,1,0       Isr: 2,0,3,1   <====
>       Topic: halog    Partition: 35   Leader: 3       Replicas: 3,0   Isr: 3,0



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
