[ 
https://issues.apache.org/jira/browse/FLINK-7143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16081386#comment-16081386
 ] 

Tzu-Li (Gordon) Tai edited comment on FLINK-7143 at 7/11/17 12:03 AM:
----------------------------------------------------------------------

IIRC, sorting the fetched partition list was removed in favor of using the 
hashCode of each {{KafkaTopicPartition}} for the mod operation. The index-based 
mod must be a remnant of that change ...

Apparently, the current partition assignment tests do not have enough coverage. 
We also need a test that verifies the assignment stays stable when the fetched 
partitions arrive in a different order.
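
A minimal sketch of what such a stability test could look like. This is not the 
connector's actual test code. It assumes a hypothetical ownership rule that 
depends only on the partition itself (here the hashCode of a "topic-partition" 
string, standing in for {{KafkaTopicPartition}}), and every class and method 
name below is made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

public class AssignmentStabilityCheck {

    // Hypothetical ownership rule: a pure function of the partition, never of
    // its position in the fetched list. floorMod keeps the result non-negative
    // even when hashCode() is negative.
    static boolean isOwner(String partition, int numSubtasks, int subtaskIndex) {
        return Math.floorMod(partition.hashCode(), numSubtasks) == subtaskIndex;
    }

    // Partitions that one subtask picks out of the list it fetched.
    static Set<String> assign(List<String> fetched, int numSubtasks, int subtaskIndex) {
        Set<String> owned = new HashSet<>();
        for (String partition : fetched) {
            if (isOwner(partition, numSubtasks, subtaskIndex)) {
                owned.add(partition);
            }
        }
        return owned;
    }

    // True if every subtask picks the same partitions no matter how the
    // fetched list is ordered (checked over a number of random shuffles).
    static boolean stableUnderReordering(List<String> partitions, int numSubtasks, int trials) {
        Random random = new Random(42); // fixed seed keeps the check reproducible
        for (int subtask = 0; subtask < numSubtasks; subtask++) {
            Set<String> expected = assign(partitions, numSubtasks, subtask);
            for (int t = 0; t < trials; t++) {
                List<String> shuffled = new ArrayList<>(partitions);
                Collections.shuffle(shuffled, random);
                if (!assign(shuffled, numSubtasks, subtask).equals(expected)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> partitions = Arrays.asList("t-0", "t-1", "t-2", "t-3", "t-4", "t-5");
        System.out.println(stableUnderReordering(partitions, 4, 100)); // prints "true"
    }
}
```

The same harness would fail for an index-based rule, which is the kind of 
regression the missing test should catch.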


> Partition assignment for Kafka consumer is not stable
> -----------------------------------------------------
>
>                 Key: FLINK-7143
>                 URL: https://issues.apache.org/jira/browse/FLINK-7143
>             Project: Flink
>          Issue Type: Bug
>          Components: Kafka Connector
>    Affects Versions: 1.3.1
>            Reporter: Steven Zhen Wu
>            Priority: Blocker
>             Fix For: 1.3.2
>
>
> While deploying the Flink 1.3 release to hundreds of routing jobs, we found 
> some issues with partition assignment for the Kafka consumer: some partitions 
> weren't assigned, and some partitions were assigned more than once.
> Here is the bug, introduced in Flink 1.3:
> {code}
> protected static void initializeSubscribedPartitionsToStartOffsets(...) {
>     ...
>     for (int i = 0; i < kafkaTopicPartitions.size(); i++) {
>         if (i % numParallelSubtasks == indexOfThisSubtask) {
>             if (startupMode != StartupMode.SPECIFIC_OFFSETS) {
>                 subscribedPartitionsToStartOffsets.put(
>                     kafkaTopicPartitions.get(i), startupMode.getStateSentinel());
>             }
>     ...
> }
> {code}
> The bug is using the array index {{i}} for the mod against 
> {{numParallelSubtasks}}. If {{kafkaTopicPartitions}} is ordered differently on 
> different subtasks, the assignment is not stable across subtasks, which causes 
> the assignment issue mentioned earlier. 
> The fix is also very simple: we should use the partition id for the mod, 
> {{if (kafkaTopicPartitions.get(i).getPartition() % numParallelSubtasks == 
> indexOfThisSubtask)}}. That would result in a stable assignment across 
> subtasks that is independent of the ordering in the array.
> Marking it as blocker because of its impact.
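
A minimal, self-contained sketch of the failure mode described in the issue, 
not the Flink code itself. It assumes a single topic, partitions represented by 
their integer ids, and two subtasks that fetched the same partitions in 
different orders. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class IndexVsPartitionIdAssignment {

    // Rule from the issue: ownership depends on the list position i, so each
    // subtask decides based on its own ordering of the fetched list.
    static int ownersByIndex(int partitionId, List<List<Integer>> viewPerSubtask) {
        int numSubtasks = viewPerSubtask.size();
        int owners = 0;
        for (int subtask = 0; subtask < numSubtasks; subtask++) {
            int i = viewPerSubtask.get(subtask).indexOf(partitionId);
            if (i % numSubtasks == subtask) {
                owners++;
            }
        }
        return owners;
    }

    // Proposed rule: ownership depends on the partition id only, so the
    // ordering of each subtask's list no longer matters.
    static int ownersByPartitionId(int partitionId, List<List<Integer>> viewPerSubtask) {
        int numSubtasks = viewPerSubtask.size();
        int owners = 0;
        for (int subtask = 0; subtask < numSubtasks; subtask++) {
            boolean fetched = viewPerSubtask.get(subtask).contains(partitionId);
            if (fetched && partitionId % numSubtasks == subtask) {
                owners++;
            }
        }
        return owners;
    }

    public static void main(String[] args) {
        // Subtask 0 and subtask 1 fetched the same partitions in different orders.
        List<List<Integer>> views = Arrays.asList(
                Arrays.asList(0, 1, 2, 3),
                Arrays.asList(1, 0, 2, 3));
        for (int partitionId = 0; partitionId < 4; partitionId++) {
            System.out.println("partition " + partitionId
                    + ": byIndex=" + ownersByIndex(partitionId, views)
                    + " byPartitionId=" + ownersByPartitionId(partitionId, views));
        }
        // partition 0: byIndex=2 byPartitionId=1   (assigned twice by the index rule)
        // partition 1: byIndex=0 byPartitionId=1   (never assigned by the index rule)
        // partition 2: byIndex=1 byPartitionId=1
        // partition 3: byIndex=1 byPartitionId=1
    }
}
```

With the index rule, partition 0 ends up with two owners and partition 1 with 
none, matching the symptoms reported above. With the partition id rule, every 
partition has exactly one owner.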



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
