[jira] [Commented] (KAFKA-7209) Kafka stream does not rebalance when one node gets down

Yogesh BG (JIRA) Thu, 26 Jul 2018 13:50:20 -0700


    [ https://issues.apache.org/jira/browse/KAFKA-7209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16558874#comment-16558874 ]


Yogesh BG commented on KAFKA-7209:
----------------------------------

Here are my observations:

3 brokers and 3 stream apps: initially everything works fine.

I kill one app; the group then rebalances and streaming resumes without any data loss.

I can see the logs below:

{{20:15:26.627 [ks_0_inst_CSV_LOG-StreamThread-22] INFO  o.a.k.c.c.i.ConsumerCoordinator - Setting newly assigned partitions [PR-35, PR_vThunder-27, PR-27, PR_vThunder-35] for group aggregation-framework03_CSV_LOG}}
{{20:15:26.627 [ks_0_inst_CSV_LOG-StreamThread-20] INFO  o.a.k.s.p.internals.StreamThread - stream-thread [ks_0_inst_CSV_LOG-StreamThread-20] State transition from PARTITIONS_REVOKED to PARTITIONS_ASSIGNED.}}
{{20:16:32.174 [ks_0_inst_THUNDER_LOG_L7-StreamThread-90] INFO  o.a.k.c.c.i.ConsumerCoordinator - Setting newly assigned partitions [THUNDER_LOG_L7_PR-15, THUNDER_LOG_L7_PE-15] for group aggregation-framework03_THUNDER_LOG_L7}}
{{20:16:32.175 [ks_0_inst_THUNDER_LOG_L7-StreamThread-86] INFO  o.a.k.c.c.i.ConsumerCoordinator - Setting newly assigned partitions [THUNDER_LOG_L7_PR-9, THUNDER_LOG_L7_PE-9] for group aggregation-framework03_THUNDER_LOG_L7}}
{{20:16:32.175 [ks_0_inst_THUNDER_LOG_L7-StreamThread-85] INFO  o.a.k.c.c.i.ConsumerCoordinator - Setting newly assigned partitions [THUNDER_LOG_L7_PE-35, THUNDER_LOG_L7_PR-27, THUNDER_LOG_L7_PE-27, THUNDER_LOG_L7_PR-35] for group aggregation-framework03_THUNDER_LOG_L7}}

 

What I don't understand is that when I look into the state dir, I don't see partition 
folders being created for the newly assigned partitions.

Below is the initial state before I kill the first one [rtp-worker-2]; for the other 
two, it stays the same and does not change at all.

 

{{[root@rtp-worker-2 /]# ls /tmp/data/kstreams/aggregation-framework_THUNDER_LOG_L7/}}
{{0_0  0_10  0_12  0_14  0_16  0_18  0_2   0_21  0_23  0_25  0_27  0_29  0_30  0_32  0_34  0_4  0_6  0_8}}
{{0_1  0_11  0_13  0_15  0_17  0_19  0_20  0_22  0_24  0_26  0_28  0_3   0_31  0_33  0_35  0_5  0_7  0_9}}

{{[root@rtp-worker-0 /]# ls /tmp/data/kstreams/aggregation-framework_THUNDER_LOG_L7/}}
{{0_0  0_1  0_10  0_11  0_2  0_3  0_4  0_5  0_6  0_7  0_8  0_9}}

{{[root@rtp-worker-1 /]# ls /tmp/data/kstreams/aggregation-framework_THUNDER_LOG_L7/}}
{{0_11  0_12  0_13  0_14  0_15  0_16  0_17  0_18  0_19  0_20  0_21  0_22  0_23}}
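To make the state-dir observation concrete, here is a small sketch that compares the listings. It assumes the directory names follow Kafka Streams' {{<subtopologyId>_<partition>}} task-id naming and that the topic has 36 partitions (0_0 to 0_35, as on rtp-worker-2); {{TaskDirCheck}} and its methods are hypothetical helpers, not part of Kafka. It reports the partitions for which neither surviving worker has a task directory, i.e. the tasks one would normally expect to show up on rtp-worker-0 or rtp-worker-1 once rtp-worker-2 is gone.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for comparing state-directory listings across instances.
final class TaskDirCheck {

    // Extracts the partition number from a task directory name such as "0_27".
    // Assumes the "<subtopologyId>_<partition>" naming seen in the listings.
    static int partitionOf(String taskDir) {
        String[] parts = taskDir.split("_");
        if (parts.length != 2) {
            throw new IllegalArgumentException("unexpected task directory: " + taskDir);
        }
        return Integer.parseInt(parts[1]);
    }

    // Returns the partitions in [0, numPartitions) for which none of the given
    // directory listings contains a task directory.
    static Set<Integer> partitionsWithoutLocalState(int numPartitions, List<List<String>> listings) {
        Set<Integer> present = new TreeSet<>();
        for (List<String> listing : listings) {
            for (String dir : listing) {
                present.add(partitionOf(dir));
            }
        }
        Set<Integer> missing = new TreeSet<>();
        for (int p = 0; p < numPartitions; p++) {
            if (!present.contains(p)) {
                missing.add(p);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Listings of the two surviving workers, copied from the output above.
        List<String> worker0 = Arrays.asList("0_0", "0_1", "0_10", "0_11", "0_2", "0_3",
                "0_4", "0_5", "0_6", "0_7", "0_8", "0_9");
        List<String> worker1 = Arrays.asList("0_11", "0_12", "0_13", "0_14", "0_15", "0_16",
                "0_17", "0_18", "0_19", "0_20", "0_21", "0_22", "0_23");
        // 36 partitions (0..35) is inferred from the rtp-worker-2 listing.
        System.out.println(partitionsWithoutLocalState(36, Arrays.asList(worker0, worker1)));
        // prints [24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35]
    }
}
```

One more thing that may be worth checking: Kafka Streams keeps task state under {{<state.dir>/<application.id>}}. The directory listed here is {{aggregation-framework_THUNDER_LOG_L7}}, while the log lines show the group {{aggregation-framework03_THUNDER_LOG_L7}}, so new task directories could be appearing under a differently named application directory.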

 

Another case: with all 3 apps running successfully, I bring down one broker, and the 
broker cluster rebalances by itself. The apps also rebalance along with the brokers and 
resume streaming data, *but I observe data loss while the brokers are rebalancing. 
Is there a way to avoid this? Do the other two brokers become unresponsive while the 
cluster is rebalancing?*
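On the data-loss question: a setup commonly recommended for tolerating a single broker failure is to replicate Kafka Streams' internal changelog and repartition topics and to have the producer wait for all in-sync replicas. Below is a minimal sketch under those assumptions, using plain-string config keys ({{replication.factor}}, plus {{acks}} routed through the {{producer.}} prefix); the class name and broker addresses are hypothetical, the keys should be checked against the StreamsConfig documentation for 0.11, and whether this explains the loss seen here is not established. Note that {{acks=all}} only protects writes if the topics also have {{min.insync.replicas}} greater than 1, which is a broker/topic setting and cannot be set from here.

```java
import java.util.Properties;

// Hypothetical helper: Streams properties aimed at tolerating one broker failure.
final class DurableStreamsProps {

    static Properties durableProps(String applicationId, String bootstrapServers) {
        Properties props = new Properties();
        props.put("application.id", applicationId);
        props.put("bootstrap.servers", bootstrapServers);
        // Replication factor for the internal changelog and repartition topics
        // created by Kafka Streams; the default in these versions is 1, so losing
        // the broker that hosts such a partition makes it unavailable.
        props.put("replication.factor", "3");
        // The "producer." prefix routes this to the internal producer:
        // wait until all in-sync replicas have acknowledged each write.
        props.put("producer.acks", "all");
        return props;
    }

    public static void main(String[] args) {
        // Broker addresses are placeholders for the three nodes in this setup.
        Properties props = durableProps("aggregation-framework03_THUNDER_LOG_L7",
                "node1:9092,node2:9092,node3:9092");
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```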

 

{color:#FF0000}*Next case: when a broker and a stream app go down at the same time, I 
can see the brokers rebalance and the apps receive some communication messages, but 
they never get back to streaming, especially when there are multiple partitions. 
Topics that have a single partition do get back to streaming after some time.*{color}
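One hypothesis for why multi-partition topics stall while single-partition ones recover (not confirmed in this thread): if some topic the app depends on, such as an internal changelog or repartition topic left at replication factor 1, has a partition whose only replica was on the downed broker, that partition stays offline and the task reading it cannot make progress. A topic with many partitions is very likely to have some on the dead broker, while a single-partition topic only stalls if its one partition lives there. The sketch below models this with a hypothetical round-robin replica placement (real assignments differ); {{OfflinePartitions}} is a made-up helper, not a Kafka API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model: which partitions go offline when brokers die.
final class OfflinePartitions {

    // Hypothetical round-robin placement: partition p gets replicas on brokers
    // p, p+1, ..., p+rf-1 (mod numBrokers). Real assignments will differ.
    static List<Set<Integer>> roundRobinAssignment(int partitions, int rf, int numBrokers) {
        List<Set<Integer>> assignment = new ArrayList<>();
        for (int p = 0; p < partitions; p++) {
            Set<Integer> replicas = new HashSet<>();
            for (int r = 0; r < rf; r++) {
                replicas.add((p + r) % numBrokers);
            }
            assignment.add(replicas);
        }
        return assignment;
    }

    // A partition is offline when every one of its replicas is on a dead broker.
    static List<Integer> offlinePartitions(List<Set<Integer>> assignment, Set<Integer> deadBrokers) {
        List<Integer> offline = new ArrayList<>();
        for (int p = 0; p < assignment.size(); p++) {
            if (deadBrokers.containsAll(assignment.get(p))) {
                offline.add(p);
            }
        }
        return offline;
    }

    public static void main(String[] args) {
        Set<Integer> dead = Collections.singleton(2); // broker 2 is brought down
        System.out.println("16 partitions, RF=1: " + offlinePartitions(roundRobinAssignment(16, 1, 3), dead));
        System.out.println("16 partitions, RF=3: " + offlinePartitions(roundRobinAssignment(16, 3, 3), dead));
        System.out.println("1 partition, RF=1:   " + offlinePartitions(roundRobinAssignment(1, 1, 3), dead));
    }
}
```

With broker 2 down, this reports partitions [2, 5, 8, 11, 14] offline for 16 partitions at RF=1, and none at RF=3 or for the single-partition topic. Comparing against the actual replica assignment reported by {{kafka-topics.sh --describe}} for the internal topics would show whether this applies.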

> Kafka stream does not rebalance when one node gets down
> -------------------------------------------------------
>
>                 Key: KAFKA-7209
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7209
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 0.11.0.1
>            Reporter: Yogesh BG
>            Priority: Critical
>
> I use Kafka Streams 0.11.0.1 with session timeout 60000, retries set to 
> Integer.MAX_VALUE, and the default backoff time.
>  
> I have 3 nodes running a Kafka cluster of 3 brokers,
> and I am running 3 Kafka Streams apps with the same {{application.id}};
> each node has one broker and one Kafka Streams application.
> Everything works fine during setup.
> I bring down one node, so one Kafka broker and one streaming app are down.
> Now I see exceptions in the other two streaming apps, and they never get 
> rebalanced; I waited for hours and they never came back to normal.
> Is there anything I am missing?
> I also tried, when one broker is down, calling stream.close(), cleaning up, 
> and restarting; this doesn't help either.
> Can anyone help me?
>  
>  
>  
> One thing I observed lately is that Kafka topics with one partition get 
> reassigned, but I also have topics with 16 partitions and replication factor 3, 
> and those never settle.
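For reference, the settings described in the report could be expressed roughly as below. This is a sketch assuming "session timeout" means the consumer's {{session.timeout.ms}} and "retries" means the producer's {{retries}}, passed through the {{consumer.}} and {{producer.}} prefixes; the class name and addresses are hypothetical. The backoff ({{retry.backoff.ms}}) is left unset, matching "backoff time default".

```java
import java.util.Properties;

// Hypothetical helper mirroring the configuration described in the report.
final class ReportedStreamsProps {

    static Properties reportedProps(String applicationId, String bootstrapServers) {
        Properties props = new Properties();
        props.put("application.id", applicationId);
        props.put("bootstrap.servers", bootstrapServers);
        // "session timeout 60000": assumed to be the consumer's session.timeout.ms.
        props.put("consumer.session.timeout.ms", "60000");
        // "retries to be int.max": assumed to be the producer's retries setting.
        props.put("producer.retries", String.valueOf(Integer.MAX_VALUE));
        // "backoff time default": retry.backoff.ms is intentionally not set.
        return props;
    }

    public static void main(String[] args) {
        // Application id and broker addresses are placeholders.
        Properties props = reportedProps("my-streams-app", "node1:9092,node2:9092,node3:9092");
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```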



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
