Akim Akimov created SAMZA-1739:
----------------------------------

             Summary: Some containers hang on KV store restore
                 Key: SAMZA-1739
                 URL: https://issues.apache.org/jira/browse/SAMZA-1739
             Project: Samza
          Issue Type: Bug
          Components: kafka, kv-store
    Affects Versions: 0.14.0
            Reporter: Akim Akimov


There is a problem that affected our prod environment and that we could not 
reproduce in the dev environment.

 

On application restart, 4 of the 12 containers hang while restoring the KV 
store from its changelog.

 

This is how it manifests in the log of a hanging container:

 

{{2018-05-29 16:01:55,650 [main] INFO 
org.apache.samza.storage.TaskStorageManager - Assigning oldest change log 
offsets for taskName Partition 8: Map(SystemStream [system=kafka, 
stream=chainstream_one-1-window-window_cid_batch] -> 0)}}
{{2018-05-29 16:01:55,653 [main] INFO 
org.apache.samza.storage.TaskStorageManager - Registering change log consumer 
with offset 0 for SystemStreamPartition [kafka, 
chainstream_one-1-window-window_cid_batch, 10].}}
{{2018-05-29 16:01:55,654 [main] INFO 
org.apache.samza.system.kafka.KafkaSystemConsumer - Refreshing brokers for: 
Map([chainstream_one-1-window-window_cid_batch,10] -> 0)}}
{{2018-05-29 16:01:55,655 [main] INFO org.apache.samza.system.kafka.BrokerProxy 
- Creating new SimpleConsumer for host ip-x.us-east-1.code418.net:9092 for 
system kafka}}
{{2018-05-29 16:01:55,656 [main] INFO org.apache.samza.system.kafka.GetOffset - 
Validating offset 0 for topic and partition 
[chainstream_one-1-window-window_cid_batch,10]}}
{{2018-05-29 16:01:55,693 [main] INFO org.apache.samza.system.kafka.GetOffset - 
Able to successfully read from offset 0 for topic and partition 
[chainstream_one-1-window-window_cid_batch,10]. Using it to instantiate 
consumer.}}
{{2018-05-29 16:01:55,693 [main] INFO org.apache.samza.system.kafka.BrokerProxy 
- Starting BrokerProxy for ip-x.us-east-1.code418.net:9092}}
{{2018-05-29 16:01:58,129 [main] INFO 
org.apache.samza.storage.kv.KeyValueStorageEngine - 1000000 entries 
restored...}}
{{2018-05-29 16:01:59,707 [main] INFO 
org.apache.samza.storage.kv.KeyValueStorageEngine - 2000000 entries 
restored...}}
{{2018-05-29 16:02:01,318 [main] INFO 
org.apache.samza.storage.kv.KeyValueStorageEngine - 3000000 entries 
restored...}}
{{2018-05-29 16:02:02,920 [main] INFO 
org.apache.samza.storage.kv.KeyValueStorageEngine - 4000000 entries 
restored...}}
{{End of LogType:stdout. This log file belongs to a running container (}}

Other containers start normally:

 

{{2018-05-29 16:02:18,564 [main] INFO 
org.apache.samza.storage.kv.KeyValueStorageEngine - 13000000 entries 
restored...}}
{{2018-05-29 16:02:19,700 [main] INFO org.apache.samza.system.kafka.BrokerProxy 
- Shutting down BrokerProxy for ip-x.net:9092}}
{{2018-05-29 16:02:19,700 [main] INFO org.apache.samza.system.kafka.BrokerProxy 
- closing simple consumer...}}
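
To tell stalled containers from healthy ones across all 12 logs, the restore progress lines can be summarised with a small script. This is only a rough sketch and not part of Samza: the helper name `restore_status` is hypothetical, the patterns are taken from the log excerpts above, and treating a "Shutting down BrokerProxy" line after the last progress line as "restore finished" is an assumption based on those two excerpts.

```python
import re
from datetime import datetime

# Patterns derived from the log excerpts in this report (not from Samza itself).
TIMESTAMP = r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})"
PROGRESS = re.compile(
    TIMESTAMP
    + r" \[main\] INFO org\.apache\.samza\.storage\.kv\.KeyValueStorageEngine"
    + r" - (\d+) entries restored"
)
SHUTDOWN = re.compile(r"Shutting down BrokerProxy")


def restore_status(log_text):
    """Summarise restore progress in one container's log (hypothetical helper)."""
    # Collapse wrapped lines so each log record can be matched as one string.
    text = " ".join(log_text.split())
    matches = list(PROGRESS.finditer(text))
    if not matches:
        return None  # no restore progress lines in this log

    def parse(match):
        ts, count = match.groups()
        return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S,%f"), int(count)

    first_time, first_count = parse(matches[0])
    last_time, last_count = parse(matches[-1])
    elapsed = (last_time - first_time).total_seconds()
    rate = (last_count - first_count) / elapsed if elapsed > 0 else None

    # Assumption: a BrokerProxy shutdown logged after the last progress line
    # means the changelog consumer was closed because the restore completed.
    tail = text[matches[-1].end():]
    return {
        "last_time": last_time,
        "last_count": last_count,
        "entries_per_second": rate,
        "finished": SHUTDOWN.search(tail) is not None,
    }
```

Running it over each container's stdout log and listing those where `finished` is false and `last_time` is well in the past would point at the stuck containers and show how many entries each had restored before it stopped making progress.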

 

System:

|Samza|0.14|
|Kafka (kafka.x86_64)|0.11.0.1-1|
|YARN|2.7.3|
|ZooKeeper|3.4.6|



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
