[
https://issues.apache.org/jira/browse/SAMZA-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Akim Akimov updated SAMZA-1739:
-------------------------------
Description:
There's a problem that affected our prod environment which we could not
reproduce in our dev environment.
The issue is that on application restart, 4 containers out of 12 hang while
restoring from the KV store changelog.
Application configuration:
12 containers deployed with YARN. The KV store in question is the window
aggregation KV store.
This is how it manifests:
{{2018-05-29 16:01:55,650 [main] INFO
org.apache.samza.storage.TaskStorageManager - Assigning oldest change log
offsets for taskName Partition 8: Map(SystemStream [system=kafka,
stream=chainstream_one-1-window-window_cid_batch] -> 0)}}
{{2018-05-29 16:01:55,653 [main] INFO
org.apache.samza.storage.TaskStorageManager - Registering change log consumer
with offset 0 for SystemStreamPartition [kafka,
chainstream_one-1-window-window_cid_batch, 10].}}
{{2018-05-29 16:01:55,654 [main] INFO
org.apache.samza.system.kafka.KafkaSystemConsumer - Refreshing brokers for:
Map([chainstream_one-1-window-window_cid_batch,10] -> 0)}}
{{2018-05-29 16:01:55,655 [main] INFO
org.apache.samza.system.kafka.BrokerProxy - Creating new SimpleConsumer for
host ip-x.us-east-1.code418.net:9092 for system kafka}}
{{2018-05-29 16:01:55,656 [main] INFO org.apache.samza.system.kafka.GetOffset
- Validating offset 0 for topic and partition
[chainstream_one-1-window-window_cid_batch,10]}}
{{2018-05-29 16:01:55,693 [main] INFO org.apache.samza.system.kafka.GetOffset
- Able to successfully read from offset 0 for topic and partition
[chainstream_one-1-window-window_cid_batch,10]. Using it to instantiate
consumer.}}
{{2018-05-29 16:01:55,693 [main] INFO
org.apache.samza.system.kafka.BrokerProxy - Starting BrokerProxy for
ip-x.us-east-1.code418.net:9092}}
{{2018-05-29 16:01:58,129 [main] INFO
org.apache.samza.storage.kv.KeyValueStorageEngine - 1000000 entries
restored...}}
{{2018-05-29 16:01:59,707 [main] INFO
org.apache.samza.storage.kv.KeyValueStorageEngine - 2000000 entries
restored...}}
{{2018-05-29 16:02:01,318 [main] INFO
org.apache.samza.storage.kv.KeyValueStorageEngine - 3000000 entries
restored...}}
{{2018-05-29 16:02:02,920 [main] INFO
org.apache.samza.storage.kv.KeyValueStorageEngine - 4000000 entries
restored...}}
{{End of LogType:stdout. This log file belongs to a running container (}}
Other containers start normally:
{{2018-05-29 16:02:18,564 [main] INFO
org.apache.samza.storage.kv.KeyValueStorageEngine - 13000000 entries
restored...}}
{{2018-05-29 16:02:19,700 [main] INFO
org.apache.samza.system.kafka.BrokerProxy - Shutting down BrokerProxy for
ip-x.net:9092}}
{{2018-05-29 16:02:19,700 [main] INFO
org.apache.samza.system.kafka.BrokerProxy - closing simple consumer...}}
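For anyone triaging the same symptom, here is a minimal Python sketch of how the stalled containers could be picked out of the container logs. It is not part of Samza. It assumes each log entry sits on a single line in the real log file (the excerpts above are wrapped by the mailer), and it treats the "Shutting down BrokerProxy" message as the sign that restore finished, which is only inferred from the healthy container's log above. The names summarize_restore and looks_hung and the 5 minute threshold are hypothetical.

```python
import re
from datetime import datetime, timedelta

# One log entry per line, e.g.
# "2018-05-29 16:01:58,129 [main] INFO <logger> - 1000000 entries restored..."
ENTRY_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) \[[^\]]+\] \w+\s+\S+ - (.*)$"
)
RESTORED_RE = re.compile(r"^(\d+) entries restored")

# Assumption: inferred from the healthy container's log in this report,
# not from Samza documentation.
DONE_MARKER = "Shutting down BrokerProxy"


def summarize_restore(log_text):
    """Return (last_restored_count, last_restore_timestamp, finished)."""
    last_count, last_ts, finished = 0, None, False
    for line in log_text.splitlines():
        entry = ENTRY_RE.match(line.strip())
        if entry is None:
            continue  # skip lines that are not in the expected format
        ts = datetime.strptime(entry.group(1), "%Y-%m-%d %H:%M:%S,%f")
        message = entry.group(2)
        restored = RESTORED_RE.match(message)
        if restored:
            last_count, last_ts = int(restored.group(1)), ts
        elif message.startswith(DONE_MARKER):
            finished = True
    return last_count, last_ts, finished


def looks_hung(summary, now, max_silence=timedelta(minutes=5)):
    """True if restore started, never finished, and has been silent too long."""
    _, last_ts, finished = summary
    return not finished and last_ts is not None and now - last_ts > max_silence
```

Running summarize_restore over each container's stdout and passing the result to looks_hung with the current time would list the containers whose restore progress stopped, along with the last entry count they reported.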
System:
Samza 0.14
kafka.x86_64 0.11.0.1-1
|YARN|2.7.3|
|ZooKeeper|3.4.6|
was:
There's a problem that affected our prod environment which we could not
reproduce in our dev environment.
The issue is that on application restart, 4 containers out of 12 hang while
restoring from the KV store changelog.
This is how it manifests:
{{2018-05-29 16:01:55,650 [main] INFO
org.apache.samza.storage.TaskStorageManager - Assigning oldest change log
offsets for taskName Partition 8: Map(SystemStream [system=kafka,
stream=chainstream_one-1-window-window_cid_batch] -> 0)}}
{{2018-05-29 16:01:55,653 [main] INFO
org.apache.samza.storage.TaskStorageManager - Registering change log consumer
with offset 0 for SystemStreamPartition [kafka,
chainstream_one-1-window-window_cid_batch, 10].}}
{{2018-05-29 16:01:55,654 [main] INFO
org.apache.samza.system.kafka.KafkaSystemConsumer - Refreshing brokers for:
Map([chainstream_one-1-window-window_cid_batch,10] -> 0)}}
{{2018-05-29 16:01:55,655 [main] INFO org.apache.samza.system.kafka.BrokerProxy
- Creating new SimpleConsumer for host ip-x.us-east-1.code418.net:9092 for
system kafka}}
{{2018-05-29 16:01:55,656 [main] INFO org.apache.samza.system.kafka.GetOffset -
Validating offset 0 for topic and partition
[chainstream_one-1-window-window_cid_batch,10]}}
{{2018-05-29 16:01:55,693 [main] INFO org.apache.samza.system.kafka.GetOffset -
Able to successfully read from offset 0 for topic and partition
[chainstream_one-1-window-window_cid_batch,10]. Using it to instantiate
consumer.}}
{{2018-05-29 16:01:55,693 [main] INFO org.apache.samza.system.kafka.BrokerProxy
- Starting BrokerProxy for ip-x.us-east-1.code418.net:9092}}
{{2018-05-29 16:01:58,129 [main] INFO
org.apache.samza.storage.kv.KeyValueStorageEngine - 1000000 entries
restored...}}
{{2018-05-29 16:01:59,707 [main] INFO
org.apache.samza.storage.kv.KeyValueStorageEngine - 2000000 entries
restored...}}
{{2018-05-29 16:02:01,318 [main] INFO
org.apache.samza.storage.kv.KeyValueStorageEngine - 3000000 entries
restored...}}
{{2018-05-29 16:02:02,920 [main] INFO
org.apache.samza.storage.kv.KeyValueStorageEngine - 4000000 entries
restored...}}
{{End of LogType:stdout. This log file belongs to a running container (}}
Other containers start normally:
{{2018-05-29 16:02:18,564 [main] INFO
org.apache.samza.storage.kv.KeyValueStorageEngine - 13000000 entries
restored...}}
{{2018-05-29 16:02:19,700 [main] INFO org.apache.samza.system.kafka.BrokerProxy
- Shutting down BrokerProxy for ip-x.net:9092}}
{{2018-05-29 16:02:19,700 [main] INFO org.apache.samza.system.kafka.BrokerProxy
- closing simple consumer...}}
System:
Samza 0.14
kafka.x86_64 0.11.0.1-1
|YARN|2.7.3|
|ZooKeeper|3.4.6|
> Some containers hangs on restore KV store
> -----------------------------------------
>
> Key: SAMZA-1739
> URL: https://issues.apache.org/jira/browse/SAMZA-1739
> Project: Samza
> Issue Type: Bug
> Components: kafka, kv-store
> Affects Versions: 0.14.0
> Reporter: Akim Akimov
> Priority: Major
>
> There's a problem that affected our prod environment which we could not
> reproduce in our dev environment.
>
> The issue is that on application restart, 4 containers out of 12 hang while
> restoring from the KV store changelog.
>
>
> Application configuration:
> 12 containers deployed with YARN. The KV store in question is the window
> aggregation KV store.
> This is how it manifests:
>
> {{2018-05-29 16:01:55,650 [main] INFO
> org.apache.samza.storage.TaskStorageManager - Assigning oldest change log
> offsets for taskName Partition 8: Map(SystemStream [system=kafka,
> stream=chainstream_one-1-window-window_cid_batch] -> 0)}}
> {{2018-05-29 16:01:55,653 [main] INFO
> org.apache.samza.storage.TaskStorageManager - Registering change log consumer
> with offset 0 for SystemStreamPartition [kafka,
> chainstream_one-1-window-window_cid_batch, 10].}}
> {{2018-05-29 16:01:55,654 [main] INFO
> org.apache.samza.system.kafka.KafkaSystemConsumer - Refreshing brokers for:
> Map([chainstream_one-1-window-window_cid_batch,10] -> 0)}}
> {{2018-05-29 16:01:55,655 [main] INFO
> org.apache.samza.system.kafka.BrokerProxy - Creating new SimpleConsumer for
> host ip-x.us-east-1.code418.net:9092 for system kafka}}
> {{2018-05-29 16:01:55,656 [main] INFO
> org.apache.samza.system.kafka.GetOffset - Validating offset 0 for topic and
> partition [chainstream_one-1-window-window_cid_batch,10]}}
> {{2018-05-29 16:01:55,693 [main] INFO
> org.apache.samza.system.kafka.GetOffset - Able to successfully read from
> offset 0 for topic and partition
> [chainstream_one-1-window-window_cid_batch,10]. Using it to instantiate
> consumer.}}
> {{2018-05-29 16:01:55,693 [main] INFO
> org.apache.samza.system.kafka.BrokerProxy - Starting BrokerProxy for
> ip-x.us-east-1.code418.net:9092}}
> {{2018-05-29 16:01:58,129 [main] INFO
> org.apache.samza.storage.kv.KeyValueStorageEngine - 1000000 entries
> restored...}}
> {{2018-05-29 16:01:59,707 [main] INFO
> org.apache.samza.storage.kv.KeyValueStorageEngine - 2000000 entries
> restored...}}
> {{2018-05-29 16:02:01,318 [main] INFO
> org.apache.samza.storage.kv.KeyValueStorageEngine - 3000000 entries
> restored...}}
> {{2018-05-29 16:02:02,920 [main] INFO
> org.apache.samza.storage.kv.KeyValueStorageEngine - 4000000 entries
> restored...}}
> {{End of LogType:stdout. This log file belongs to a running container (}}
> Other containers start normally:
>
> {{2018-05-29 16:02:18,564 [main] INFO
> org.apache.samza.storage.kv.KeyValueStorageEngine - 13000000 entries
> restored...}}
> {{2018-05-29 16:02:19,700 [main] INFO
> org.apache.samza.system.kafka.BrokerProxy - Shutting down BrokerProxy for
> ip-x.net:9092}}
> {{2018-05-29 16:02:19,700 [main] INFO
> org.apache.samza.system.kafka.BrokerProxy - closing simple consumer...}}
>
> System:
> Samza 0.14
> kafka.x86_64 0.11.0.1-1
> |YARN|2.7.3|
> |ZooKeeper|3.4.6|
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)