Jonathan Hurley created AMBARI-14819:
----------------------------------------

             Summary: RU : Storm Topologies stopped running while rolling upgrade
                 Key: AMBARI-14819
                 URL: https://issues.apache.org/jira/browse/AMBARI-14819
             Project: Ambari
          Issue Type: Bug
    Affects Versions: 2.2.0
            Reporter: Jonathan Hurley
            Assignee: Jonathan Hurley
            Priority: Blocker
             Fix For: 2.2.2


When performing a rolling upgrade from HDP 2.3 to 2.4, Storm topologies are 
stopped.

1) Start the HDFS topology and the Hive topology before the rolling upgrade starts.
{code:title=HDFS topology}
2016-01-22 16:35:40,011|beaver.component.rollingupgrade.ruCommon|INFO|28499|139893976106752|MainThread|Running long running background jobs for storm.
2016-01-22 16:35:40,015|beaver.machine|INFO|28499|139893976106752|MainThread|RUNNING: /usr/hdp/current/storm-client/bin/storm -c java.security.auth.login.config=/etc/storm/conf/client_jaas.conf -c storm.thrift.transport=backtype.storm.security.auth.kerberos.KerberosSaslTransportPlugin jar /grid/0/hadoopqe/artifacts/storm-hdfs-tests/target/storm-integration-test-1.0-SNAPSHOT.jar org.apache.storm.hdfs.bolt.HdfsFileTopology hdfs://nameservice/tmp /tmp/hdfs-conf.yaml HDFSTopology
{code}
{code:title=Hive Topology}
2016-01-22 16:37:24,486|beaver.machine|INFO|28499|139893976106752|MainThread|RUNNING: /usr/hdp/current/storm-client/bin/storm -c java.security.auth.login.config=/etc/storm/conf/client_jaas.conf -c storm.thrift.transport=backtype.storm.security.auth.kerberos.KerberosSaslTransportPlugin jar /grid/0/hadoopqe/artifacts/storm-hive-tests/target/storm-integration-test-1.0-SNAPSHOT.jar org.apache.storm.hive.bolt.HiveTopologyPartitioned thrift://os-d7-gkzzqs-rudalm10todalnextsecha-1.novalocal:9083,thrift://os-d7-gkzzqs-rudalm10todalnextsecha-10.novalocal:9083,thrift://os-d7-gkzzqs-rudalm10todalnextsecha-10.novalocal:9083,thrift://os-d7-gkzzqs-rudalm10todalnextsecha-11.novalocal:9083 stormdb userdata HiveTopology /home/hrt_qa/hadoopqa/keytabs/hrt_qa.headless.keytab [email protected]
{code}
2) Make sure both topologies keep running throughout the rolling upgrade.
3) Verify that both topologies are still running correctly afterwards.
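
The check in step 3 could be scripted along the following lines. This is only a minimal sketch: it assumes the text handed to it is a whitespace-separated table such as the one printed by `storm list`, with the topology name in the first column and the status in the second, under a header row starting with `Topology_name`. The function names are hypothetical and are not part of the test framework shown in the logs above.

```python
def parse_topology_status(listing):
    """Map topology name -> status from whitespace-separated table text.

    Assumption: column 1 is the topology name and column 2 is its status.
    Other multi-field lines (for example client log output) also end up in
    the map, which is harmless because only expected names are looked up.
    """
    statuses = {}
    for line in listing.splitlines():
        fields = line.split()
        # Skip blank lines, single-token separator rows, and the header row.
        if len(fields) < 2 or fields[0] == "Topology_name":
            continue
        statuses[fields[0]] = fields[1]
    return statuses


def topologies_not_active(listing, expected):
    """Return the expected topologies that are missing or not ACTIVE."""
    statuses = parse_topology_status(listing)
    return [name for name in expected if statuses.get(name) != "ACTIVE"]
```

Calling `topologies_not_active(output, ["HDFSTopology", "HiveTopology"])` at intervals during the upgrade would return an empty list as long as both topologies are listed as ACTIVE, and the names of any that have disappeared or changed state otherwise.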

While upgrading from 2.3.2.0-2950 to 2.4.0.0-128, all Storm topologies stopped.
The worker log for the HDFS topology shows the following stack trace:
http://qelog.hortonworks.com/log/os-d7-gkzzqs-rudalm10todalnextsecha/service-logs/storm/172.22.103.85/HDFSTopology-1-1453480559-worker-6701.log
{code}
2016-01-22 19:41:04.084 b.s.d.executor [ERROR] 
java.lang.RuntimeException: java.lang.RuntimeException: 
org.apache.storm.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode 
= NodeExists for /errors/HDFSTopology-1-1453480559/my-bolt-last-error
        at 
backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:128)
 ~[storm-core-0.10.0.2.3.2.0-2950.jar:0.10.0.2.3.2.0-2950]
        at 
backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99)
 ~[storm-core-0.10.0.2.3.2.0-2950.jar:0.10.0.2.3.2.0-2950]
        at 
backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80) 
~[storm-core-0.10.0.2.3.2.0-2950.jar:0.10.0.2.3.2.0-2950]
        at 
backtype.storm.daemon.executor$fn__6099$fn__6112$fn__6163.invoke(executor.clj:808)
 ~[storm-core-0.10.0.2.3.2.0-2950.jar:0.10.0.2.3.2.0-2950]
        at backtype.storm.util$async_loop$fn__543.invoke(util.clj:475) 
[storm-core-0.10.0.2.3.2.0-2950.jar:0.10.0.2.3.2.0-2950]
        at clojure.lang.AFn.run(AFn.java:22) [clojure-1.6.0.jar:?]
        at java.lang.Thread.run(Thread.java:745) [?:1.7.0_67]
Caused by: java.lang.RuntimeException: 
org.apache.storm.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode 
= NodeExists for /errors/HDFSTopology-1-1453480559/my-bolt-last-error
        at backtype.storm.util$wrap_in_runtime.invoke(util.clj:48) 
~[storm-core-0.10.0.2.3.2.0-2950.jar:0.10.0.2.3.2.0-2950]
        at backtype.storm.zookeeper$create_node.invoke(zookeeper.clj:97) 
~[storm-core-0.10.0.2.3.2.0-2950.jar:0.10.0.2.3.2.0-2950]
        at 
backtype.storm.cluster$mk_distributed_cluster_state$reify__4937.set_data(cluster.clj:110)
 ~[storm-core-0.10.0.2.3.2.0-2950.jar:0.10.0.2.3.2.0-2950]
        at 
backtype.storm.cluster$mk_storm_cluster_state$reify__5557.report_error(cluster.clj:537)
 ~[storm-core-0.10.0.2.3.2.0-2950.jar:0.10.0.2.3.2.0-2950]
        at 
backtype.storm.daemon.executor$throttled_report_error_fn$fn__5878.invoke(executor.clj:193)
 ~[storm-core-0.10.0.2.3.2.0-2950.jar:0.10.0.2.3.2.0-2950]
        at 
backtype.storm.daemon.executor$fn__6099$fn$reify__6147.reportError(executor.clj:798)
 ~[storm-core-0.10.0.2.3.2.0-2950.jar:0.10.0.2.3.2.0-2950]
        at 
backtype.storm.task.OutputCollector.reportError(OutputCollector.java:223) 
~[storm-core-0.10.0.2.3.2.0-2950.jar:0.10.0.2.3.2.0-2950]
        at org.apache.storm.hdfs.bolt.HdfsBolt.execute(HdfsBolt.java:115) 
~[stormjar.jar:?]
        at 
backtype.storm.daemon.executor$fn__6099$tuple_action_fn__6101.invoke(executor.clj:670)
 ~[storm-core-0.10.0.2.3.2.0-2950.jar:0.10.0.2.3.2.0-2950]
        at 
backtype.storm.daemon.executor$mk_task_receiver$fn__6022.invoke(executor.clj:426)
 ~[storm-core-0.10.0.2.3.2.0-2950.jar:0.10.0.2.3.2.0-2950]
        at 
backtype.storm.disruptor$clojure_handler$reify__912.onEvent(disruptor.clj:58) 
~[storm-core-0.10.0.2.3.2.0-2950.jar:0.10.0.2.3.2.0-2950]
        at 
backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125)
 ~[storm-core-0.10.0.2.3.2.0-2950.jar:0.10.0.2.3.2.0-2950]
        ... 6 more
Caused by: org.apache.storm.zookeeper.KeeperException$NodeExistsException: 
KeeperErrorCode = NodeExists for 
/errors/HDFSTopology-1-1453480559/my-bolt-last-error
        at 
org.apache.storm.zookeeper.KeeperException.create(KeeperException.java:119) 
~[storm-core-0.10.0.2.3.2.0-2950.jar:0.10.0.2.3.2.0-2950]
        at 
org.apache.storm.zookeeper.KeeperException.create(KeeperException.java:51) 
~[storm-core-0.10.0.2.3.2.0-2950.jar:0.10.0.2.3.2.0-2950]
        at org.apache.storm.zookeeper.ZooKeeper.create(ZooKeeper.java:783) 
~[storm-core-0.10.0.2.3.2.0-2950.jar:0.10.0.2.3.2.0-2950]
        at 
org.apache.storm.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:676)
 ~[storm-core-0.10.0.2.3.2.0-2950.jar:0.10.0.2.3.2.0-2950]
        at 
org.apache.storm.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:660)
 ~[storm-core-0.10.0.2.3.2.0-2950.jar:0.10.0.2.3.2.0-2950]
        at org.apache.storm.curator.RetryLoop.callWithRetry(RetryLoop.java:107) 
~[storm-core-0.10.0.2.3.2.0-2950.jar:0.10.0.2.3.2.0-2950]
        at 
org.apache.storm.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:656)
 ~[storm-core-0.10.0.2.3.2.0-2950.jar:0.10.0.2.3.2.0-2950]
        at 
org.apache.storm.curator.framework.imps.CreateBuilderImpl.protectedPathInForeground(CreateBuilderImpl.java:441)
 ~[storm-core-0.10.0.2.3.2.0-2950.jar:0.10.0.2.3.2.0-2950]
        at 
org.apache.storm.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:431)
 ~[storm-core-0.10.0.2.3.2.0-2950.jar:0.10.0.2.3.2.0-2950]
        at 
org.apache.storm.curator.framework.imps.CreateBuilderImpl$3.forPath(CreateBuilderImpl.java:239)
 ~[storm-core-0.10.0.2.3.2.0-2950.jar:0.10.0.2.3.2.0-2950]
        at 
org.apache.storm.curator.framework.imps.CreateBuilderImpl$3.forPath(CreateBuilderImpl.java:193)
 ~[storm-core-0.10.0.2.3.2.0-2950.jar:0.10.0.2.3.2.0-2950]
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
~[?:1.7.0_67]
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
~[?:1.7.0_67]
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 ~[?:1.7.0_67]
        at java.lang.reflect.Method.invoke(Method.java:606) ~[?:1.7.0_67]
        at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93) 
~[clojure-1.6.0.jar:?]
        at clojure.lang.Reflector.invokeInstanceMethod(Reflector.java:28) 
~[clojure-1.6.0.jar:?]
        at backtype.storm.zookeeper$create_node.invoke(zookeeper.clj:96) 
~[storm-core-0.10.0.2.3.2.0-2950.jar:0.10.0.2.3.2.0-2950]
        at 
backtype.storm.cluster$mk_distributed_cluster_state$reify__4937.set_data(cluster.clj:110)
 ~[storm-core-0.10.0.2.3.2.0-2950.jar:0.10.0.2.3.2.0-2950]
        at 
backtype.storm.cluster$mk_storm_cluster_state$reify__5557.report_error(cluster.clj:537)
 ~[storm-core-0.10.0.2.3.2.0-2950.jar:0.10.0.2.3.2.0-2950]
        at 
backtype.storm.daemon.executor$throttled_report_error_fn$fn__5878.invoke(executor.clj:193)
 ~[storm-core-0.10.0.2.3.2.0-2950.jar:0.10.0.2.3.2.0-2950]
        at 
backtype.storm.daemon.executor$fn__6099$fn$reify__6147.reportError(executor.clj:798)
 ~[storm-core-0.10.0.2.3.2.0-2950.jar:0.10.0.2.3.2.0-2950]
        at 
backtype.storm.task.OutputCollector.reportError(OutputCollector.java:223) 
~[storm-core-0.10.0.2.3.2.0-2950.jar:0.10.0.2.3.2.0-2950]
        at org.apache.storm.hdfs.bolt.HdfsBolt.execute(HdfsBolt.java:115) 
~[stormjar.jar:?]
        at 
backtype.storm.daemon.executor$fn__6099$tuple_action_fn__6101.invoke(executor.clj:670)
 ~[storm-core-0.10.0.2.3.2.0-2950.jar:0.10.0.2.3.2.0-2950]
        at 
backtype.storm.daemon.executor$mk_task_receiver$fn__6022.invoke(executor.clj:426)
 ~[storm-core-0.10.0.2.3.2.0-2950.jar:0.10.0.2.3.2.0-2950]
        at 
backtype.storm.disruptor$clojure_handler$reify__912.onEvent(disruptor.clj:58) 
~[storm-core-0.10.0.2.3.2.0-2950.jar:0.10.0.2.3.2.0-2950]
        at 
backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125)
 ~[storm-core-0.10.0.2.3.2.0-2950.jar:0.10.0.2.3.2.0-2950]
        ... 6 more
{code}
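
In the trace above, `report_error` reaches `set_data`, which calls `create_node` and fails because `/errors/HDFSTopology-1-1453480559/my-bolt-last-error` already exists. One general pattern that produces this kind of failure is a check-then-create sequence in which the node appears between the existence check and the create call. The log does not establish that this is what happened here, so the following is only an illustration of that pattern. It uses an in-memory stand-in rather than the ZooKeeper or Curator API, and every class and function name in it is hypothetical.

```python
class NodeExistsError(Exception):
    """Stand-in for a 'node already exists' failure."""


class FakeStore:
    """In-memory stand-in for a hierarchical key/value store."""

    def __init__(self):
        self._nodes = {}

    def exists(self, path):
        return path in self._nodes

    def create(self, path, data):
        # Creating an existing node is an error, as in the trace above.
        if path in self._nodes:
            raise NodeExistsError(path)
        self._nodes[path] = data

    def set(self, path, data):
        if path not in self._nodes:
            raise KeyError(path)
        self._nodes[path] = data

    def get(self, path):
        return self._nodes[path]


def demo_check_then_create_race(store, path):
    """Return True if writer A hits NodeExistsError after a stale check."""
    seen_by_a = store.exists(path)        # writer A checks: node is absent
    store.create(path, "error from B")    # writer B creates it in between
    try:
        if not seen_by_a:
            store.create(path, "error from A")  # A acts on the stale check
    except NodeExistsError:
        return True
    return False


def set_data_tolerant(store, path, data):
    """Create the node, falling back to an update if it already exists."""
    try:
        store.create(path, data)
    except NodeExistsError:
        store.set(path, data)
```

With the tolerant variant, a writer that loses the race overwrites the existing node instead of raising, so error reporting alone would not bring down the caller.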



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
