[ https://issues.apache.org/jira/browse/STORM-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yuzhao Chen updated STORM-2901:
-------------------------------
Description:
When our nimbus restarts, it opens a large number of zookeeper connections
within a few minutes, which puts real pressure on our zookeeper servers.
I checked the log and the code and found that when nimbus restarts, in order
to sync the local storm keys [ that is, the valid storms ], it will:
# diff the local storm keys against the keys stored remotely in zk.
# set up a path for every valid storm key with a keySequenceNumber.
# to get the keySequenceNumber, the blobstore currently creates a new
zk-client and connects to the zk-server for each key.
This is why thousands of connections are made. Our cluster runs 800+
topologies, which means at least 800 connections are made where a single
connection could be reused.
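The difference between the two behaviours can be illustrated with a small,
self-contained model. This is only a sketch, not the Storm code: FakeZkClient,
key_sequence_number and both sync functions are hypothetical names, the client
is a stand-in that merely counts how many sessions get opened, and the way the
sequence number is derived from the child name is an assumption based on the
setup-path entries in the log.

```python
class FakeZkClient:
    """Hypothetical stand-in for a ZooKeeper client; counts opened sessions."""
    sessions_opened = 0

    def __init__(self):
        # Creating a client models opening one ZooKeeper session.
        FakeZkClient.sessions_opened += 1
        self.closed = False

    def get_children(self, path):
        # A real client would list the znode's children; this returns a fixed value.
        return ["host:9827-1"]

    def close(self):
        self.closed = True


def key_sequence_number(client, key):
    """Assumed lookup: take the highest numeric suffix among the key's children."""
    children = client.get_children("/blobstore/" + key)
    return max(int(child.rsplit("-", 1)[1]) for child in children)


def sync_with_new_client_per_key(keys):
    # Behaviour described in the issue: one client (one session) per key.
    result = {}
    for key in keys:
        client = FakeZkClient()
        try:
            result[key] = key_sequence_number(client, key)
        finally:
            client.close()
    return result


def sync_with_shared_client(keys):
    # Proposed behaviour: open one client and reuse it for every key.
    client = FakeZkClient()
    try:
        return {key: key_sequence_number(client, key) for key in keys}
    finally:
        client.close()


# 800 made-up key names, matching the cluster size mentioned above.
keys = ["topology-%d-stormconf.ser" % i for i in range(800)]

FakeZkClient.sessions_opened = 0
per_key = sync_with_new_client_per_key(keys)
sessions_per_key = FakeZkClient.sessions_opened

FakeZkClient.sessions_opened = 0
shared = sync_with_shared_client(keys)
sessions_shared = FakeZkClient.sessions_opened

print(sessions_per_key, sessions_shared)
```

Both functions return the same sequence numbers; only the number of sessions
opened against zookeeper differs.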
This is part of the nimbus restart log:
2018-01-18 12:51:57.031 o.a.s.s.o.a.c.f.i.CuratorFrameworkImpl [INFO] Starting
2018-01-18 12:51:57.032 o.a.s.s.o.a.z.ZooKeeper [INFO] Initiating client connection, connectString=dx-data-rt-zk01:2181,dx-data-rt-zk02:2181,dx-data-rt-zk04:2181/mtstorm_101_dx_storm01 sessionTimeout=30000 watcher=org.apache.storm.shade.org.apache.curator.ConnectionState@76513a57
2018-01-18 12:51:57.032 o.a.s.s.o.a.z.ClientCnxn [INFO] Opening socket connection to server dx-data-rt-zk04.dx.sankuai.com/10.32.157.254:2181. Will not attempt to authenticate using SASL (unknown error)
2018-01-18 12:51:57.033 o.a.s.s.o.a.z.ClientCnxn [INFO] Socket connection established to dx-data-rt-zk04.dx.sankuai.com/10.32.157.254:2181, initiating session
2018-01-18 12:51:57.034 o.a.s.s.o.a.z.ClientCnxn [INFO] Session establishment complete on server dx-data-rt-zk04.dx.sankuai.com/10.32.157.254:2181, sessionid = 0x45cd92f0cc7e938, negotiated timeout = 30000
2018-01-18 12:51:57.034 o.a.s.s.o.a.c.f.s.ConnectionStateManager [INFO] State change: CONNECTED
2018-01-18 12:51:57.037 o.a.s.s.o.a.c.f.i.CuratorFrameworkImpl [INFO] backgroundOperationsLoop exiting
2018-01-18 12:51:57.039 o.a.s.s.o.a.z.ZooKeeper [INFO] Session: 0x45cd92f0cc7e938 closed
2018-01-18 12:51:57.039 o.a.s.s.o.a.z.ClientCnxn [INFO] EventThread shut down
2018-01-18 12:51:57.040 o.a.s.cluster [INFO] setup-path/blobstore/app_waimairank_wm_recsys_user_block-4-1504509174-stormconf.ser/dx-data-rt-nimbus05.dx.sankuai.com:9827-1
2018-01-18 12:51:57.051 o.a.s.s.o.a.c.f.i.CuratorFrameworkImpl [INFO] Starting
2018-01-18 12:51:57.051 o.a.s.s.o.a.z.ZooKeeper [INFO] Initiating client connection, connectString=dx-data-rt-zk01:2181,dx-data-rt-zk02:2181,dx-data-rt-zk04:2181/mtstorm_101_dx_storm01 sessionTimeout=30000 watcher=org.apache.storm.shade.org.apache.curator.ConnectionState@69c222d6
2018-01-18 12:51:57.052 o.a.s.s.o.a.z.ClientCnxn [INFO] Opening socket connection to server dx-data-rt-zk02.dx.sankuai.com/10.32.108.46:2181. Will not attempt to authenticate using SASL (unknown error)
2018-01-18 12:51:57.053 o.a.s.s.o.a.z.ClientCnxn [INFO] Socket connection established to dx-data-rt-zk02.dx.sankuai.com/10.32.108.46:2181, initiating session
2018-01-18 12:51:57.055 o.a.s.s.o.a.z.ClientCnxn [INFO] Session establishment complete on server dx-data-rt-zk02.dx.sankuai.com/10.32.108.46:2181, sessionid = 0x25cd386f245eb72, negotiated timeout = 30000
2018-01-18 12:51:57.055 o.a.s.s.o.a.c.f.s.ConnectionStateManager [INFO] State change: CONNECTED
2018-01-18 12:51:57.058 o.a.s.s.o.a.c.f.i.CuratorFrameworkImpl [INFO] backgroundOperationsLoop exiting
2018-01-18 12:51:57.061 o.a.s.s.o.a.z.ZooKeeper [INFO] Session: 0x25cd386f245eb72 closed
2018-01-18 12:51:57.061 o.a.s.s.o.a.z.ClientCnxn [INFO] EventThread shut down
2018-01-18 12:51:57.061 o.a.s.cluster [INFO] setup-path/blobstore/app_waimairank_waimai_rank_rt_pipeline_user_feature-12-1507516853-stormconf.ser/dx-data-rt-nimbus05.dx.sankuai.com:9827-1
> Reuse ZK connection for getKeySequenceNumber
> --------------------------------------------
>
> Key: STORM-2901
> URL: https://issues.apache.org/jira/browse/STORM-2901
> Project: Apache Storm
> Issue Type: Improvement
> Components: storm-server
> Affects Versions: 2.0.0, 1.2.0
> Reporter: Yuzhao Chen
> Assignee: Yuzhao Chen
> Priority: Major
> Labels: patch, pull-request-available
> Fix For: 2.0.0
>
> Time Spent: 20m
> Remaining Estimate: 0h