[ https://issues.apache.org/jira/browse/STORM-376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14058959#comment-14058959 ]

ASF GitHub Bot commented on STORM-376:
--------------------------------------

Github user revans2 commented on the pull request:

    https://github.com/apache/incubator-storm/pull/168#issuecomment-48752031
  
    With a basic ZK setup (spinning disk, ext4, forceSync enabled by
    default), we topped out at about 800 write operations per second, even
    with ZK batching writes.  Heartbeats were the cause of most of those
    writes.
    
    heartbeatWritesPerSecond = supervisors / 5 sec
                             + (supervisors * workersPerSupervisor) / 3 sec
    
    So it scales linearly with the number of supervisors you have.  With 10
    workers per supervisor that caps out at about 225 supervisors; for a
    smoothly running cluster it is actually closer to 200.
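
    To spell out the arithmetic: at 225 supervisors with 10 workers each,
    that is 225/5 + (225 * 10)/3 = 45 + 750 = 795 writes/sec, right at the
    ~800 ops/sec the disk could sustain.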
    
    If we turned forceSync off and let the OS batch and reorder writes, the
    new bottleneck was the outbound link of the gigabit Ethernet.  We maxed
    out the traffic at about 800 simulated nodes, but could not fill the
    cluster completely.  The formula for this bottleneck is:
    
    downloadTraffic = (supervisors * topologies * avgTopologyAssignmentSize) / 10 sec
                    + (supervisors * workersPerSupervisor * avgTopologyAssignmentSize) / 10 sec
    
    This one scales with the number of supervisors, the number of
    topologies, and the size of the assignments.  topologies *
    avgTopologyAssignmentSize itself grows roughly linearly with the size of
    the cluster: a fully loaded larger cluster has more and/or bigger
    topologies running than a fully loaded smaller one.  Multiplied by the
    supervisor count, downloadTraffic therefore scales quadratically with
    the size of the cluster.
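
    To make that concrete (illustrative, not measured): doubling the cluster
    size doubles supervisors and roughly doubles topologies *
    avgTopologyAssignmentSize, so downloadTraffic goes up about 4x.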
    
    Turning on just gzip compression reduced the traffic to about 1/5th of
    what we saw originally, and we were able to run 1179 simulated nodes
    fully loaded.  But to scale much further we needed to fundamentally
    change how we interacted with ZK.  That is why we wrote STORM-375.
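
    As a rough sketch of what the compression amounts to (my own minimal
    Java using java.util.zip, not the actual patch; the real change wires
    this into Storm's cluster-state serialization):

        import java.io.ByteArrayInputStream;
        import java.io.ByteArrayOutputStream;
        import java.io.IOException;
        import java.util.zip.GZIPInputStream;
        import java.util.zip.GZIPOutputStream;

        class ZkGzip {
            // Gzip serialized assignment bytes before writing them to a znode.
            static byte[] gzip(byte[] raw) throws IOException {
                ByteArrayOutputStream bos = new ByteArrayOutputStream();
                try (GZIPOutputStream out = new GZIPOutputStream(bos)) {
                    out.write(raw);
                }
                return bos.toByteArray();
            }

            // Inverse: gunzip bytes read back from the znode.
            static byte[] gunzip(byte[] stored) throws IOException {
                ByteArrayOutputStream bos = new ByteArrayOutputStream();
                try (GZIPInputStream in =
                         new GZIPInputStream(new ByteArrayInputStream(stored))) {
                    byte[] buf = new byte[4096];
                    for (int n; (n = in.read(buf)) != -1; ) {
                        bos.write(buf, 0, n);
                    }
                }
                return bos.toByteArray();
            }
        }

    (Assignment data is highly repetitive, which is presumably why gzip
    alone bought the ~5x reduction.)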
    
    With an incomplete version of STORM-375 that only updated the
    supervisors, not the workers, plus gzip compression, we were able to go
    to a fully loaded 1965 nodes, and the download link from ZK was only
    20.44 MB/sec.  We expect that we could scale even higher, but we ran out
    of time on the cluster and were hitting ulimit restrictions that would
    have required us to reconfigure all of the nodes.
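
    For reference, the core of the STORM-375 approach is roughly the
    following (a hypothetical helper written against the plain ZooKeeper
    client API, not the actual patch): poll the znode's Stat and only
    download the payload when the data version has moved.

        import org.apache.zookeeper.KeeperException;
        import org.apache.zookeeper.ZooKeeper;
        import org.apache.zookeeper.data.Stat;

        // Skip the download when the znode's data version has not changed.
        class CachedZnode {
            private int lastVersion = -1;
            private byte[] cached;

            byte[] fetchIfChanged(ZooKeeper zk, String path)
                    throws KeeperException, InterruptedException {
                Stat stat = zk.exists(path, false);
                if (stat == null) {
                    return null;              // znode deleted
                }
                if (stat.getVersion() != lastVersion) {
                    // getData() refreshes stat with the version actually read.
                    cached = zk.getData(path, false, stat);
                    lastVersion = stat.getVersion();
                }
                return cached;
            }
        }

    The exists() call costs a few dozen bytes each way, versus re-shipping
    the whole assignment every 10 seconds.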
    
    I am not really sure what the next bottleneck is.  Looking at the
    rudimentary latency data collected by ZK for each of the three nodes in
    the ensemble, I can see that the worst-case latencies are about 1.5
    seconds now, and the average latency is about 2 ms.  This is a lot worse
    than the 10 ms worst case and 0 ms average I would see on an idle
    cluster.
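
    For anyone who wants to pull the same latency numbers: ZooKeeper reports
    min/avg/max request latency through its four-letter-word commands on the
    client port.  A bare-bones way to grab it (host name is a placeholder):

        import java.io.InputStream;
        import java.io.OutputStream;
        import java.net.Socket;
        import java.nio.charset.StandardCharsets;

        // Send the "stat" four-letter command to a ZooKeeper server and
        // print the reply, which includes "Latency min/avg/max: ...".
        public class ZkLatency {
            public static void main(String[] args) throws Exception {
                try (Socket s = new Socket("zk-host-1", 2181)) {
                    OutputStream out = s.getOutputStream();
                    out.write("stat".getBytes(StandardCharsets.US_ASCII));
                    out.flush();
                    InputStream in = s.getInputStream();
                    byte[] buf = new byte[8192];
                    for (int n; (n = in.read(buf)) != -1; ) {
                        System.out.write(buf, 0, n);
                    }
                    System.out.flush();
                }
            }
        }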



> Add compression to data stored in ZK
> ------------------------------------
>
>                 Key: STORM-376
>                 URL: https://issues.apache.org/jira/browse/STORM-376
>             Project: Apache Storm (Incubating)
>          Issue Type: Improvement
>            Reporter: Robert Joseph Evans
>            Assignee: Robert Joseph Evans
>         Attachments: storm-2000.png
>
>
> If you run ZooKeeper with -Dzookeeper.forceSync=no, the ZooKeeper disk is
> no longer the bottleneck for scaling Storm.  For us on Gigabit Ethernet
> (scale test cluster) it becomes the aggregate reads by all of the
> supervisors and workers trying to download the compiled topology
> assignments.
> To reduce this load we took two approaches.  First, we compressed the data
> being stored in ZooKeeper (this JIRA), which also has the added benefit of
> increasing the size of the topology you can store in ZK.  Second, we used
> the ZK version number to see if the data had changed, and avoided
> downloading it again needlessly (STORM-375).
> With these changes we were able to scale to a simulated 1965 nodes (5
> supervisors running on each of 393 real nodes, with each supervisor
> configured to have 10 slots).  We also filled the cluster with 131
> topologies of 100 workers each.  (We are going to 200 topologies, and may
> try to scale the cluster even larger, but it takes forever to launch
> topologies once the cluster is under load.  We may try to address that
> shortly too.)



--
This message was sent by Atlassian JIRA
(v6.2#6252)
