[
https://issues.apache.org/jira/browse/HELIX-535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Joy updated HELIX-535:
----------------------
Attachment: xaf
xae
xad
xac
xab
xaa
The controller log is split into 6 files to work around the size limit.
> Helix controller stops working with heavy configuration
> -------------------------------------------------------
>
> Key: HELIX-535
> URL: https://issues.apache.org/jira/browse/HELIX-535
> Project: Apache Helix
> Issue Type: Bug
> Components: helix-core
> Environment: machine:$ uname -a
> Linux eat1-app373.stg 2.6.32-220.10.1.el6.x86_64 #1 SMP Fri Mar 9 12:37:51
> EST 2012 x86_64 x86_64 x86_64 GNU/Linux
> JVM version: $ /export/apps/jdk/current/bin/java -version
> java version "1.6.0_21"
> Java(TM) SE Runtime Environment (build 1.6.0_21-b06)
> Java HotSpot(TM) 64-Bit Server VM (build 17.0-b16, mixed mode)
> Reporter: Joy
> Attachments: xaa, xab, xac, xad, xae, xaf
>
>
> The issue consistently comes up with heavy configurations: a higher number of
> znodes, partitions, and databases.
> The goal of our tests is to evaluate the performance of the Helix controller (in
> terms of controller latency) with an increased number of nodes, databases, and
> partitions.
> In our test, we use multiple machines: one for ZooKeeper, one for the Helix
> controller, and the rest for dummy processes. The configuration is as
> follows:
> zkr <----------> helix
> ^
> |
> V
> dummy processes
> We intentionally kill the master dummy processes once every 30 seconds to
> simulate a failure event. Everything works fine with a light configuration,
> such as 27 nodes + 1 db + 729 partitions. However, when the configuration is
> heavy, such as 81 nodes + 10 databases + 81 partitions for each db, the
> controller latency increases significantly after several failure events:
> Controller latency (ms)
> First event: 182
> Second event: 188
> Third event: 200
> Fourth event: 193
> Fifth event: 200
> Sixth event: 185
> Seventh event: 189
> Eighth event: 213
> Ninth event: 1082209
> After this extremely long failure event, the Helix controller stops
> working. The controller log is attached.