[ https://issues.apache.org/jira/browse/HADOOP-2680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12561554#action_12561554 ]

Konstantin Shvachko commented on HADOOP-2680:
---------------------------------------------

I think these problems are related to the slowness of the mounted disk drive.
In the case of TestEditLog the name-node is supposed to perform 100,000 transactions and synchronize them to disk. This takes more than 10 minutes, with less than half of the transactions completed.
Here is the relevant part of the log:
{code}
    [junit] 2008-01-18 13:38:04,406 INFO  fs.FSNamesystem (FSEditLog.java:printStatistics(772)) - Number of transactions: 2727 Total time for transactions(ms): 180 Number of syncs: 54 SyncTimes(ms): 33105 28125
    [junit] 2008-01-18 13:39:05,135 INFO  fs.FSNamesystem (FSEditLog.java:printStatistics(772)) - Number of transactions: 5728 Total time for transactions(ms): 231 Number of syncs: 113 SyncTimes(ms): 63838 58107
    [junit] 2008-01-18 13:40:05,714 INFO  fs.FSNamesystem (FSEditLog.java:printStatistics(772)) - Number of transactions: 8874 Total time for transactions(ms): 294 Number of syncs: 177 SyncTimes(ms): 95192 87320
    [junit] 2008-01-18 13:41:05,910 INFO  fs.FSNamesystem (FSEditLog.java:printStatistics(772)) - Number of transactions: 12400 Total time for transactions(ms): 378 Number of syncs: 247 SyncTimes(ms): 125102 117586
    [junit] 2008-01-18 13:42:06,136 INFO  fs.FSNamesystem (FSEditLog.java:printStatistics(772)) - Number of transactions: 15279 Total time for transactions(ms): 432 Number of syncs: 303 SyncTimes(ms): 156284 146617
    [junit] 2008-01-18 13:43:06,820 INFO  fs.FSNamesystem (FSEditLog.java:printStatistics(772)) - Number of transactions: 18602 Total time for transactions(ms): 467 Number of syncs: 369 SyncTimes(ms): 187323 176248
    [junit] 2008-01-18 13:44:07,818 INFO  fs.FSNamesystem (FSEditLog.java:printStatistics(772)) - Number of transactions: 21877 Total time for transactions(ms): 514 Number of syncs: 434 SyncTimes(ms): 218905 205651
    [junit] 2008-01-18 13:45:08,400 INFO  fs.FSNamesystem (FSEditLog.java:printStatistics(772)) - Number of transactions: 25358 Total time for transactions(ms): 562 Number of syncs: 503 SyncTimes(ms): 250827 234296
    [junit] 2008-01-18 13:46:08,565 INFO  fs.FSNamesystem (FSEditLog.java:printStatistics(772)) - Number of transactions: 28183 Total time for transactions(ms): 611 Number of syncs: 559 SyncTimes(ms): 281769 263511
    [junit] 2008-01-18 13:47:09,728 INFO  fs.FSNamesystem (FSEditLog.java:printStatistics(772)) - Number of transactions: 31349 Total time for transactions(ms): 639 Number of syncs: 622 SyncTimes(ms): 311388 295040
    [junit] 2008-01-18 13:48:10,725 INFO  fs.FSNamesystem (FSEditLog.java:printStatistics(772)) - Number of transactions: 34534 Total time for transactions(ms): 668 Number of syncs: 686 SyncTimes(ms): 342038 325370
    [junit] 2008-01-18 13:49:11,673 INFO  fs.FSNamesystem (FSEditLog.java:printStatistics(772)) - Number of transactions: 37197 Total time for transactions(ms): 695 Number of syncs: 738 SyncTimes(ms): 372024 356322
    [junit] 2008-01-18 13:50:12,050 INFO  fs.FSNamesystem (FSEditLog.java:printStatistics(772)) - Number of transactions: 40113 Total time for transactions(ms): 723 Number of syncs: 796 SyncTimes(ms): 403170 385541
    [junit] 2008-01-18 13:51:12,981 INFO  fs.FSNamesystem (FSEditLog.java:printStatistics(772)) - Number of transactions: 42734 Total time for transactions(ms): 745 Number of syncs: 848 SyncTimes(ms): 432674 416957
    [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0 sec
    [junit] Test org.apache.hadoop.dfs.TestEditLog FAILED (timeout)
{code}
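
To separate the disk from the test, a minimal stand-alone check like the sketch below could be run on the same partition. The file path, record size, and iteration count are placeholders, and the loop only imitates what an edit-log sync roughly amounts to (a small write followed by FileChannel.force()):
{code}
// Stand-alone sketch, not part of the test: measures forced-write latency on
// the partition given as the first argument. Path, record size, and loop count
// are placeholders chosen for illustration only.
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class SyncLatencyCheck {
  public static void main(String[] args) throws Exception {
    File file = new File(args.length > 0 ? args[0] : "sync.check");
    RandomAccessFile raf = new RandomAccessFile(file, "rw");
    FileChannel channel = raf.getChannel();
    ByteBuffer record = ByteBuffer.allocate(512);   // roughly one edit record
    int syncs = 1000;                               // scaled down from the test's 100,000
    long start = System.currentTimeMillis();
    for (int i = 0; i < syncs; i++) {
      record.clear();
      channel.write(record);
      channel.force(false);   // flush to the device, as the edit log does on every sync
    }
    long elapsed = System.currentTimeMillis() - start;
    System.out.println(syncs + " forced writes took " + elapsed + " ms ("
        + (elapsed / (double) syncs) + " ms each)");
    raf.close();
    file.delete();
  }
}
{code}
A local disk should need at most a few milliseconds per forced write; anything approaching the roughly half a second per sync implied by the SyncTimes above would point at the mounted drive rather than the test.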

In the case of TestDFSStorageStateRecovery the name-node tries to recover from an aborted rollback, which means it needs to rename some storage directories. This too takes more than a minute and causes the test to time out.
{code}
    [junit] 2008-01-22 12:16:40,198 INFO  dfs.Storage (Storage.java:doRecover(369)) - Recovering storage directory /export/home/hudson/hudson/jobs/Hadoop-Nightly/workspace/trunk/build/test/data/name1 from previous rollback.
    [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0 sec
    [junit] Test org.apache.hadoop.dfs.TestDFSStorageStateRecovery FAILED 
(timeout)
{code}

Could anyone please check the partition Hudson uses for the storage directories?
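
To narrow that down, a quick stand-alone check of rename latency on the same partition might also help; the sketch below is illustrative only (the directory names and default path are placeholders, not taken from the test):
{code}
// Stand-alone sketch, not part of the test: times a single directory rename on
// the partition given as the first argument. Directory names are placeholders.
import java.io.File;

public class RenameLatencyCheck {
  public static void main(String[] args) {
    File base = new File(args.length > 0 ? args[0] : ".");
    File src = new File(base, "rename.check.tmp");
    File dst = new File(base, "rename.check");
    if (!src.mkdirs()) {
      System.err.println("could not create " + src);
      return;
    }
    long start = System.currentTimeMillis();
    boolean ok = src.renameTo(dst);
    long elapsed = System.currentTimeMillis() - start;
    System.out.println("rename " + (ok ? "succeeded" : "failed")
        + " in " + elapsed + " ms");
    dst.delete();
  }
}
{code}
If a single rename takes anywhere near a minute there, that would confirm the partition rather than the test is at fault.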

> Unit test fails with a timeout on nightly build: org.apache.hadoop.dfs.TestDFSStorageStateRecovery
> --------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2680
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2680
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.16.0
>            Reporter: Mukund Madhugiri
>            Assignee: Konstantin Shvachko
>            Priority: Blocker
>             Fix For: 0.16.0
>
>         Attachments: build-369-TestEditLog.txt
>
>
> The unit test TestDFSStorageStateRecovery has failed a couple of times with a timeout on the nightly build.
> Here are the logs from the failures:
> http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/373/console
> http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/369/console
> Here is a build where it passed:
> http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/365/console
> Here is the part where it failed:
> [junit] 2008-01-22 12:16:39,499 INFO  dfs.TestDFSStorageStateRecovery (TestDFSStorageStateRecovery.java:log(84)) - ============================================================
>     [junit] 2008-01-22 12:16:39,502 INFO  dfs.TestDFSStorageStateRecovery (TestDFSStorageStateRecovery.java:log(85)) - ***TEST 34*** NAME_NODE recovery: numDirs=1 testCase=17 current=false previous=true previous.tmp=false removed.tmp=true
>     [junit] 2008-01-22 12:16:39,499 INFO  ipc.Server (Server.java:run(939)) - IPC Server handler 4 on 36207: exiting
>     [junit] 2008-01-22 12:16:39,499 INFO  ipc.Server (Server.java:run(939)) - IPC Server handler 3 on 36207: exiting
>     [junit] 2008-01-22 12:16:39,499 INFO  ipc.Server (Server.java:run(939)) - IPC Server handler 7 on 36207: exiting
>     [junit] 2008-01-22 12:16:39,499 INFO  ipc.Server (Server.java:run(353)) - Stopping IPC Server listener on 36207
>     [junit] 2008-01-22 12:16:39,499 INFO  ipc.Server (Server.java:run(939)) - IPC Server handler 2 on 36207: exiting
>     [junit] 2008-01-22 12:16:40,153 INFO  jvm.JvmMetrics (RpcMetrics.java:setTags(49)) - Initializing RPC Metrics with serverName=NameNode, port=0
>     [junit] javax.management.InstanceAlreadyExistsException: hadoop.dfs:service=NameNode,name=RpcStatistics
>     [junit]   at com.sun.jmx.mbeanserver.RepositorySupport.addMBean(RepositorySupport.java:452)
>     [junit]   at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.internal_addObject(DefaultMBeanServerInterceptor.java:1410)
>     [junit]   at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerObject(DefaultMBeanServerInterceptor.java:936)
>     [junit]   at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:337)
>     [junit]   at com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:497)
>     [junit]   at org.apache.hadoop.metrics.util.MBeanUtil.registerMBean(MBeanUtil.java:50)
>     [junit]   at org.apache.hadoop.ipc.metrics.RpcMgt.<init>(RpcMgt.java:37)
>     [junit]   at org.apache.hadoop.ipc.metrics.RpcMetrics.<init>(RpcMetrics.java:60)
>     [junit]   at org.apache.hadoop.ipc.Server.<init>(Server.java:969)
>     [junit]   at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:393)
>     [junit]   at org.apache.hadoop.ipc.RPC.getServer(RPC.java:355)
>     [junit] 2008-01-22 12:16:40,156 INFO  dfs.NameNode (NameNode.java:initialize(125)) - Namenode up at: localhost/127.0.0.1:36768
>     [junit]   at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:119)
>     [junit]   at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:174)
>     [junit] 2008-01-22 12:16:40,156 INFO  jvm.JvmMetrics (JvmMetrics.java:init(51)) - Cannot initialize JVM Metrics with processName=NameNode, sessionId=null - already initialized
>     [junit]   at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:160)
>     [junit]   at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:849)
>     [junit]   at org.apache.hadoop.dfs.MiniDFSCluster.<init>(MiniDFSCluster.java:195)
>     [junit]   at org.apache.hadoop.dfs.MiniDFSCluster.<init>(MiniDFSCluster.java:134)
>     [junit]   at org.apache.hadoop.dfs.MiniDFSCluster.<init>(MiniDFSCluster.java:86)
>     [junit]   at org.apache.hadoop.dfs.TestDFSStorageStateRecovery.testStorageStates(TestDFSStorageStateRecovery.java:193)
>     [junit]   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     [junit]   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     [junit]   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     [junit]   at java.lang.reflect.Method.invoke(Method.java:585)
>     [junit]   at junit.framework.TestCase.runTest(TestCase.java:154)
>     [junit]   at junit.framework.TestCase.runBare(TestCase.java:127)
>     [junit] 2008-01-22 12:16:40,158 INFO  dfs.NameNodeMetrics (NameNodeMetrics.java:<init>(74)) - Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
>     [junit]   at junit.framework.TestResult$1.protect(TestResult.java:106)
>     [junit]   at junit.framework.TestResult.runProtected(TestResult.java:124)
>     [junit]   at junit.framework.TestResult.run(TestResult.java:109)
>     [junit]   at junit.framework.TestCase.run(TestCase.java:118)
>     [junit]   at junit.framework.TestSuite.runTest(TestSuite.java:208)
>     [junit]   at junit.framework.TestSuite.run(TestSuite.java:203)
>     [junit]   at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:297)
>     [junit]   at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:672)
>     [junit]   at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:567)
>     [junit] javax.management.InstanceAlreadyExistsException: hadoop.dfs:service=NameNode,name=NameNodeStatistics
>     [junit]   at com.sun.jmx.mbeanserver.RepositorySupport.addMBean(RepositorySupport.java:452)
>     [junit]   at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.internal_addObject(DefaultMBeanServerInterceptor.java:1410)
>     [junit]   at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerObject(DefaultMBeanServerInterceptor.java:936)
>     [junit]   at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:337)
>     [junit]   at com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:497)
>     [junit]   at org.apache.hadoop.metrics.util.MBeanUtil.registerMBean(MBeanUtil.java:50)
>     [junit]   at org.apache.hadoop.dfs.namenode.metrics.NameNodeMgt.<init>(NameNodeMgt.java:36)
>     [junit]   at org.apache.hadoop.dfs.NameNodeMetrics.<init>(NameNodeMetrics.java:67)
>     [junit]   at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:127)
>     [junit]   at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:174)
>     [junit]   at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:160)
>     [junit]   at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:849)
>     [junit]   at org.apache.hadoop.dfs.MiniDFSCluster.<init>(MiniDFSCluster.java:195)
>     [junit]   at org.apache.hadoop.dfs.MiniDFSCluster.<init>(MiniDFSCluster.java:134)
>     [junit]   at org.apache.hadoop.dfs.MiniDFSCluster.<init>(MiniDFSCluster.java:86)
>     [junit]   at org.apache.hadoop.dfs.TestDFSStorageStateRecovery.testStorageStates(TestDFSStorageStateRecovery.java:193)
>     [junit]   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     [junit]   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     [junit]   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     [junit]   at java.lang.reflect.Method.invoke(Method.java:585)
>     [junit]   at junit.framework.TestCase.runTest(TestCase.java:154)
>     [junit]   at junit.framework.TestCase.runBare(TestCase.java:127)
>     [junit]   at junit.framework.TestResult$1.protect(TestResult.java:106)
>     [junit]   at junit.framework.TestResult.runProtected(TestResult.java:124)
>     [junit]   at junit.framework.TestResult.run(TestResult.java:109)
>     [junit]   at junit.framework.TestCase.run(TestCase.java:118)
>     [junit]   at junit.framework.TestSuite.runTest(TestSuite.java:208)
>     [junit]   at junit.framework.TestSuite.run(TestSuite.java:203)
>     [junit]   at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:297)
>     [junit]   at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:672)
>     [junit]   at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:567)
>     [junit] 2008-01-22 12:16:40,193 INFO  fs.FSNamesystem (FSNamesystem.java:setConfigurationParameters(321)) - fsOwner=hudson,hudson
>     [junit] 2008-01-22 12:16:40,195 INFO  fs.FSNamesystem (FSNamesystem.java:setConfigurationParameters(325)) - supergroup=supergroup
>     [junit] 2008-01-22 12:16:40,195 INFO  fs.FSNamesystem (FSNamesystem.java:setConfigurationParameters(326)) - isPermissionEnabled=true
>     [junit] 2008-01-22 12:16:40,198 INFO  dfs.Storage (Storage.java:doRecover(369)) - Recovering storage directory /export/home/hudson/hudson/jobs/Hadoop-Nightly/workspace/trunk/build/test/data/name1 from previous rollback.
>     [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0 sec
>     [junit] Test org.apache.hadoop.dfs.TestDFSStorageStateRecovery FAILED (timeout)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
