[
https://issues.apache.org/jira/browse/HADOOP-2680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12561554#action_12561554
]
Konstantin Shvachko commented on HADOOP-2680:
---------------------------------------------
I think these problems are related to the slowness of the mounted disk drive.
In case of TestEditLog name-node is supposed to make 100,000 transactions and
synchronize them to disk.
And this takes more than 10 minutes with less than a half of transaction
completed.
Here is the relevant part of the log:
{code}
[junit] 2008-01-18 13:38:04,406 INFO fs.FSNamesystem
(FSEditLog.java:printStatistics(772)) - Number of transactions: 2727 Total time
for transactions(ms): 180 Number of syncs: 54 SyncTimes(ms): 33105 28125
[junit] 2008-01-18 13:39:05,135 INFO fs.FSNamesystem
(FSEditLog.java:printStatistics(772)) - Number of transactions: 5728 Total time
for transactions(ms): 231 Number of syncs: 113 SyncTimes(ms): 63838 58107
[junit] 2008-01-18 13:40:05,714 INFO fs.FSNamesystem
(FSEditLog.java:printStatistics(772)) - Number of transactions: 8874 Total time
for transactions(ms): 294 Number of syncs: 177 SyncTimes(ms): 95192 87320
[junit] 2008-01-18 13:41:05,910 INFO fs.FSNamesystem
(FSEditLog.java:printStatistics(772)) - Number of transactions: 12400 Total
time for transactions(ms): 378 Number of syncs: 247 SyncTimes(ms): 125102
117586
[junit] 2008-01-18 13:42:06,136 INFO fs.FSNamesystem
(FSEditLog.java:printStatistics(772)) - Number of transactions: 15279 Total
time for transactions(ms): 432 Number of syncs: 303 SyncTimes(ms): 156284
146617
[junit] 2008-01-18 13:43:06,820 INFO fs.FSNamesystem
(FSEditLog.java:printStatistics(772)) - Number of transactions: 18602 Total
time for transactions(ms): 467 Number of syncs: 369 SyncTimes(ms): 187323
176248
[junit] 2008-01-18 13:44:07,818 INFO fs.FSNamesystem
(FSEditLog.java:printStatistics(772)) - Number of transactions: 21877 Total
time for transactions(ms): 514 Number of syncs: 434 SyncTimes(ms): 218905
205651
[junit] 2008-01-18 13:45:08,400 INFO fs.FSNamesystem
(FSEditLog.java:printStatistics(772)) - Number of transactions: 25358 Total
time for transactions(ms): 562 Number of syncs: 503 SyncTimes(ms): 250827
234296
[junit] 2008-01-18 13:46:08,565 INFO fs.FSNamesystem
(FSEditLog.java:printStatistics(772)) - Number of transactions: 28183 Total
time for transactions(ms): 611 Number of syncs: 559 SyncTimes(ms): 281769
263511
[junit] 2008-01-18 13:47:09,728 INFO fs.FSNamesystem
(FSEditLog.java:printStatistics(772)) - Number of transactions: 31349 Total
time for transactions(ms): 639 Number of syncs: 622 SyncTimes(ms): 311388
295040
[junit] 2008-01-18 13:48:10,725 INFO fs.FSNamesystem
(FSEditLog.java:printStatistics(772)) - Number of transactions: 34534 Total
time for transactions(ms): 668 Number of syncs: 686 SyncTimes(ms): 342038
325370
[junit] 2008-01-18 13:49:11,673 INFO fs.FSNamesystem
(FSEditLog.java:printStatistics(772)) - Number of transactions: 37197 Total
time for transactions(ms): 695 Number of syncs: 738 SyncTimes(ms): 372024
356322
[junit] 2008-01-18 13:50:12,050 INFO fs.FSNamesystem
(FSEditLog.java:printStatistics(772)) - Number of transactions: 40113 Total
time for transactions(ms): 723 Number of syncs: 796 SyncTimes(ms): 403170
385541
[junit] 2008-01-18 13:51:12,981 INFO fs.FSNamesystem
(FSEditLog.java:printStatistics(772)) - Number of transactions: 42734 Total
time for transactions(ms): 745 Number of syncs: 848 SyncTimes(ms): 432674
416957
[junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0 sec
[junit] Test org.apache.hadoop.dfs.TestEditLog FAILED (timeout)
{code}
In case of TestDFSStorageStateRecovery the name-node tries to recover from the
aborted rollback, which means
it needs to rename some directories. And this also takes more than a minute and
causes the test to timeout.
{code}
[junit] 2008-01-22 12:16:40,198 INFO dfs.Storage
(Storage.java:doRecover(369)) - Recovering storage directory
/export/home/hudson/hudson/jobs/Hadoop-Nightly/workspace/trunk/build/test/data/name1
from previous rollback.
[junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0 sec
[junit] Test org.apache.hadoop.dfs.TestDFSStorageStateRecovery FAILED
(timeout)
{code}
Could anyone please check the partition Hundson uses for storage directories.
> Unit test fails with a timeout on nightly build:
> org.apache.hadoop.dfs.TestDFSStorageStateRecovery
> --------------------------------------------------------------------------------------------------
>
> Key: HADOOP-2680
> URL: https://issues.apache.org/jira/browse/HADOOP-2680
> Project: Hadoop
> Issue Type: Bug
> Components: dfs
> Affects Versions: 0.16.0
> Reporter: Mukund Madhugiri
> Assignee: Konstantin Shvachko
> Priority: Blocker
> Fix For: 0.16.0
>
> Attachments: build-369-TestEditLog.txt
>
>
> The unit test: TestDFSStorageStateRecovery has failed a couple of times with
> a timeout on the nightly build:
> Here are the logs from the failures:
> http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/373/console
> http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/369/console
> Here is a build where it passed:
> http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/365/console
> Here is the part where it failed:
> [junit] 2008-01-22 12:16:39,499 INFO dfs.TestDFSStorageStateRecovery
> (TestDFSStorageStateRecovery.java:log(84)) -
> ============================================================
> [junit] 2008-01-22 12:16:39,502 INFO dfs.TestDFSStorageStateRecovery
> (TestDFSStorageStateRecovery.java:log(85)) - ***TEST 34*** NAME_NODE
> recovery: numDirs=1 testCase=17 current=false previous=true
> previous.tmp=false removed.tmp=true
> [junit] 2008-01-22 12:16:39,499 INFO ipc.Server (Server.java:run(939)) -
> IPC Server handler 4 on 36207: exiting
> [junit] 2008-01-22 12:16:39,499 INFO ipc.Server (Server.java:run(939)) -
> IPC Server handler 3 on 36207: exiting
> [junit] 2008-01-22 12:16:39,499 INFO ipc.Server (Server.java:run(939)) -
> IPC Server handler 7 on 36207: exiting
> [junit] 2008-01-22 12:16:39,499 INFO ipc.Server (Server.java:run(353)) -
> Stopping IPC Server listener on 36207
> [junit] 2008-01-22 12:16:39,499 INFO ipc.Server (Server.java:run(939)) -
> IPC Server handler 2 on 36207: exiting
> [junit] 2008-01-22 12:16:40,153 INFO jvm.JvmMetrics
> (RpcMetrics.java:setTags(49)) - Initializing RPC Metrics with
> serverName=NameNode, port=0
> [junit] javax.management.InstanceAlreadyExistsException:
> hadoop.dfs:service=NameNode,name=RpcStatistics
> [junit] at
> com.sun.jmx.mbeanserver.RepositorySupport.addMBean(RepositorySupport.java:452)
> [junit] at
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.internal_addObject(DefaultMBeanServerInterceptor.java:1410)
> [junit] at
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerObject(DefaultMBeanServerInterceptor.java:936)
> [junit] at
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:337)
> [junit] at
> com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:497)
> [junit] at
> org.apache.hadoop.metrics.util.MBeanUtil.registerMBean(MBeanUtil.java:50)
> [junit] at org.apache.hadoop.ipc.metrics.RpcMgt.<init>(RpcMgt.java:37)
> [junit] at
> org.apache.hadoop.ipc.metrics.RpcMetrics.<init>(RpcMetrics.java:60)
> [junit] at org.apache.hadoop.ipc.Server.<init>(Server.java:969)
> [junit] at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:393)
> [junit] at org.apache.hadoop.ipc.RPC.getServer(RPC.java:355)
> [junit] 2008-01-22 12:16:40,156 INFO dfs.NameNode
> (NameNode.java:initialize(125)) - Namenode up at: localhost/127.0.0.1:36768
> [junit] at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:119)
> [junit] at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:174)
> [junit] 2008-01-22 12:16:40,156 INFO jvm.JvmMetrics
> (JvmMetrics.java:init(51)) - Cannot initialize JVM Metrics with
> processName=NameNode, sessionId=null - already initialized
> [junit] at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:160)
> [junit] at
> org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:849)
> [junit] at
> org.apache.hadoop.dfs.MiniDFSCluster.<init>(MiniDFSCluster.java:195)
> [junit] at
> org.apache.hadoop.dfs.MiniDFSCluster.<init>(MiniDFSCluster.java:134)
> [junit] at
> org.apache.hadoop.dfs.MiniDFSCluster.<init>(MiniDFSCluster.java:86)
> [junit] at
> org.apache.hadoop.dfs.TestDFSStorageStateRecovery.testStorageStates(TestDFSStorageStateRecovery.java:193)
> [junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> [junit] at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> [junit] at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> [junit] at java.lang.reflect.Method.invoke(Method.java:585)
> [junit] at junit.framework.TestCase.runTest(TestCase.java:154)
> [junit] at junit.framework.TestCase.runBare(TestCase.java:127)
> [junit] 2008-01-22 12:16:40,158 INFO dfs.NameNodeMetrics
> (NameNodeMetrics.java:<init>(74)) - Initializing NameNodeMeterics using
> context object:org.apache.hadoop.metrics.spi.NullContext
> [junit] at junit.framework.TestResult$1.protect(TestResult.java:106)
> [junit] at junit.framework.TestResult.runProtected(TestResult.java:124)
> [junit] at junit.framework.TestResult.run(TestResult.java:109)
> [junit] at junit.framework.TestCase.run(TestCase.java:118)
> [junit] at junit.framework.TestSuite.runTest(TestSuite.java:208)
> [junit] at junit.framework.TestSuite.run(TestSuite.java:203)
> [junit] at
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:297)
> [junit] at
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:672)
> [junit] at
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:567)
> [junit] javax.management.InstanceAlreadyExistsException:
> hadoop.dfs:service=NameNode,name=NameNodeStatistics
> [junit] at
> com.sun.jmx.mbeanserver.RepositorySupport.addMBean(RepositorySupport.java:452)
> [junit] at
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.internal_addObject(DefaultMBeanServerInterceptor.java:1410)
> [junit] at
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerObject(DefaultMBeanServerInterceptor.java:936)
> [junit] at
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:337)
> [junit] at
> com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:497)
> [junit] at
> org.apache.hadoop.metrics.util.MBeanUtil.registerMBean(MBeanUtil.java:50)
> [junit] at
> org.apache.hadoop.dfs.namenode.metrics.NameNodeMgt.<init>(NameNodeMgt.java:36)
> [junit] at
> org.apache.hadoop.dfs.NameNodeMetrics.<init>(NameNodeMetrics.java:67)
> [junit] at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:127)
> [junit] at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:174)
> [junit] at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:160)
> [junit] at
> org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:849)
> [junit] at
> org.apache.hadoop.dfs.MiniDFSCluster.<init>(MiniDFSCluster.java:195)
> [junit] at
> org.apache.hadoop.dfs.MiniDFSCluster.<init>(MiniDFSCluster.java:134)
> [junit] at
> org.apache.hadoop.dfs.MiniDFSCluster.<init>(MiniDFSCluster.java:86)
> [junit] at
> org.apache.hadoop.dfs.TestDFSStorageStateRecovery.testStorageStates(TestDFSStorageStateRecovery.java:193)
> [junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> [junit] at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> [junit] at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> [junit] at java.lang.reflect.Method.invoke(Method.java:585)
> [junit] at junit.framework.TestCase.runTest(TestCase.java:154)
> [junit] at junit.framework.TestCase.runBare(TestCase.java:127)
> [junit] at junit.framework.TestResult$1.protect(TestResult.java:106)
> [junit] at junit.framework.TestResult.runProtected(TestResult.java:124)
> [junit] at junit.framework.TestResult.run(TestResult.java:109)
> [junit] at junit.framework.TestCase.run(TestCase.java:118)
> [junit] at junit.framework.TestSuite.runTest(TestSuite.java:208)
> [junit] at junit.framework.TestSuite.run(TestSuite.java:203)
> [junit] at
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:297)
> [junit] at
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:672)
> [junit] at
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:567)
> [junit] 2008-01-22 12:16:40,193 INFO fs.FSNamesystem
> (FSNamesystem.java:setConfigurationParameters(321)) - fsOwner=hudson,hudson
> [junit] 2008-01-22 12:16:40,195 INFO fs.FSNamesystem
> (FSNamesystem.java:setConfigurationParameters(325)) - supergroup=supergroup
> [junit] 2008-01-22 12:16:40,195 INFO fs.FSNamesystem
> (FSNamesystem.java:setConfigurationParameters(326)) - isPermissionEnabled=true
> [junit] 2008-01-22 12:16:40,198 INFO dfs.Storage
> (Storage.java:doRecover(369)) - Recovering storage directory
> /export/home/hudson/hudson/jobs/Hadoop-Nightly/workspace/trunk/build/test/data/name1
> from previous rollback.
> [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0 sec
> [junit] Test org.apache.hadoop.dfs.TestDFSStorageStateRecovery FAILED
> (timeout)
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.