[ https://issues.apache.org/jira/browse/HBASE-7455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13544167#comment-13544167 ]
Jean-Daniel Cryans commented on HBASE-7455:
-------------------------------------------
I'm currently investigating what the other problems with TestReplication
are. Right now I'm hitting this weird case that kills a RS:
{noformat}
2013-01-04 10:04:45,500 WARN [IPC Server handler 8 on 57099] namenode.FSDirectory(422): DIR* FSDirectory.unprotectedRenameTo: failed to rename /user/jdcryans/hbase/test/62c85f8a6e3d0e32b2fb21326537f5a6/.tmp/b938b33268064312abfc250d2eeca61d to /user/jdcryans/hbase/test/62c85f8a6e3d0e32b2fb21326537f5a6/f/b938b33268064312abfc250d2eeca61d because destination's parent does not exist
2013-01-04 10:04:45,503 WARN [RegionServer:1;172.21.3.117,57114,1357322589018.cacheFlusher] regionserver.Store(847): Unable to rename hdfs://localhost:57099/user/jdcryans/hbase/test/62c85f8a6e3d0e32b2fb21326537f5a6/.tmp/b938b33268064312abfc250d2eeca61d to hdfs://localhost:57099/user/jdcryans/hbase/test/62c85f8a6e3d0e32b2fb21326537f5a6/f/b938b33268064312abfc250d2eeca61d
2013-01-04 10:04:45,504 WARN [DataStreamer for file /user/jdcryans/hbase/.logs/172.21.3.117,57113,1357322588994/172.21.3.117%2C57113%2C1357322588994.1357322683769] hdfs.DFSClient$DFSOutputStream$DataStreamer(2873): DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /user/jdcryans/hbase/.logs/172.21.3.117,57113,1357322588994/172.21.3.117%2C57113%2C1357322588994.1357322683769 File does not exist. [Lease. Holder: DFSClient_hb_rs_172.21.3.117,57113,1357322588994, pendingcreates: 1]
{noformat}
> Increase timeouts in TestReplication and TestSplitLogWorker
> -----------------------------------------------------------
>
> Key: HBASE-7455
> URL: https://issues.apache.org/jira/browse/HBASE-7455
> Project: HBase
> Issue Type: Bug
> Reporter: Lars Hofhansl
> Assignee: Lars Hofhansl
> Fix For: 0.96.0, 0.94.4
>
> Attachments: 7455-0.94.txt, 7455-0.96.txt
>
>
> When I measure the times in TestReplication.queueFailover, it takes about 15s
> on my (reasonably fast) laptop.
> The timeout in queueFailover is currently 1500*2*15 = 45000ms.
> For the setup before each test (which truncates the table and waits for the
> changes to replicate) it is 1500*15 = 22500ms.
> Interestingly, I see queueFailover failures where the wait time is measured as
> 64260ms and some at 72316ms.
> Since these numbers are not even close to 45000ms, the machine or JVM must
> have been stuck for roughly 19s or 27s (otherwise we'd hit the timeout and the
> total time spent would be close to the timeout).
> So I would suggest that we increase the timeouts further.
> We could set SLEEP_TIME to 2000 and retries to 20, which would give
> 2000*2*20 = 80000ms.
> Any objections?
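
The timeout arithmetic in the quoted description can be checked with a small sketch. This is not HBase code: the helper names `timeout_budget_ms` and `stall_ms` are hypothetical, and it assumes (as the description's formulas imply) that the wait budget is simply the sleep time multiplied by the retry count, doubled in queueFailover.

```python
# Illustrative only: these helpers are hypothetical and not part of HBase.

def timeout_budget_ms(sleep_time_ms: int, retries: int, multiplier: int = 1) -> int:
    """Total wait budget, assuming budget = sleep time * multiplier * retries."""
    return sleep_time_ms * multiplier * retries


def stall_ms(measured_ms: int, budget_ms: int) -> int:
    """Time spent beyond the budget, i.e. how long the machine/JVM was stuck."""
    return max(0, measured_ms - budget_ms)


# Values taken from the issue description.
current = timeout_budget_ms(1500, 15, multiplier=2)   # queueFailover today
setup = timeout_budget_ms(1500, 15)                   # per-test setup
proposed = timeout_budget_ms(2000, 20, multiplier=2)  # suggested new values

for measured in (64260, 72316):
    print(
        f"measured={measured}ms "
        f"over_current_budget={stall_ms(measured, current)}ms "
        f"fits_proposed_budget={measured <= proposed}"
    )
```

Under these assumptions both observed wait times exceed the current 45000ms budget but fit inside the proposed 80000ms one.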