[ 
https://issues.apache.org/jira/browse/HBASE-7455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13544167#comment-13544167
 ] 

Jean-Daniel Cryans commented on HBASE-7455:
-------------------------------------------

I'm investigating what the other problems with TestReplication are. Right now
I'm getting this weird case that kills a RS:

{noformat}
2013-01-04 10:04:45,500 WARN  [IPC Server handler 8 on 57099] namenode.FSDirectory(422): DIR* FSDirectory.unprotectedRenameTo: failed to rename /user/jdcryans/hbase/test/62c85f8a6e3d0e32b2fb21326537f5a6/.tmp/b938b33268064312abfc250d2eeca61d to /user/jdcryans/hbase/test/62c85f8a6e3d0e32b2fb21326537f5a6/f/b938b33268064312abfc250d2eeca61d because destination's parent does not exist
2013-01-04 10:04:45,503 WARN  [RegionServer:1;172.21.3.117,57114,1357322589018.cacheFlusher] regionserver.Store(847): Unable to rename hdfs://localhost:57099/user/jdcryans/hbase/test/62c85f8a6e3d0e32b2fb21326537f5a6/.tmp/b938b33268064312abfc250d2eeca61d to hdfs://localhost:57099/user/jdcryans/hbase/test/62c85f8a6e3d0e32b2fb21326537f5a6/f/b938b33268064312abfc250d2eeca61d
2013-01-04 10:04:45,504 WARN  [DataStreamer for file /user/jdcryans/hbase/.logs/172.21.3.117,57113,1357322588994/172.21.3.117%2C57113%2C1357322588994.1357322683769] hdfs.DFSClient$DFSOutputStream$DataStreamer(2873): DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /user/jdcryans/hbase/.logs/172.21.3.117,57113,1357322588994/172.21.3.117%2C57113%2C1357322588994.1357322683769 File does not exist. [Lease.  Holder: DFSClient_hb_rs_172.21.3.117,57113,1357322588994, pendingcreates: 1]
{noformat}
                
> Increase timeouts in TestReplication and TestSplitLogWorker
> -----------------------------------------------------------
>
>                 Key: HBASE-7455
>                 URL: https://issues.apache.org/jira/browse/HBASE-7455
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.96.0, 0.94.4
>
>         Attachments: 7455-0.94.txt, 7455-0.96.txt
>
>
> When I measure the times in TestReplication.queueFailover, it takes about 15s 
> on my (reasonably fast) laptop.
> The timeout in queueFailover is currently 1500*2*15 = 45000ms.
> For setup before each test (which truncates the table and waits for the 
> changes to replicate) it is 1500*15 = 22500ms.
> Interestingly I see queueFailover failures where the wait time is measured as 
> 64260ms and some at 72316ms.
> Since these numbers are not even close to 45000ms, the machine or JVM must 
> have been stuck for roughly 19s or 27s (otherwise we'd get a timeout and the 
> total time spent would be close to the timeout).
> So I would suggest that we increase the timeouts further.
> We could set SLEEP_TIME to 2000 and retries to 20, which would give 
> 2000*2*20 = 80000ms.
> Any objections?
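
The timeout arithmetic in the description can be checked with a small standalone sketch. It is not taken from TestReplication: the class name `ReplicationTimeouts` and the helper `maxWaitMs` are hypothetical, and the only inputs are the figures quoted in the issue (a sleep time of 1500 or 2000 ms, 15 or 20 retries, and the factor of 2 that queueFailover applies).

```java
// Hypothetical helper for illustration only; not part of the HBase test code.
public class ReplicationTimeouts {

    /**
     * Worst-case wait in milliseconds for a retry loop, following the
     * formula quoted in the issue: sleep time * multiplier * retries.
     */
    static long maxWaitMs(long sleepTimeMs, int retries, int multiplier) {
        return sleepTimeMs * multiplier * retries;
    }

    public static void main(String[] args) {
        long currentFailover = maxWaitMs(1500, 15, 2);   // 1500*2*15
        long currentSetup = maxWaitMs(1500, 15, 1);      // 1500*15
        long proposedFailover = maxWaitMs(2000, 20, 2);  // 2000*2*20

        System.out.println("current queueFailover timeout:  " + currentFailover + "ms");
        System.out.println("current setup timeout:          " + currentSetup + "ms");
        System.out.println("proposed queueFailover timeout: " + proposedFailover + "ms");

        // How far the observed failures overshoot the current timeout.
        long[] observedMs = {64260, 72316};
        for (long observed : observedMs) {
            System.out.println("observed " + observed + "ms exceeds timeout by "
                + (observed - currentFailover) + "ms");
        }
    }
}
```

Both observed wait times reported in the description (64260ms and 72316ms) would fit inside the proposed 80000ms budget.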
