[jira] Commented: (HBASE-3295) Dropping a 1k+ regions table likely ends in a client socket timeout and it's very confusing

Jonathan Gray (JIRA) Wed, 01 Dec 2010 11:39:36 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-3295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12965815#action_12965815
 ]


Jonathan Gray commented on HBASE-3295:
--------------------------------------

This is basically the same as HBASE-3229 (except there it's only troubling; the 
operation does seem to succeed).  My opinion is that all operations that hit 
the master should be async (or fast).  create/enable/disable/drop/etc. should be 
async, with another method to check the status.  We shouldn't have long-running 
operations holding open RPC requests.
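
To make the submit-then-poll idea concrete, here is a minimal sketch in plain
Java. It is not HBase code: the class AsyncMasterSketch and the methods
submitDrop, getStatus and waitFor are hypothetical names invented for
illustration, and the per-region work is simulated with a short sleep. The
point is only that the "RPC" returns right away and the client checks status
with cheap follow-up calls.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: none of these names are real HBase APIs.
class AsyncMasterSketch {
    enum Status { PENDING, DONE, FAILED, UNKNOWN }

    private final Map<String, Status> operations = new ConcurrentHashMap<>();
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    // Returns immediately with an operation id; the drop runs in the background.
    String submitDrop(String table, int regionCount) {
        String opId = "drop-" + table;
        operations.put(opId, Status.PENDING);
        worker.submit(() -> {
            try {
                for (int i = 0; i < regionCount; i++) {
                    Thread.sleep(1); // stand-in for per-region cleanup work
                }
                operations.put(opId, Status.DONE);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                operations.put(opId, Status.FAILED);
            }
        });
        return opId;
    }

    // Cheap status check: never blocks on the long-running work.
    Status getStatus(String opId) {
        return operations.getOrDefault(opId, Status.UNKNOWN);
    }

    // Client-side helper: polls until the operation leaves PENDING or the
    // wait budget runs out. Running out of budget is not an error; the
    // caller simply sees PENDING and can poll again later.
    Status waitFor(String opId, long timeoutMs, long pollMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            Status s = getStatus(opId);
            if (s != Status.PENDING) {
                return s;
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return getStatus(opId);
    }

    void shutdown() {
        worker.shutdown();
    }
}
```

With something of this shape, a client would call submitDrop("TestTable", 1600),
get an id back at once, and then call waitFor (or getStatus) as often as it
likes. A client that gives up waiting learns the operation is still PENDING
rather than seeing a socket timeout it cannot interpret.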

> Dropping a 1k+ regions table likely ends in a client socket timeout and it's 
> very confusing
> -------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3295
>                 URL: https://issues.apache.org/jira/browse/HBASE-3295
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Jean-Daniel Cryans
>             Fix For: 0.90.0
>
>
> I tried truncating a 1.6k regions table from the shell and, after the usual 
> disabling timeout, I then got a socket timeout on the second invocation while 
> it was dropping. It looked like this:
> {noformat}
> ERROR: java.net.SocketTimeoutException: Call to sv2borg180/10.20.20.180:61000 
> failed on socket timeout exception:
>  java.net.SocketTimeoutException: 60000 millis timeout while waiting for 
> channel to be ready for read. ch :
>  java.nio.channels.SocketChannel[connected local=/10.20.20.180:59153 
> remote=sv2borg180/10.20.20.180:61000]
> {noformat}
> At first I thought that was coming from the master because HDFS was somehow 
> slow, but then understood that it was my socket that timed out meaning that 
> the master was still dropping the table. Calling truncate again, I got:
> {noformat}
> ERROR: Unknown table TestTable!
> {noformat}
> Which suggested that the table had been deleted... I learned later, after 
> shutting down the cluster, that it wasn't totally deleted. So it leaves me in 
> a situation where I have to manually delete the files on the FS and the 
> remaining .META. entries.
> Since I expect a few people will hit this issue rather soon, for 0.90.0, I 
> propose we just set the socket timeout really high in the shell. For 0.90.1, 
> or 0.92, we should do for drop what we do for disabling.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
