[jira] Updated: (HBASE-3295) Dropping a 1k+ regions table likely ends in a client socket timeout and it's very confusing

stack (JIRA) Wed, 01 Dec 2010 22:35:35 -0800

     [ https://issues.apache.org/jira/browse/HBASE-3295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


stack updated HBASE-3295:
-------------------------

    Attachment: 3295-v2.txt

HBASE-3229 is actually a little different in that, as it's currently written, 
everything is run from the master.

Also, here's v2 of the patch.  There is a tension in the shell: most of the 
retries and timeouts are tuned down because it's expected that a human is 
waiting, but these enable/disable/drop operations can take a long time.  v2 
increases the amount of time we'll wait on these enable/disable/drop 
operations (and includes the v1 change to make drop run async).
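
For illustration only, here is a rough sketch of the general pattern (kick off 
the drop without blocking, then poll with a generous deadline).  It is not the 
patch and not HBase's API: start_drop, table_exists and the default timeout 
values are hypothetical stand-ins chosen for the example.

```python
import time


def wait_until(done, timeout_s, interval_s, sleep=time.sleep, clock=time.monotonic):
    """Poll done() until it returns True or timeout_s elapses.

    Returns True if done() succeeded, False if the deadline passed first.
    """
    deadline = clock() + timeout_s
    while True:
        if done():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


def drop_table_async(start_drop, table_exists, timeout_s=600.0, interval_s=1.0, **kw):
    """Hypothetical helper: request the drop, then wait for the table to vanish.

    start_drop   -- callable that asks the master to drop and returns at once
    table_exists -- callable returning True while the table is still visible
    """
    start_drop()  # returns immediately; the master keeps working in the background
    return wait_until(lambda: not table_exists(), timeout_s, interval_s, **kw)
```

The point of the split is that the long wait lives in a polling loop with its 
own deadline, rather than in a single RPC bounded by the socket timeout.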

If we do time out, then it should be fine to just rerun the operation -- 
truncate in this case.  If not, then that's a bug... a different bug than this.

> Dropping a 1k+ regions table likely ends in a client socket timeout and it's 
> very confusing
> -------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3295
>                 URL: https://issues.apache.org/jira/browse/HBASE-3295
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Jean-Daniel Cryans
>             Fix For: 0.90.0
>
>         Attachments: 3295-v2.txt, 3295.txt
>
>
> I tried truncating a 1.6k regions table from the shell and, after the usual 
> disabling timeout, I then got a socket timeout on the second invocation while 
> it was dropping. It looked like this:
> {noformat}
> ERROR: java.net.SocketTimeoutException: Call to sv2borg180/10.20.20.180:61000 
> failed on socket timeout exception:
>  java.net.SocketTimeoutException: 60000 millis timeout while waiting for 
> channel to be ready for read. ch :
>  java.nio.channels.SocketChannel[connected local=/10.20.20.180:59153 
> remote=sv2borg180/10.20.20.180:61000]
> {noformat}
> At first I thought that was coming from the master because HDFS was somehow 
> slow, but then understood that it was my socket that timed out meaning that 
> the master was still dropping the table. Calling truncate again, I got:
> {noformat}
> ERROR: Unknown table TestTable!
> {noformat}
> Which means that the table would be deleted... I learned later that it wasn't 
> totally deleted after I shut down the cluster. So it leaves me in a situation 
> where I have to manually delete the files on the FS and the remaining .META. 
> entries.
> Since I expect a few people will hit this issue rather soon, for 0.90.0, I 
> propose we just set the socket timeout really high in the shell. For 0.90.1, 
> or 0.92, we should do for drop what we do for disabling.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
