[ 
https://issues.apache.org/jira/browse/HBASE-8310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13663793#comment-13663793
 ] 

Matteo Bertozzi commented on HBASE-8310:
----------------------------------------

{quote}
1) The current snapshot A is blocked on table lock in snapshotTable(). Its 
snapshotHandler is not put into the map yet. 
2) The next snapshot B comes in and calls prepareToTakeSnapshot(). It will pass 
thru without being rejected since there is no current snapshotHandler in the 
map yet.
{quote}
This can't happen: prepareToTakeSnapshot() and snapshotTable() are both 
synchronized, so if B asks for a snapshot of the same table it gets the 
rejection exception, unless the first one has failed.
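
To illustrate the point, here is a minimal sketch, assuming the "is a 
snapshot already running?" check and the registration happen under the same 
monitor. SnapshotRegistrySketch and its methods are hypothetical names for 
illustration, not the actual SnapshotManager code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the master-side bookkeeping; not an HBase class.
class SnapshotRegistrySketch {
    // table name -> name of the snapshot currently registered for that table
    private final Map<String, String> handlers = new HashMap<>();

    // The check and the put share one monitor, so a second request for the
    // same table cannot slip in between them.
    synchronized void register(String table, String snapshot) {
        if (handlers.containsKey(table)) {
            throw new IllegalStateException(
                "snapshot " + handlers.get(table) + " already running on " + table);
        }
        handlers.put(table, snapshot);
    }

    // Called when the snapshot finishes or fails, so a retry is allowed.
    synchronized void unregister(String table) {
        handlers.remove(table);
    }
}
```

With this shape, register("t1", "B") throws while A is still registered on 
"t1", and a request for a different table goes through.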

{quote}4) snapshot A can not leave snapshotTable() because it is blocked on 
table lock.{quote}
Blocked by a lock held by what? e.g. createTable(), deleteTable(), ...? Maybe...
In this case, having the prepare() inside snapshotTable() is not a good idea, 
since it holds up a snapshot on a different table that could otherwise be 
executed. 
My guess is that the table lock integration with snapshots was not very well 
thought out. The snapshot and restore handlers are good for 0.94, where the 
table lock does not exist, but I guess the addition of the table lock should 
have replaced those two. I'll be +1 on a patch that extracts the table lock 
from the snapshot/restore handlers and replaces the current handlers map.
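
A rough sketch of what "the table lock replaces the handlers map" could look 
like, assuming one lock per table with a bounded wait. TableLockSketch is a 
hypothetical class built on java.util.concurrent, not the TableLockManager 
API.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical per-table lock; a Semaphore is used so that a handler thread
// other than the requesting thread may release it.
class TableLockSketch {
    private final ConcurrentHashMap<String, Semaphore> locks = new ConcurrentHashMap<>();

    // Waits up to timeoutMs for the table's lock. Returns false on timeout
    // or interruption. Other tables are not affected by the wait.
    boolean tryAcquire(String table, long timeoutMs) {
        Semaphore lock = locks.computeIfAbsent(table, t -> new Semaphore(1));
        try {
            return lock.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Must only be called by whoever successfully acquired the table's lock.
    void release(String table) {
        Semaphore lock = locks.get(table);
        if (lock != null) {
            lock.release();
        }
    }
}
```

Because the wait happens on the per-table lock rather than inside a method 
synchronized on the whole manager, a snapshot of another table is not held up 
while one table's lock is contended.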
                
> HBase snapshot timeout default values and TableLockManger timeout
> -----------------------------------------------------------------
>
>                 Key: HBASE-8310
>                 URL: https://issues.apache.org/jira/browse/HBASE-8310
>             Project: HBase
>          Issue Type: Bug
>          Components: snapshots
>    Affects Versions: 0.95.0
>            Reporter: Jerry He
>            Assignee: Jerry He
>            Priority: Minor
>             Fix For: 0.98.0, 0.95.2, 0.94.9
>
>         Attachments: trunk.patch
>
>
> There are a few timeout values and defaults used by HBase snapshots:
> DEFAULT_MAX_WAIT_TIME (60000 ms, 1 min) for the client response
> TIMEOUT_MILLIS_DEFAULT (60000 ms, 1 min) for the Procedure timeout
> SNAPSHOT_TIMEOUT_MILLIS_DEFAULT (60000 ms, 1 min) for the region server 
> subprocedure
> There are also other timeouts involved, for example 
> DEFAULT_TABLE_WRITE_LOCK_TIMEOUT_MS (10 mins) for 
> TakeSnapshotHandler#prepare().
> We could have this case:
> The user issues a sync snapshot request, waits for 1 min, and gets an 
> exception.
> In the meantime the snapshot handler is blocked on the table lock, and the 
> snapshot may continue to finish after 10 mins.
> But the user will probably re-issue the snapshot request during those 10 mins.
> This is confusing and messy when it happens.
> To be more reasonable, we should either increase DEFAULT_MAX_WAIT_TIME or 
> decrease the table lock waiting time.
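
The mismatch in the description can be put in numbers. This is only an 
arithmetic sketch using the default values quoted above. The class and method 
names are hypothetical, and the constants are copied by value rather than 
read from HBase.

```java
// Hypothetical helper; the two constants mirror the defaults quoted in the
// issue description (1 min client wait, 10 min table write lock wait).
class SnapshotTimeoutsSketch {
    static final long CLIENT_MAX_WAIT_MS = 60_000L;
    static final long TABLE_WRITE_LOCK_TIMEOUT_MS = 10 * 60_000L;

    // How long the handler may keep waiting on the table lock after the
    // client has already given up. Zero means the client outlasts the wait.
    static long unattendedWindowMs(long clientWaitMs, long lockWaitMs) {
        return Math.max(0L, lockWaitMs - clientWaitMs);
    }
}
```

With the defaults, unattendedWindowMs(CLIENT_MAX_WAIT_MS, 
TABLE_WRITE_LOCK_TIMEOUT_MS) is 540000 ms, i.e. 9 minutes in which a 
re-issued request can collide with the first one. Raising the client wait or 
lowering the lock wait until this reaches zero is the change the description 
proposes.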

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
