[ 
https://issues.apache.org/jira/browse/HBASE-810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12623057#action_12623057
 ] 

Jim Kellerman commented on HBASE-810:
-------------------------------------

This is very ugly. The locking as depicted in HBASE-316 is essentially correct.

If we wanted to be more responsive during a split, we could use a tryLock in 
HRegion.getScanner(...).

This would allow us to scan (if we get the lock) or split (and continue from 
the last scanned row if we don't).
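
A minimal sketch of the tryLock idea, assuming scanners take the read side and 
splits take the write side of a ReentrantReadWriteLock. The class and method 
names below are hypothetical and are not the actual HRegion API.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical stand-in for a region; not the real HRegion class. */
class TryLockScanSketch {
    // Assumption: scanners hold the read lock, a split holds the write lock.
    private final ReentrantReadWriteLock splitLock = new ReentrantReadWriteLock();

    /** Non-blocking: returns false instead of waiting if the lock is unavailable. */
    boolean tryStartScan() {
        return splitLock.readLock().tryLock();
    }

    void finishScan() {
        splitLock.readLock().unlock();
    }

    /** Non-blocking: returns false while any scanner still holds the read lock. */
    boolean tryStartSplit() {
        return splitLock.writeLock().tryLock();
    }

    void finishSplit() {
        splitLock.writeLock().unlock();
    }

    public static void main(String[] args) {
        TryLockScanSketch region = new TryLockScanSketch();
        System.out.println("scan started: " + region.tryStartScan());
        System.out.println("split while scanning: " + region.tryStartSplit());
        region.finishScan();
        System.out.println("split after scan: " + region.tryStartSplit());
        region.finishSplit();
    }
}
```

A caller whose tryStartScan() returns false would back off and later reopen the 
scanner from the last row it saw. Note that ReentrantReadWriteLock lets the 
thread that holds the write lock also take the read lock, so the rejection only 
applies when the scanner and the split run on different threads.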

Maybe getScanner should do a synchronized(splitLock) as well.

Or, maybe splits should be more like cache flushes in that they only acquire a 
write lock at the end, when they are ready to move new HStores into place? No, 
that won't work for splits, because splits require the master to reassign the 
children, whereas flushes and compactions continue to be served from the same 
HRegionServer and the row range of the HRegion stays the same.

It appears that if a region is splitting, any outstanding scanners either need 
to finish scanning the region (blocking the split), or they need to be notified 
that a split is going to happen so they can wait and recalibrate.
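
The second option (notify, wait, recalibrate) could look roughly like the 
following sketch. Everything here is an assumption made for illustration: the 
splitPending flag, the Region and Scanner classes, and the resume-after-row 
constructor are hypothetical and do not exist in HBase under these names.

```java
import java.util.NavigableSet;
import java.util.TreeSet;
import java.util.concurrent.atomic.AtomicBoolean;

class SplitAwareScanSketch {
    /** Hypothetical region: sorted row keys plus a flag set before a split. */
    static final class Region {
        final NavigableSet<String> rows = new TreeSet<>();
        final AtomicBoolean splitPending = new AtomicBoolean(false);
    }

    /** Hypothetical scanner that remembers the last row it returned. */
    static final class Scanner {
        private final Region region;
        private String lastRow; // null until a row has been returned

        /** resumeAfter is the last row already seen, or null to start at the top. */
        Scanner(Region region, String resumeAfter) {
            this.region = region;
            this.lastRow = resumeAfter;
        }

        /** True when the region has announced a split and next() will yield nothing. */
        boolean interruptedBySplit() {
            return region.splitPending.get();
        }

        /** Returns the next row, or null at the end or when a split is pending. */
        String next() {
            if (region.splitPending.get()) {
                return null; // caller should wait, then reopen after lastRow()
            }
            String row;
            if (lastRow == null) {
                row = region.rows.isEmpty() ? null : region.rows.first();
            } else {
                row = region.rows.higher(lastRow); // strictly after lastRow
            }
            if (row != null) {
                lastRow = row;
            }
            return row;
        }

        String lastRow() {
            return lastRow;
        }
    }
}
```

When next() returns null and interruptedBySplit() is true, the client would 
wait for the split to complete, locate the child region that now holds the 
rows after lastRow(), and open a new scanner there with that row as the resume 
point.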


> Prevent temporary deadlocks when, during a scan with write operations, the 
> region splits
> ----------------------------------------------------------------------------------------
>
>                 Key: HBASE-810
>                 URL: https://issues.apache.org/jira/browse/HBASE-810
>             Project: Hadoop HBase
>          Issue Type: Bug
>    Affects Versions: 0.2.0
>            Reporter: Jean-Daniel Cryans
>            Priority: Blocker
>             Fix For: 0.2.1, 0.3.0
>
>
> HBASE-804 was not about the right problem; this one is. Anyone who iterates 
> through the results of a scanner and writes data back into the row at 
> each iteration will hit an UnknownScannerException if a split occurs. See the 
> stack trace in the referenced JIRA. Timeline:
> 1. The split occurs, acquires a write lock, and waits for scanners to finish.
> 2. The scanner in the custom code iterates and writes data until the write is 
>    blocked by the lock.
> 3. Deadlock.
> 4. The scanner times out, so the region splits, but the 
>    UnknownScannerException (USE) is thrown when next() is called.
> Inside a Map, the task will simply be retried when the first one fails. 
> Elsewhere, it becomes more complicated.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
