[ 
https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14206941#comment-14206941
 ] 

Lars Hofhansl commented on HBASE-12457:
---------------------------------------

Right. The timing is hard though. It seems the master considers the region 
closed as soon as it has sent the CLOSE.

One option I thought about is for HRegion.doClose() to interrupt any 
running compactions (i.e. interrupt the CompactSplitThread). Then, upon 
receiving an interrupted exception, the compactor would recheck 
writestate.writesEnabled rather than waiting for the next 10 MB chunk to 
finish writing.
The symptom here looks like the compactor just hanging in some IO (either 
scanner.next or writer.append - my bet is on the latter). An interrupt can 
break out of that and allow the compactor to recheck the condition.
Might be easiest to explain with a patch. :)
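
Until there is a patch, here is a minimal, self-contained sketch of the idea. 
It is not HBase code and not the attached minifix: the class, its fields and 
the latch that stands in for a stuck scanner.next/writer.append are all 
hypothetical. It only assumes that the blocked IO call reacts to 
Thread.interrupt(), which for real IO depends on the underlying stream or 
channel.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a region with one running compaction.
class InterruptibleCompactionSketch {
    // Plays the role of writestate.writesEnabled.
    private volatile boolean writesEnabled = true;
    private volatile Thread compactionThread;
    private final CountDownLatch started = new CountDownLatch(1);
    // Never counted down: models an IO call that hangs until its timeout.
    private final CountDownLatch stuckIo = new CountDownLatch(1);
    private final long ioMillis;

    InterruptibleCompactionSketch(long ioMillis) {
        this.ioMillis = ioMillis;
    }

    /** Returns true if all chunks were written, false if the compaction was aborted. */
    boolean compact(int chunks) {
        compactionThread = Thread.currentThread();
        started.countDown();
        int written = 0;
        while (written < chunks) {
            if (!writesEnabled) {
                return false; // region is closing: give up
            }
            try {
                stuckIo.await(ioMillis, TimeUnit.MILLISECONDS); // simulated slow append
                written++;
            } catch (InterruptedException e) {
                // Interrupted in the middle of a chunk: loop back and
                // recheck writesEnabled instead of finishing the chunk first.
            }
        }
        return true;
    }

    /** Sketch of the close path: disable writes, then interrupt the compactor. */
    void doClose() throws InterruptedException {
        started.await(); // for the demo only: wait until the compaction is running
        writesEnabled = false;
        Thread t = compactionThread;
        if (t != null) {
            t.interrupt();
        }
    }

    /** Demo: returns true if a close aborts a compaction stuck in slow IO. */
    static boolean closeAbortsStuckCompaction() {
        InterruptibleCompactionSketch region = new InterruptibleCompactionSketch(60_000L);
        AtomicBoolean completed = new AtomicBoolean(true);
        Thread worker = new Thread(() -> completed.set(region.compact(3)));
        worker.setDaemon(true);
        worker.start();
        try {
            region.doClose();
            worker.join(5_000L); // far less than the simulated 60 s per chunk
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !worker.isAlive() && !completed.get();
    }

    public static void main(String[] args) {
        System.out.println("close aborted compaction: " + closeAbortsStuckCompaction());
    }
}
```

The flag is cleared before the interrupt is sent, so whichever the compactor 
sees first (the flag at the top of the loop or the interrupt inside the IO 
call), it gives up without waiting for the current chunk.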

> Regions in transition for a long time when CLOSE interleaves with a slow 
> compaction
> -----------------------------------------------------------------------------------
>
>                 Key: HBASE-12457
>                 URL: https://issues.apache.org/jira/browse/HBASE-12457
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.98.7
>            Reporter: Lars Hofhansl
>         Attachments: 12457-minifix.txt
>
>
> Under heavy load we have observed regions remaining in transition for 20 
> minutes when the master requests a close while a slow compaction is running.
> The pattern is always something like this:
> # RS starts a compaction
> # HM requests the region to be closed on this RS
> # Compaction is not aborted for another 20 minutes
> # The region is in transition and not usable.
> In every case I tracked down so far the time between the requested CLOSE and 
> abort of the compaction is almost exactly 20 minutes, which is suspicious.
> Of course part of the issue is having compactions that take over 20 minutes, 
> but maybe we can do better here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
