[ https://issues.apache.org/jira/browse/HBASE-13651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14546169#comment-14546169 ]

Hudson commented on HBASE-13651:
--------------------------------

SUCCESS: Integrated in HBase-0.94-security #582 (See 
[https://builds.apache.org/job/HBase-0.94-security/582/])
HBASE-13651 Handle StoreFileScanner FileNotFoundException (addendum) 
(matteo.bertozzi: rev e5a433802c486abae89aa6e4bb1159bbff4d3d9f)
* src/test/java/org/apache/hadoop/hbase/regionserver/TestCorruptedRegionStoreFile.java


> Handle StoreFileScanner FileNotFoundException
> ---------------------------------------------
>
>                 Key: HBASE-13651
>                 URL: https://issues.apache.org/jira/browse/HBASE-13651
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.27, 0.98.10.1
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>            Priority: Minor
>             Fix For: 2.0.0, 0.94.28, 0.98.13, 1.2.0
>
>         Attachments: HBASE-13651-0.94-draft.patch, HBASE-13651-draft.patch, 
> HBASE-13651-v0-0.94.patch, HBASE-13651-v0-0.98.patch, 
> HBASE-13651-v0-branch-1.patch, HBASE-13651-v0.patch
>
>
> Example:
>  * Machine-1 is serving Region-X and starts a compaction
>  * Machine-1 goes into a GC pause
>  * Region-X gets reassigned to Machine-2
>  * Machine-1 exits from the GC pause
>  * Machine-1 (re)moves the compacted files
>  * Machine-1's lease expires and it shuts down
> Machine-2 now gets tons of FileNotFoundExceptions on scan. If we reassign the 
> region everything is ok, because we pick up the files compacted by Machine-1.
> This problem doesn't happen in the new code, 1.0+ (I think, but I haven't 
> checked; it may be 1.1), where we write the compaction event to the WAL before 
> (re)moving the files.
> A workaround is to handle the FileNotFoundException and refresh the store 
> files, or to shut down the region and reassign it. The first one is easy in 
> 1.0+; the second one requires more work because at the moment we don't have 
> the code to notify the master that the RS is closing the region. Alternatively 
> we can shut down the entire RS (it is not a good solution, but the case is 
> rare enough).
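
The first workaround above (catch the FileNotFoundException, refresh the store
files, retry the scan) can be sketched roughly as follows. This is plain Java
and does not use any HBase classes: ScanAttempt, StoreFileRefresher,
scanWithRefresh and the file names are hypothetical stand-ins chosen for
illustration, and the sketch is not the code from the attached patches.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class RefreshOnMissingFile {
    /** Hypothetical stand-in for a scan over the store files currently known. */
    interface ScanAttempt<T> {
        T run() throws IOException;
    }

    /** Hypothetical stand-in for re-reading the store directory. */
    interface StoreFileRefresher {
        void refreshStoreFiles() throws IOException;
    }

    /**
     * Runs the scan; on FileNotFoundException refreshes the file list and
     * retries, at most maxRefreshes times, then rethrows.
     */
    static <T> T scanWithRefresh(ScanAttempt<T> scan,
                                 StoreFileRefresher refresher,
                                 int maxRefreshes) throws IOException {
        int refreshes = 0;
        while (true) {
            try {
                return scan.run();
            } catch (FileNotFoundException e) {
                if (refreshes >= maxRefreshes) {
                    throw e; // still missing after refreshing: give up
                }
                refresher.refreshStoreFiles();
                refreshes++;
            }
        }
    }

    /** Simulates Machine-2 holding a stale file list after Machine-1's compaction. */
    static String demo() {
        // Files actually on disk after Machine-1 (re)moved the compacted inputs.
        Set<String> onDisk = new HashSet<>(Arrays.asList("hfile-compacted"));
        // Stale list that Machine-2 loaded when it opened the region.
        List<String> known = new ArrayList<>(Arrays.asList("hfile-a", "hfile-b"));
        try {
            return scanWithRefresh(
                () -> {
                    for (String f : known) {
                        if (!onDisk.contains(f)) {
                            throw new FileNotFoundException(f);
                        }
                    }
                    return String.join(",", known);
                },
                () -> {
                    known.clear();
                    known.addAll(onDisk);
                },
                1);
        } catch (IOException e) {
            return "failed: " + e.getMessage();
        }
    }
}
```

The retry count is bounded so that a file that is really gone (rather than
replaced by a compaction output) still surfaces as an error instead of looping.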



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
