[ 
https://issues.apache.org/jira/browse/HBASE-1436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-1436.
--------------------------

    Resolution: Fixed

Committed the below:

{code}
Index: src/java/org/apache/hadoop/hbase/regionserver/Store.java
===================================================================
--- src/java/org/apache/hadoop/hbase/regionserver/Store.java    (revision 777167)
+++ src/java/org/apache/hadoop/hbase/regionserver/Store.java    (working copy)
@@ -356,7 +356,15 @@
         LOG.warn("Skipping " + p + " because its empty. HBASE-646 DATA LOSS?");
         continue;
       }
-      StoreFile curfile = new StoreFile(fs, p);
+      StoreFile curfile = null;
+      try {
+        curfile = new StoreFile(fs, p);
+      } catch (IOException ioe) {
+        LOG.warn("Failed open of " + p + "; presumption is that file was " +
+          "corrupted at flush and lost edits picked up by commit log replay. " +
+          "Verify!", ioe);
+        continue;
+      }
       long storeSeqId = curfile.getMaxSequenceId();
       if (storeSeqId > this.maxSeqId) {
         this.maxSeqId = storeSeqId;
{code}

We just keep going, logging the corrupted file.
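In isolation, the pattern the patch adds is "try to open each candidate store file; on IOException, log it and skip it rather than failing the whole region open". A minimal standalone sketch of that pattern (FakeStoreFile and the paths are hypothetical stand-ins for illustration, not the HBase API):

{code}
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SkipCorruptDemo {

  // Hypothetical stand-in for StoreFile: "opening" throws if the file is corrupt.
  static final class FakeStoreFile {
    final String path;
    FakeStoreFile(String path, boolean corrupt) throws IOException {
      if (corrupt) {
        throw new IOException(
          "Trailer 'header' is wrong; does the trailer size match content?");
      }
      this.path = path;
    }
  }

  // Load every file that opens cleanly; log and skip the rest instead of aborting.
  static List<FakeStoreFile> loadStoreFiles(Map<String, Boolean> candidates) {
    List<FakeStoreFile> loaded = new ArrayList<>();
    for (Map.Entry<String, Boolean> e : candidates.entrySet()) {
      try {
        loaded.add(new FakeStoreFile(e.getKey(), e.getValue()));
      } catch (IOException ioe) {
        // Mirrors the patch: warn and continue, on the presumption that the
        // lost edits will be picked up by commit log replay.
        System.err.println("Failed open of " + e.getKey() + "; skipping: "
          + ioe.getMessage());
      }
    }
    return loaded;
  }

  public static void main(String[] args) {
    Map<String, Boolean> files = new LinkedHashMap<>();
    files.put("/hbase/TestTable/0651512447/info/ok-file", false);
    files.put("/hbase/TestTable/0651512447/info/corrupt-file", true);
    List<FakeStoreFile> loaded = loadStoreFiles(files);
    System.out.println(loaded.size()); // only the good file survives
  }
}
{code}

The design trade-off is availability over strictness: the region comes online with whatever files open cleanly, and the warning asks the operator to verify that log replay covered the skipped file.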

> Killing regionserver can make corrupted hfile
> ---------------------------------------------
>
>                 Key: HBASE-1436
>                 URL: https://issues.apache.org/jira/browse/HBASE-1436
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: stack
>             Fix For: 0.20.0
>
>
> Testing the sync patch, I've been killing the HRS.  It's pretty easy to make 
> a corrupt hfile doing this:
> {code}
> 2009-05-18 23:00:42,889 [regionserver/0:0:0:0:0:0:0:0:60021.worker] ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: Error opening TestTable,0651512447,1242687355411
> java.io.IOException: Trailer 'header' is wrong; does the trailer size match content?
>         at org.apache.hadoop.hbase.io.hfile.HFile$FixedFileTrailer.deserialize(HFile.java:1289)
>         at org.apache.hadoop.hbase.io.hfile.HFile$Reader.readTrailer(HFile.java:799)
>         at org.apache.hadoop.hbase.io.hfile.HFile$Reader.loadFileInfo(HFile.java:744)
>         at org.apache.hadoop.hbase.regionserver.StoreFile.open(StoreFile.java:217)
>         at org.apache.hadoop.hbase.regionserver.StoreFile.<init>(StoreFile.java:107)
>         at org.apache.hadoop.hbase.regionserver.Store.loadStoreFiles(Store.java:359)
>         at org.apache.hadoop.hbase.regionserver.Store.<init>(Store.java:206)
>         at org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:1839)
>         at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:290)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.instantiateRegion(HRegionServer.java:1556)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.openRegion(HRegionServer.java:1527)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer$Worker.run(HRegionServer.java:1442)
>         at java.lang.Thread.run(Thread.java:619)
> {code}
> This issue is about just removing the corrupted store file and moving on.  
> Currently the region can't open because we keep getting the above exception.  
> We should also make sure that it's safe to just remove the file, i.e. that 
> replay of the HRS log files will carry the memcache content that failed to 
> persist.
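For illustration, the trailer check that throws above amounts to verifying a fixed marker at the tail of the file; a regionserver killed mid-flush leaves the file truncated, so the trailer never lands on disk. A rough sketch under assumed names (the class, the 8-byte "TRLR0001" magic, and the check are hypothetical, not HFile's real FixedFileTrailer format):

{code}
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class TrailerCheckDemo {

  // Hypothetical 8-byte magic marking a complete trailer; a truncated
  // flush leaves it missing, so the open-time check fails fast.
  static final byte[] MAGIC = "TRLR0001".getBytes(StandardCharsets.US_ASCII);

  static boolean hasValidTrailer(Path p) throws IOException {
    try (RandomAccessFile raf = new RandomAccessFile(p.toFile(), "r")) {
      if (raf.length() < MAGIC.length) {
        return false; // file shorter than the trailer: cannot be complete
      }
      raf.seek(raf.length() - MAGIC.length);
      byte[] tail = new byte[MAGIC.length];
      raf.readFully(tail);
      return Arrays.equals(tail, MAGIC);
    }
  }

  static byte[] concat(byte[] a, byte[] b) {
    byte[] out = new byte[a.length + b.length];
    System.arraycopy(a, 0, out, 0, a.length);
    System.arraycopy(b, 0, out, a.length, b.length);
    return out;
  }

  public static void main(String[] args) throws IOException {
    Path good = Files.createTempFile("good", ".hfile");
    Files.write(good, concat("payload".getBytes(StandardCharsets.US_ASCII), MAGIC));

    Path truncated = Files.createTempFile("trunc", ".hfile");
    Files.write(truncated, "payl".getBytes(StandardCharsets.US_ASCII)); // killed mid-flush

    System.out.println(hasValidTrailer(good) + " " + hasValidTrailer(truncated));
  }
}
{code}

The point of such a check is that a half-written file is detected deterministically at open time, which is what makes the skip-and-rely-on-log-replay approach in the committed patch safe to take.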

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
