One thing I have noticed is that even though HBase logs this error:

2008-07-23 15:45:44,618 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: error opening region page-repository,http://dbpedia.org/resource/List_of_American_institutions_of_higher_education/lang155,1216448364079 java.io.FileNotFoundException: File does not exist: hdfs://hadoop1.sindice.net:54310/hbase/page-repository/1105668475/field/mapfiles/5122893264992435570/data

when I query for the row "http://dbpedia.org/resource/List_of_American_institutions_of_higher_education/lang155", HBase retrieves it successfully.

I will try to apply your patch (on trunk or on the release candidate?), but I am going on holiday soon, so I am not sure I will have the time.

Thanks.
--
Renaud Delbru

stack wrote:
Renaud Delbru wrote:
We are using HBase 0.2.0-dev, Hudson Build #208.

Then it's a bug. In the branch we made it so that we scream in the logs but then keep going, figuring this is the general preference rather than having the cluster stuck cycling through deploying/failing/redeploying/etc. the horked region.

What should have happened instead is that the region deploys successfully, but with the following left in the log:

    LOG.warn("Mapfile " + mapfile.toString() + " has empty data. " +
        "Deleting. Continuing...Probable DATA LOSS!!! See HBASE-646.");

Would you mind trying the patch in https://issues.apache.org/jira/browse/HBASE-766? It 'fixes' TRUNK so it does the above.
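
For illustration only, the log-and-continue behaviour described above could be sketched roughly as follows. This is not the HBASE-766 patch and uses no HBase or HDFS API: the class name TolerantStoreOpen, the method usableMapFiles, and the use of plain java.nio on a local directory are all assumptions made to keep the example self-contained. It also only skips a damaged mapfile, whereas the log message quoted above says the real code deletes it.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

/** Hypothetical sketch of "warn and keep going"; not HBase code. */
public class TolerantStoreOpen {
    private static final Logger LOG = Logger.getLogger("TolerantStoreOpen");

    /**
     * Returns the mapfile directories under mapfilesDir whose "data" file
     * exists and is non-empty. Damaged mapfiles are logged and skipped
     * instead of aborting the whole open.
     */
    static List<Path> usableMapFiles(Path mapfilesDir) throws IOException {
        List<Path> usable = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(mapfilesDir)) {
            for (Path mapfile : entries) {
                if (!Files.isDirectory(mapfile)) {
                    continue; // only directories are treated as mapfiles here
                }
                Path data = mapfile.resolve("data");
                if (!Files.isRegularFile(data) || Files.size(data) == 0) {
                    // Scream in the log, then carry on with the remaining files.
                    LOG.warning("Mapfile " + mapfile
                        + " has missing or empty data. Skipping. Probable DATA LOSS!");
                    continue;
                }
                usable.add(mapfile);
            }
        }
        return usable;
    }
}
```

The point of the design is that one bad file costs a warning and possibly some data, but the region still comes up rather than cycling through failed deployments.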

That we're 'losing' the 'data' file from StoreFiles/MapFiles in times of 'stress' is disconcerting. In my experience, it happened here once when there was a storm in HDFS. We lost more than one data file (if we lose the MapFile index, HBase repairs it by reconstructing it). To debug, we would need to run with DEBUG enabled on HDFS, but no one likes doing that on the off-chance that there'll be an incident, because of the sheer volume of logs generated. We need to somehow work out the particular sequence that provokes these losses. I've opened an issue for now -- HBASE-767 -- to track loss of 'data' files.

Thanks,
St.Ack


