Thanks,

I had a look into this, and running :-

./hbase org.jruby.Main add_table.rb hdfs://<server>/hbase/<table>

fixed the problem with .META. in this case for us.
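
For anyone else who hits this, the rough end-to-end sequence (the HDFS
path placeholder is from our setup, and the disable/enable steps are
per J-D's advice below rather than anything the script does for you)
looked something like:

    # HBase shell: take the table offline before touching .META.
    hbase> disable 'source_documents'

    # from $HBASE_HOME/bin: re-add the missing region row(s) to .META.
    # using the region directories found on HDFS
    $ ./hbase org.jruby.Main add_table.rb hdfs://<server>/hbase/<table>

    # HBase shell: bring the table back online
    hbase> enable 'source_documents'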

In what cases would a datanode failure (for example, running out of
memory, as in our case) cause HBase data loss? Would it mostly only
cause data loss in the .META. table, or can it also cause problems
with the actual region files?
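
Also, in case it's useful to others, a quick sanity check that the
region chain is contiguous again seems to be scanning .META. from the
shell (just a sketch; the start row is our table name and the column
name is from memory):

    # each row's start key should match the previous row's end key;
    # a gap between them is the kind of hole add_table.rb fills in
    hbase> scan '.META.', {COLUMNS => 'info:regioninfo', STARTROW => 'source_documents,,'}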

On 25 May 2010 18:47, Jean-Daniel Cryans <[email protected]> wrote:
> The edits to .META. were likely lost, so scanning .META. won't solve
> the issue (although it could be smarter and figure that there's a
> hole, find the missing region on HDFS, and add it back).
>
> So your region is probably still physically on HDFS. See the
> bin/add_table.rb script, which will help you get that line back into
> .META.; be sure to disable your table before running it. Search the
> archives of this mailing list for others who had the same issue if
> anything doesn't seem clear.
>
> I'd also like to point out that those edits were lost because HDFS
> won't support fsSync until 0.21, so data loss is likely in the face of
> machine and process failure.
>
> J-D
>
> On Mon, May 24, 2010 at 2:39 PM, Dan Harvey <[email protected]> wrote:
>> Hi,
>>
>> Sorry for the multiple e-mails; it seems Gmail didn't send my whole
>> message last time! Anyway, here it is again...
>>
>> While loading data into HBase via a MapReduce job, I have started
>> getting this error :-
>>
>> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to
>> contact region server Some server, retryOnlyOne=true, index=0,
>> islastrow=false, tries=9, numtries=10, i=0, listsize=19,
>> region=source_documents,ipubmed\x219915054,1274525958679 for region
>> source_documents,ipubmed\x219915054,1274525958679, row 'u1012913162',
>> but failed after 10 attempts.
>> Exceptions:
>> at 
>> org.apache.hadoop.hbase.client.HConnectionManager$TableServers$Batch.process(HConnectionManager.java:1166)
>> at 
>> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:1247)
>> at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:609)
>>
>> In the master there are the following three regions :-
>>
>> Region name                                          Server    Encoded name   Start key             End key
>> source_documents,ipubmed\x219859228,1274701893687    hadoop1   1825870642     ipubmed\x219859228    ipubmed\x219915054
>> source_documents,ipubmed\x219915054,1274525958679    hadoop4   193393334      ipubmed\x219915054    u102193588
>> source_documents,u102193588,1274486550122            hadoop4   2141795358     u102193588            u105043522
>>
>> and on one of our 5 nodes I found a region which starts with
>>
>> ipubmed\x219915054 and ends with u102002564
>>
>> and on another node I found the other half of the split, which starts with
>>
>> u102002564 and ends with u102193588
>>
>> So it seems that the middle region listed by the master was split,
>> but the split never reached the master.
>>
>> Over the last few days we've had a few problems with HDFS nodes
>> failing due to lack of memory; that has now been fixed, but it could
>> have been a cause of this problem.
>>
>> In what ways can a split fail to be received by the master, and how
>> long would it take HBase to fix this? I've read that it periodically
>> scans the .META. table to find problems like this, but not how often.
>> It has been about 12 hours here and our cluster doesn't appear to
>> have fixed the missing split. Is there a way to force the master to
>> rescan .META.? Will it fix problems like this given time?
>>
>> Thanks,
>>
>> --
>> Dan Harvey | Datamining Engineer
>> www.mendeley.com/profiles/dan-harvey
>>
>> Mendeley Limited | London, UK | www.mendeley.com
>> Registered in England and Wales | Company Number 6419015
>>
>

-- 
Dan Harvey | Datamining Engineer
www.mendeley.com/profiles/dan-harvey

Mendeley Limited | London, UK | www.mendeley.com
Registered in England and Wales | Company Number 6419015