I had a similar upgrade experience going from 0.20.3 to 0.20.4. The master started off continuously reassigning regions as quickly as it could. The master web UI listed a table's regions properly (spread across the regionservers), but looking at the individual regionservers' web UIs (the list of regions served by each regionserver), it appeared that every regionserver thought it had a copy of every region, so the total number of regions reported was 5x normal for my 5-node cluster.
After a little while of this continuous reassigning, the regionserver holding .META. would hit a problem writing updates to HDFS and force .META. to reassign. The only error in that regionserver's log was:

2010-05-07 22:34:21,699 WARN org.apache.hadoop.hdfs.DFSClient: DFS Read: java.io.IOException: Cannot open filename /hbase/.META./1028785192/info/2937322648368577689
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1497)
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1824)
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1638)
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1767)
        at java.io.DataInputStream.read(DataInputStream.java:132)
        at org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream.read(BoundedRangeFileInputStream.java:105)
        at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:100)
        at org.apache.hadoop.hbase.io.hfile.HFile$Reader.decompress(HFile.java:1018)
        at org.apache.hadoop.hbase.io.hfile.HFile$Reader.readBlock(HFile.java:966)
        at org.apache.hadoop.hbase.io.hfile.HFile$Reader$Scanner.seekTo(HFile.java:1291)
        at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:98)
        at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:68)
        at org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:72)
        at org.apache.hadoop.hbase.regionserver.Store.getScanner(Store.java:1304)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScanner.initHeap(HRegion.java:1850)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScanner.next(HRegion.java:1883)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1906)
        at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)

The only errors on the datanodes were (there were many of these; I'm including just one):

2010-05-07 22:34:21,701 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.0.0.61:50010, storageID=DS-548401723-10.0.0.61-50010-1258275076629, infoPort=50075, ipcPort=50020):DataXceiver
java.io.IOException: Block blk_-3471558578366937156_600043 is not valid.
        at org.apache.hadoop.hdfs.server.datanode.FSDataset.getBlockFile(FSDataset.java:734)
        at org.apache.hadoop.hdfs.server.datanode.FSDataset.getLength(FSDataset.java:722)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:92)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:172)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
        at java.lang.Thread.run(Thread.java:619)

After this bad move of .META., I would get errors in the master log stating that HRegionInfo was empty for each region, so the regions were being deleted. A few minutes later HBase reported, both through the web UI and through the hbase shell's list command, that there were no tables on my cluster. Luckily it didn't appear that any data was erased, and a restart of HBase/HDFS started the whole process over again. On my third run through this cycle I noticed that HBase seemed to be mixing IPs (10.0.0.61) and FQDNs (h1.readpath.com) in the log lines.
So I made sure to add all hosts to each server's /etc/hosts and then pushed that out to all of the servers (instead of each server only having its own name in /etc/hosts, as had worked in 0.20.3). It appears that 0.20.4 might be more finicky about DNS resolution. Once I did this, the master stopped continually reassigning the regions. (A rough way to sanity-check resolution on each node is sketched after the quoted thread below.)

Bryan

On May 13, 2010, at 8:09 PM, Stack wrote:

> What's the shell say? Does it see the tables consistently? Can you
> count your content consistently?
> St.Ack
>
> On Thu, May 13, 2010 at 4:53 PM, Viktors Rotanovs
> <viktors.rotan...@gmail.com> wrote:
>> Hi,
>>
>> after upgrading from 0.20.3 to 0.20.4 the list of tables almost
>> immediately becomes inconsistent - master.jsp shows no tables even
>> after creating a test table in the hbase shell, tables which were
>> available before start randomly appearing and disappearing, etc.
>> Upgrading was done by stopping, upgrading the code, and then
>> starting (no dump/restore was done).
>> I didn't investigate yet, just checking if somebody had the same
>> problem or if I did the upgrade right (I had exactly the same issue
>> in the past when trying to apply HBASE-2174 manually).
>>
>> Environment:
>> Small tables, <100k rows
>> Amazon EC2, "c1.xlarge" instance type with Ubuntu 9.10 and EBS root,
>> HBase installed manually
>> 1 master (namenode + jobtracker + master), 3 slaves (tasktracker +
>> datanode + regionserver + zookeeper)
>> Hadoop 0.20.1+169.68~1.karmic-cdh2 from the Cloudera distribution
>> Flaky DNS issue present, happens about once per day even with dnsmasq
>> installed (heartbeat every 1s, dnsmasq forwards requests once per
>> minute), DDNS set for internal hostnames.
>>
>> This is a testing cluster, nothing important on it.
>>
>> Cheers,
>> -- Viktors
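P.S. In case it helps anyone hitting the same thing, here's a rough sketch of the resolution check I mean. The hostnames are placeholders (only h1.readpath.com appears above; the rest are made up, so substitute your own master and regionserver names). It just does a forward lookup for each node and then reverse-resolves the address, flagging any host where the two don't agree:

import java.net.InetAddress;

public class DnsSanityCheck {
    // Placeholder host list: every master and regionserver hostname,
    // matching what you put in /etc/hosts on each node, e.g.
    //   10.0.0.61  h1.readpath.com  h1
    private static final String[] HOSTS = {
        "h1.readpath.com", "h2.readpath.com", "h3.readpath.com"
    };

    public static void main(String[] args) throws Exception {
        for (String name : HOSTS) {
            // Forward lookup: hostname -> IP
            InetAddress forward = InetAddress.getByName(name);
            // Reverse lookup on the raw address: IP -> canonical hostname
            String reverse = InetAddress.getByAddress(forward.getAddress())
                                        .getCanonicalHostName();
            boolean consistent = name.equalsIgnoreCase(reverse);
            System.out.println(name + " -> " + forward.getHostAddress()
                + " -> " + reverse + (consistent ? " [ok]" : " [MISMATCH]"));
        }
    }
}

Run it on every node; if any host reverse-resolves to a bare IP or to a different name than you looked up, that's the same ip-vs-fqdn mixing that showed up in my logs.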