I do not have root access on the Xen cluster I'm using. I will ask the admin to make sure the disks are working properly. Regarding the mismatched versions, though: are you suggesting that different region servers might be running different versions of hbase/hadoop? They are all running the same code from the same shared storage. There isn't even another version of hadoop anywhere for the other nodes to run. I think I'll try dropping my cluster down to 2 nodes and working back up... maybe I can pinpoint a specific problem node. Thanks for taking a look at my logs.
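Since all the nodes run from the same shared storage, a version skew seems unlikely, but it is cheap to rule out directly before bisecting the cluster. A minimal sketch, assuming the host names from this thread and a placeholder jar path (adjust to wherever hadoop-0.15.0-core.jar actually lives):

```shell
# Sketch only: host names are the ones from this thread, and the jar path
# is a placeholder, not the real install location.
check_versions() {
  # Emit one line per distinct Hadoop build found across the given hosts;
  # more than one line of output means at least one node runs a different jar.
  for host in "$@"; do
    ssh "$host" md5sum /path/to/hadoop-0.15.0-core.jar
  done | awk '{print $1}' | sort -u
}

# Usage (run from the master):
#   check_versions hadoop08 hadoop09 hadoop10 hadoop11 hadoop12
# Then check HDFS block health from the namenode, as stack suggested:
#   bin/hadoop fsck /
```

If `check_versions` prints a single checksum, the nodes really are on identical builds and the fsck output becomes the more interesting lead.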
On Nov 24, 2007 5:49 PM, stack <[EMAIL PROTECTED]> wrote:
> I took a quick look, Kareem. As with the last time, hbase keeps having
> trouble w/ the hdfs. Things start out fine around 16:00, then go bad
> because they can't write reliably to the hdfs -- a variety of reasons.
> You then seem to restart the cluster around 17:37 or so and things seem
> to go along fine for a while until 19:05 when, again, all regionservers
> report trouble writing the hdfs. Have you run an fsck?
>
> I also saw this strange message in a few regionserver logs, which makes
> me think there's a version mismatch somewhere:
>
> 2007-11-15 16:58:28,783 ERROR org.apache.hadoop.hbase.HRegionServer: Split or compaction failed
> java.io.IOException: Mismatch in writeChunk() args
>     at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.writeChunk(DFSClient.java:1575)
>     at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:140)
>     at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:122)
>     at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.close(DFSClient.java:1715)
>     at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:49)
>     at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:64)
>     at org.apache.hadoop.io.SequenceFile$Writer.close(SequenceFile.java:918)
>     at org.apache.hadoop.io.MapFile$Writer.close(MapFile.java:172)
>     at org.apache.hadoop.hbase.HStore.compactHelper(HStore.java:631)
>     at org.apache.hadoop.hbase.HStore.compactHelper(HStore.java:561)
>     at org.apache.hadoop.hbase.HStore.compact(HStore.java:556)
>     at org.apache.hadoop.hbase.HRegion.compactStores(HRegion.java:711)
>     at org.apache.hadoop.hbase.HRegionServer$SplitOrCompactChecker.checkForSplitsOrCompactions(HRegionServer.java:196)
>     at org.apache.hadoop.hbase.HRegionServer$SplitOrCompactChecker.chore(HRegionServer.java:186)
>     at org.apache.hadoop.hbase.Chore.run(Chore.java:58)
>
> St.Ack
>
> Kareem Dana wrote:
>
> > Sure.
> > I uploaded all my logs to my website:
> >
> > http://cs.duke.edu/~kcd/hadoop-logs/
> >
> > My cluster consists of hadoop07-hadoop12. hadoop07 acts as the dfs
> > master, mapred master, and hbase master. hadoop08-12 are dfs slaves,
> > mapred task servers, and hbase regionservers.
> >
> > I also uploaded my hadoop-site.xml and hbase-site.xml there as well.
> > Those are the only configuration values I changed. Each node in the
> > cluster has these specs:
> >
> > - Linux hadoop07 2.6.18-xenU #4 SMP Mon Oct 22 11:03:58 EDT 2007 i686 GNU/Linux
> > - java version "1.5.0_06"
> >   Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_06-b05)
> >   Java HotSpot(TM) Client VM (build 1.5.0_06-b05, mixed mode, sharing)
> > - Hadoop 0.15.0 release with no modifications and the included hbase.
> >
> > Each machine has 2GB of local storage space, 400MB RAM, and 256MB swap.
> > Let me know if any more information would be helpful. Thanks for looking
> > at the logs.
> >
> > - Kareem
> >
> > On Nov 20, 2007 10:01 AM, stack <[EMAIL PROTECTED]> wrote:
> >
> >> May I see your logs, Kareem? What version of hbase? Can I see your
> >> config. too?
> >> Thanks,
> >>
> >> St.Ack
