Hi Sriram,

> Can you tell me the exact steps to repro the problem:
> - What version of Hbase?

SVN trunk, 0.20.0-dev

> - Which version of Heritrix?

Heritrix 2.0, plus the HBase writer which can be found here:
http://code.google.com/p/hbase-writer/

> What is happening is that the KFS chunkserver is sending
> writes down to disk and they aren't coming back "soon
> enough", causing things to backlog; the chunkserver is
> printing out the backlog status message.

I wonder if this might be a secondary effect. Just before
these messages begin streaming into the log, the chunkserver
suddenly balloons its address space from ~200KB to ~100GB.
These two things are strongly correlated and happen in the
same order in a repeatable manner. Once the backlog messages
begin, no further IO completes as far as I can tell. The
count of outstanding IOs increases monotonically. Also, the
metaserver declares the chunkserver dead.

I can take steps to help diagnose the problem. Please
advise. Would it help if I replicate the problem again with
chunkserver logging at DEBUG and then post the compressed
logs somewhere?

[...]

> On Thu, Apr 16, 2009 at 12:27 AM, Andrew Purtell
> >
> > Hi,
> >
> > Like Ryan I have been trying to run HBase on top of
> > KFS. In my case I am running a SVN snapshot from
> > yesterday. I have a minimal installation of KFS
> > metaserver, chunkserver, and HBase master and
> > regionserver all running on one test host with 4GB of
> > RAM. Of course I do not expect more than minimal
> > function. To apply some light load, I run the Heritrix
> > crawler with 5 TOE threads which write on average
> > 200 Kbit/sec of data into HBase, which flushes this
> > incoming data in ~64MB increments and also runs
> > occasional compaction cycles where the 64MB flush
> > files will be compacted into ~256MB files.
> >
> > I find that for no obvious reason the chunkserver will
> > suddenly grab ~100 GIGAbytes of address space and emit
> > a steady stream of "(DiskManager.cc:392) Too many disk
> > IOs (N)" to the log at INFO level, where N is a
> > steadily increasing number. The host is under moderate
> > load at the time -- KFS is busy -- but is not in swap
> > and according to atop has some disk I/O and network
> > bandwidth to spare.

[...]
