Looks like the region server came down after I left office. Still, the second flow wasn't able to proceed writing to hbase. I hope such stress test can pass.
I am running our flow repeatedly using 0.90.1 rc1. On Mon, Feb 14, 2011 at 5:23 PM, Todd Lipcon <[email protected]> wrote: > Looks like it's just in the process of flushing something: > > "regionserver60020.cacheFlusher" daemon prio=10 tid=0x0000000050ce3000 > nid=0x32a9 runnable [0x0000000043138000] > java.lang.Thread.State: RUNNABLE > at > > org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:797) > at > > org.apache.hadoop.hbase.regionserver.Store.internalFlushCache(Store.java:479) > - locked <0x000000072aecd6b8> (a java.lang.Object) > at > org.apache.hadoop.hbase.regionserver.Store.flushCache(Store.java:448) > at > org.apache.hadoop.hbase.regionserver.Store.access$100(Store.java:81) > at > > org.apache.hadoop.hbase.regionserver.Store$StoreFlusherImpl.flushCache(Store.java:1508) > at > > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:967) > at > > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:894) > at > org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:846) > at > > org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:386) > at > > org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:194) > at > > org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:214) > > ie this is expected behavior. Does it not go down eventually? > > On Mon, Feb 14, 2011 at 5:17 PM, Ted Yu <[email protected]> wrote: > > > I applied Todd's patch from HBASE-3531 locally and built hbase jar. > > > > I restarted cluster. The first flow over 200GB data succeeded. > > The second flow got stuck. So I tried to shutdown the cluster. > > > > Here is stack trace for one of the region servers that refused to go > down: > > http://pastebin.com/yJQhhYp8 > > > > This is particularly interesting (other threads were waiting to lock > > 0x000000070be8a028): > > > > 1. "IPC Server handler 5 on 60020" daemon prio=10 > tid=0x00002aaab883c000 > > nid=0x32bb waiting on condition [0x0000000043f46000] > > 2. java.lang.Thread.State: WAITING (parking) > > 3. at sun.misc.Unsafe.park(Native Method) > > 4. - parking to wait for <0x000000070ca468a8> (a > > java.util.concurrent.locks.ReentrantLock$NonfairSync) > > 5. at > > java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) > > 6. at > > > > > java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811) > > 7. at > > > > > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:842) > > 8. at > > > > > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1178) > > 9. at > > > > > java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186) > > 10. at > > java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262) > > 11. at > > > > > org.apache.hadoop.hbase.regionserver.MemStoreFlusher.reclaimMemStoreMemory(MemStoreFlusher.java:429) > > 12. - locked <0x000000070be8a028> (a > > org.apache.hadoop.hbase.regionserver.MemStoreFlusher) > > 13. at > > > > > org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:2562) > > > > Thanks for your attention. > > > > On Mon, Feb 14, 2011 at 1:12 PM, Todd Lipcon <[email protected]> wrote: > > > > > Gotcha, OK. I think I understand what's going on then. > > > I'm guessing you have a TON of log messages like this? > > > > > > 2011-02-14 06:27:23,116 DEBUG > > org.apache.hadoop.hbase.regionserver.HRegion: > > > NOT flushing memstore for region > > > > > > > > > NIGHTLYDEVGRIDSGRIDSQL-THREEGPPSPEECHCALLS-1297661454931,E\xC2\xC8\xBC\xD4\xA5\x04E\xC2\xC8\xBC\xD4\xA5\x04E\xC2\xC8\xBC\xD4\xA5\x04E\xC2\xC8\xBC\xD4\xA5\x04E\xC2\xC8\xAA,1297662719973.cd06c5f9e4a0ffcc16cb6a5e559cd5b3., > > > flushing=false, writesEnabled=false > > > > > > > > > -Todd > > > > > > On Mon, Feb 14, 2011 at 1:08 PM, Ted Yu <[email protected]> wrote: > > > > > > > The lock up happened before I shut down region server. > > > > I had to get out of that situation so that I can continue 0.90.1 > > > > validation. > > > > > > > > On Mon, Feb 14, 2011 at 12:59 PM, Todd Lipcon <[email protected]> > > wrote: > > > > > > > > > Hi Ted, > > > > > > > > > > What was the reason for the region server shutdown? ie was it > > aborting > > > > > itself, or you send it a kill signal, or what? > > > > > > > > > > Still trying to understand why this happened. > > > > > > > > > > -Todd > > > > > > > > > > On Mon, Feb 14, 2011 at 10:40 AM, Ted Yu <[email protected]> > > wrote: > > > > > > > > > > > Here is stack trace for one region server which didn't cleanly > > > > shutdown: > > > > > > http://pastebin.com/PEtEdi4g > > > > > > > > > > > > I noticed IPC Server handler 7 was holding lock in > > > > > reclaimMemStoreMemory(). > > > > > > > > > > > > Here is related snippet from the region server's log: > > > > > > http://pastebin.com/KeXppURX > > > > > > > > > > > > I noticed region 1297662719973.cd06c5f9e4a0ffcc16cb6a5e559cd5b3 > was > > > > > > splitting. > > > > > > > > > > > > Please advise whether there could be any relation between the > above > > > > > snippet > > > > > > and lock up in MemStoreFlusher. > > > > > > > > > > > > Thanks > > > > > > > > > > > > On Mon, Feb 14, 2011 at 9:20 AM, Ted Yu <[email protected]> > > wrote: > > > > > > > > > > > > > I disabled MSLAB. > > > > > > > My flow still couldn't make much progress. > > > > > > > > > > > > > > > > > > > > >> In this region server stack trace, I don't see > > > > > > >> MemStoreFlusher.reclaimMemStoreMemory() call: > > > > > > >> http://pastebin.com/uiBRidUa > > > > > > >> > > > > > > >> > > > > > > >> On Sun, Feb 13, 2011 at 1:14 PM, Ted Yu <[email protected]> > > > > wrote: > > > > > > >> > > > > > > >>> I am using hadoop-core-0.20.2-322.jar downloaded from Ryan's > > > repo. > > > > > > >>> FYI > > > > > > >>> > > > > > > >>> > > > > > > >>> On Sun, Feb 13, 2011 at 1:12 PM, Ted Yu <[email protected] > > > > > > wrote: > > > > > > >>> > > > > > > >>>> Since master server shut down, I restarted the cluster. > > > > > > >>>> The next flow over 200GB data got timed out. > > > > > > >>>> > > > > > > >>>> Here are some region server stat: > > > > > > >>>> > > > > > > >>>> request=0.0, regions=95, stores=213, storefiles=65, > > > > > > >>>> storefileIndexSize=99, memstoreSize=1311, > > compactionQueueSize=0, > > > > > > >>>> flushQueueSize=0, usedHeap=2532, maxHeap=3983, > > > > > blockCacheSize=6853968, > > > > > > >>>> blockCacheFree=828520304, blockCacheCount=0, > > > blockCacheHitCount=0, > > > > > > >>>> blockCacheMissCount=0, blockCacheEvictedCount=0, > > > > > blockCacheHitRatio=0, > > > > > > >>>> blockCacheHitCachingRatio=0 > > > > > > >>>> > > > > > > >>>> request=0.0, regions=95, stores=232, storefiles=72, > > > > > > >>>> storefileIndexSize=120, memstoreSize=301, > > compactionQueueSize=0, > > > > > > >>>> flushQueueSize=0, usedHeap=1740, maxHeap=3983, > > > > > > blockCacheSize=13110928, > > > > > > >>>> blockCacheFree=822263344, blockCacheCount=712, > > > > > > blockCacheHitCount=112478, > > > > > > >>>> blockCacheMissCount=712, blockCacheEvictedCount=0, > > > > > > blockCacheHitRatio=99, > > > > > > >>>> blockCacheHitCachingRatio=99 > > > > > > >>>> > > > > > > >>>> Thanks > > > > > > >>>> > > > > > > >>>> > > > > > > >>>> On Sun, Feb 13, 2011 at 12:24 PM, Ryan Rawson < > > > [email protected] > > > > > > >wrote: > > > > > > >>>> > > > > > > >>>>> every handler thread, and every reader and also the accept > > > thread > > > > > are > > > > > > >>>>> all blocked on flushing memstore. The handlers get > blocked, > > > then > > > > > the > > > > > > >>>>> readers also have a finite handoff queue and they are > blocked > > > and > > > > > > also > > > > > > >>>>> the accept. > > > > > > >>>>> > > > > > > >>>>> But why isnt memstore flushing? Do you have regionserver > > > stats? > > > > > ie: > > > > > > >>>>> how much memstore global ram used? That is found on the > main > > > > page > > > > > of > > > > > > >>>>> the regionserver http service, also found in ganglia/file > > > stats. > > > > > > >>>>> > > > > > > >>>>> I havent looked at the logs yet, I'm off to lunch now. > > > > > > >>>>> > > > > > > >>>>> -ryan > > > > > > >>>>> > > > > > > >>>>> On Sun, Feb 13, 2011 at 8:44 AM, Ted Yu < > [email protected] > > > > > > > > wrote: > > > > > > >>>>> > I had 3 consecutive successful runs processing 200GB data > > for > > > > > each > > > > > > >>>>> run > > > > > > >>>>> > before hitting timeout problem in the 4th run. > > > > > > >>>>> > > > > > > > >>>>> > The 5th run couldn't proceed because master complained: > > > > > > >>>>> > > > > > > > >>>>> > 2011-02-13 16:11:45,173 FATAL > > > > > > org.apache.hadoop.hbase.master.HMaster: > > > > > > >>>>> Failed > > > > > > >>>>> > assignment of regions to > > > > > > >>>>> > serverName=sjc1-hadoop6.sjc1.carrieriq.com > > > > ,60020,1297518996557, > > > > > > >>>>> > load=(requests=0, regions=231, usedHeap=3535, > maxHeap=3983) > > > > > > >>>>> > > > > > > > >>>>> > but sjc1-hadoop6.sjc1 claimed: > > > > > > >>>>> > 2011-02-13 16:13:32,258 DEBUG > > > > > > >>>>> > org.apache.hadoop.hbase.regionserver.HRegionServer: No > > master > > > > > > found, > > > > > > >>>>> will > > > > > > >>>>> > retry > > > > > > >>>>> > > > > > > > >>>>> > Here is stack trace for sjc1-hadoop6.sjc1: > > > > > > >>>>> > http://pastebin.com/X8zWLXqu > > > > > > >>>>> > > > > > > > >>>>> > I didn't have chance to capture master stack trace as > > master > > > > > exited > > > > > > >>>>> after > > > > > > >>>>> > that. > > > > > > >>>>> > > > > > > > >>>>> > I also attach master and region server log on > > > sjc1-hadoop6.sjc1 > > > > - > > > > > > >>>>> pardon me > > > > > > >>>>> > for including individual email addresses as attachments > > > > wouldn't > > > > > go > > > > > > >>>>> through > > > > > > >>>>> > hbase.apache.org > > > > > > >>>>> > > > > > > > >>>>> > On Thu, Feb 10, 2011 at 5:05 PM, Todd Lipcon < > > > > [email protected]> > > > > > > >>>>> wrote: > > > > > > >>>>> >> > > > > > > >>>>> >> On Thu, Feb 10, 2011 at 4:54 PM, Ted Yu < > > > [email protected]> > > > > > > >>>>> wrote: > > > > > > >>>>> >> > > > > > > >>>>> >> > Thanks for the explanation. > > > > > > >>>>> >> > Assuming the mixed class loading is static, why did > this > > > > > > situation > > > > > > >>>>> >> > develop > > > > > > >>>>> >> > after 40 minutes of heavy load :-( > > > > > > >>>>> >> > > > > > > > >>>>> >> > > > > > > >>>>> >> You didn't hit global memstore pressure until 40 minutes > > of > > > > > load. > > > > > > >>>>> >> > > > > > > >>>>> >> -Todd > > > > > > >>>>> >> > > > > > > >>>>> >> On Thu, Feb 10, 2011 at 4:42 PM, Ryan Rawson < > > > > > [email protected]> > > > > > > >>>>> wrote: > > > > > > >>>>> >> > > > > > > > >>>>> >> > > It's a standard linking issue, you get one class > from > > > one > > > > > > >>>>> version > > > > > > >>>>> >> > > another from another, they are mostly compatible in > > > terms > > > > of > > > > > > >>>>> >> > > signatures (hence no exceptions) but are subtly > > > > incompatible > > > > > > in > > > > > > >>>>> >> > > different ways. In the stack trace you posted, the > > > > handlers > > > > > > were > > > > > > >>>>> >> > > blocked in: > > > > > > >>>>> >> > > > > > > > > >>>>> >> > > at > > > > > > >>>>> >> > > > > > > > > >>>>> >> > > > > > > > >>>>> >> > > > > > > > >>>>> > > > > > > > > > > > > > > > > > > > > > org.apache.hadoop.hbase.regionserver.MemStoreFlusher.reclaimMemStoreMemory(MemStoreFlusher.java:382) > > > > > > >>>>> >> > > > > > > > > >>>>> >> > > and the thread: > > > > > > >>>>> >> > > > > > > > > >>>>> >> > > "regionserver60020.cacheFlusher" daemon prio=10 > > > > > > >>>>> tid=0x00002aaabc21e000 > > > > > > >>>>> >> > > nid=0x7717 waiting for monitor entry > > > [0x0000000000000000] > > > > > > >>>>> >> > > java.lang.Thread.State: BLOCKED (on object > monitor) > > > > > > >>>>> >> > > > > > > > > >>>>> >> > > was idle. > > > > > > >>>>> >> > > > > > > > > >>>>> >> > > The cache flusher thread should be flushing, and yet > > > it's > > > > > > doing > > > > > > >>>>> >> > > nothing. This also happens to be one of the classes > > > that > > > > > were > > > > > > >>>>> >> > > changed. > > > > > > >>>>> >> > > > > > > > > >>>>> >> > > > > > > > > >>>>> >> > > > > > > > > >>>>> >> > > On Thu, Feb 10, 2011 at 4:34 PM, Ted Yu < > > > > > [email protected]> > > > > > > >>>>> wrote: > > > > > > >>>>> >> > > > Can someone comment on my second question ? > > > > > > >>>>> >> > > > Thanks > > > > > > >>>>> >> > > > > > > > > > >>>>> >> > > > On Thu, Feb 10, 2011 at 4:25 PM, Ryan Rawson < > > > > > > >>>>> [email protected]> > > > > > > >>>>> >> > wrote: > > > > > > >>>>> >> > > > > > > > > > >>>>> >> > > >> As I suspected. > > > > > > >>>>> >> > > >> > > > > > > >>>>> >> > > >> It's a byproduct of our maven assembly process. > The > > > > > process > > > > > > >>>>> could > > > > > > >>>>> >> > > >> be > > > > > > >>>>> >> > > >> fixed. I wouldn't mind. I don't support runtime > > > > checking > > > > > of > > > > > > >>>>> jars, > > > > > > >>>>> >> > > >> there is such thing as too much tests, and this > is > > an > > > > > > example > > > > > > >>>>> of > > > > > > >>>>> >> > > >> it. > > > > > > >>>>> >> > > >> The check would then need a test, etc, etc. > > > > > > >>>>> >> > > >> > > > > > > >>>>> >> > > >> At SU we use new directories for each upgrade, > > > copying > > > > > the > > > > > > >>>>> config > > > > > > >>>>> >> > > >> over. With the lack of -default.xml this is > easier > > > than > > > > > > ever > > > > > > >>>>> (just > > > > > > >>>>> >> > > >> copy everything in conf/). With symlink > switchover > > > it > > > > > > makes > > > > > > >>>>> roll > > > > > > >>>>> >> > > >> forward/back as simple as doing a symlink > > switchover > > > or > > > > > > back. > > > > > > >>>>> I > > > > > > >>>>> >> > > >> have > > > > > > >>>>> >> > > >> to recommend this to everyone who doesnt have a > > > > > management > > > > > > >>>>> scheme. > > > > > > >>>>> >> > > >> > > > > > > >>>>> >> > > >> On Thu, Feb 10, 2011 at 4:20 PM, Ted Yu < > > > > > > [email protected] > > > > > > >>>>> > > > > > > > >>>>> >> > > >> wrote: > > > > > > >>>>> >> > > >> > hbase/hbase-0.90.1.jar leads > lib/hbase-0.90.0.jar > > > in > > > > > the > > > > > > >>>>> >> > > >> > classpath. > > > > > > >>>>> >> > > >> > I wonder > > > > > > >>>>> >> > > >> > 1. why hbase jar is placed in two directories - > > > > 0.20.6 > > > > > > >>>>> didn't use > > > > > > >>>>> >> > such > > > > > > >>>>> >> > > >> > structure > > > > > > >>>>> >> > > >> > 2. what from lib/hbase-0.90.0.jar could be > picked > > > up > > > > > and > > > > > > >>>>> why > > > > > > >>>>> >> > > >> > there > > > > > > >>>>> >> > > wasn't > > > > > > >>>>> >> > > >> > exception in server log > > > > > > >>>>> >> > > >> > > > > > > > >>>>> >> > > >> > I think a JIRA should be filed for item 2 above > - > > > > bail > > > > > > out > > > > > > >>>>> when > > > > > > >>>>> >> > > >> > the > > > > > > >>>>> >> > > two > > > > > > >>>>> >> > > >> > hbase jars from $HBASE_HOME and $HBASE_HOME/lib > > are > > > > of > > > > > > >>>>> different > > > > > > >>>>> >> > > >> versions. > > > > > > >>>>> >> > > >> > > > > > > > >>>>> >> > > >> > Cheers > > > > > > >>>>> >> > > >> > > > > > > > >>>>> >> > > >> > On Thu, Feb 10, 2011 at 3:40 PM, Ryan Rawson < > > > > > > >>>>> [email protected]> > > > > > > >>>>> >> > > wrote: > > > > > > >>>>> >> > > >> > > > > > > > >>>>> >> > > >> >> What do you get when you: > > > > > > >>>>> >> > > >> >> > > > > > > >>>>> >> > > >> >> ls lib/hbase* > > > > > > >>>>> >> > > >> >> > > > > > > >>>>> >> > > >> >> I'm going to guess there is hbase-0.90.0.jar > > there > > > > > > >>>>> >> > > >> >> > > > > > > >>>>> >> > > >> >> > > > > > > >>>>> >> > > >> >> > > > > > > >>>>> >> > > >> >> On Thu, Feb 10, 2011 at 3:25 PM, Ted Yu < > > > > > > >>>>> [email protected]> > > > > > > >>>>> >> > wrote: > > > > > > >>>>> >> > > >> >> > hbase-0.90.0-tests.jar and hbase-0.90.1.jar > > > > co-exist > > > > > > >>>>> >> > > >> >> > Would this be a problem ? > > > > > > >>>>> >> > > >> >> > > > > > > > >>>>> >> > > >> >> > On Thu, Feb 10, 2011 at 3:16 PM, Ryan Rawson > > > > > > >>>>> >> > > >> >> > <[email protected] > > > > > > >>>>> >> > > > > > > > > >>>>> >> > > >> wrote: > > > > > > >>>>> >> > > >> >> > > > > > > > >>>>> >> > > >> >> >> You don't have both the old and the new > hbase > > > > jars > > > > > in > > > > > > >>>>> there > > > > > > >>>>> >> > > >> >> >> do > > > > > > >>>>> >> > > you? > > > > > > >>>>> >> > > >> >> >> > > > > > > >>>>> >> > > >> >> >> -ryan > > > > > > >>>>> >> > > >> >> >> > > > > > > >>>>> >> > > >> >> >> On Thu, Feb 10, 2011 at 3:12 PM, Ted Yu < > > > > > > >>>>> [email protected]> > > > > > > >>>>> >> > > wrote: > > > > > > >>>>> >> > > >> >> >> > .META. went offline during second flow > > > attempt. > > > > > > >>>>> >> > > >> >> >> > > > > > > > >>>>> >> > > >> >> >> > The time out I mentioned happened for 1st > > and > > > > 3rd > > > > > > >>>>> attempts. > > > > > > >>>>> >> > > HBase > > > > > > >>>>> >> > > >> was > > > > > > >>>>> >> > > >> >> >> > restarted before the 1st and 3rd > attempts. > > > > > > >>>>> >> > > >> >> >> > > > > > > > >>>>> >> > > >> >> >> > Here is jstack: > > > > > > >>>>> >> > > >> >> >> > http://pastebin.com/EHMSvsRt > > > > > > >>>>> >> > > >> >> >> > > > > > > > >>>>> >> > > >> >> >> > On Thu, Feb 10, 2011 at 3:04 PM, Stack < > > > > > > >>>>> [email protected]> > > > > > > >>>>> >> > > wrote: > > > > > > >>>>> >> > > >> >> >> > > > > > > > >>>>> >> > > >> >> >> >> So, .META. is not online? What happens > if > > > you > > > > > use > > > > > > >>>>> shell > > > > > > >>>>> >> > > >> >> >> >> at > > > > > > >>>>> >> > > this > > > > > > >>>>> >> > > >> >> time. > > > > > > >>>>> >> > > >> >> >> >> > > > > > > >>>>> >> > > >> >> >> >> Your attachement did not come across > Ted. > > > > Mind > > > > > > >>>>> >> > > >> >> >> >> postbin'ing > > > > > > >>>>> >> > it? > > > > > > >>>>> >> > > >> >> >> >> > > > > > > >>>>> >> > > >> >> >> >> St.Ack > > > > > > >>>>> >> > > >> >> >> >> > > > > > > >>>>> >> > > >> >> >> >> On Thu, Feb 10, 2011 at 2:41 PM, Ted Yu > > > > > > >>>>> >> > > >> >> >> >> <[email protected] > > > > > > >>>>> >> > > > > > > > > >>>>> >> > > >> wrote: > > > > > > >>>>> >> > > >> >> >> >> > I replaced hbase jar with > > hbase-0.90.1.jar > > > > > > >>>>> >> > > >> >> >> >> > I also upgraded client side jar to > > > > > > >>>>> hbase-0.90.1.jar > > > > > > >>>>> >> > > >> >> >> >> > > > > > > > >>>>> >> > > >> >> >> >> > Our map tasks were running faster than > > > > before > > > > > > for > > > > > > >>>>> about > > > > > > >>>>> >> > > >> >> >> >> > 50 > > > > > > >>>>> >> > > >> minutes. > > > > > > >>>>> >> > > >> >> >> >> However, > > > > > > >>>>> >> > > >> >> >> >> > map tasks then timed out calling > > > > > flushCommits(). > > > > > > >>>>> This > > > > > > >>>>> >> > > happened > > > > > > >>>>> >> > > >> even > > > > > > >>>>> >> > > >> >> >> after > > > > > > >>>>> >> > > >> >> >> >> > fresh restart of hbase. > > > > > > >>>>> >> > > >> >> >> >> > > > > > > > >>>>> >> > > >> >> >> >> > I don't see any exception in region > > server > > > > > logs. > > > > > > >>>>> >> > > >> >> >> >> > > > > > > > >>>>> >> > > >> >> >> >> > In master log, I found: > > > > > > >>>>> >> > > >> >> >> >> > > > > > > > >>>>> >> > > >> >> >> >> > 2011-02-10 18:24:15,286 DEBUG > > > > > > >>>>> >> > > >> >> >> >> > > > > > > > >>>>> >> > > >> >> >> >> > > > > > > > >>>>> org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: > > > > > > >>>>> >> > > >> Opened > > > > > > >>>>> >> > > >> >> >> region > > > > > > >>>>> >> > > >> >> >> >> > -ROOT-,,0.70236052 on > > sjc1-hadoop6.X.com > > > > > > >>>>> >> > ,60020,1297362251595 > > > > > > >>>>> >> > > >> >> >> >> > 2011-02-10 18:24:15,349 INFO > > > > > > >>>>> >> > > >> >> >> >> > > > > org.apache.hadoop.hbase.catalog.CatalogTracker: > > > > > > >>>>> >> > > >> >> >> >> > Failed verification of .META.,,1 at > > > > > > address=null; > > > > > > >>>>> >> > > >> >> >> >> > > > > > > > org.apache.hadoop.hbase.NotServingRegionException: > > > > > > >>>>> >> > > >> >> >> >> > > > > > > > org.apache.hadoop.hbase.NotServingRegionException: > > > > > > >>>>> >> > > >> >> >> >> > Region > > > > > > >>>>> >> > is > > > > > > >>>>> >> > > not > > > > > > >>>>> >> > > >> >> >> online: > > > > > > >>>>> >> > > >> >> >> >> > .META.,,1 > > > > > > >>>>> >> > > >> >> >> >> > 2011-02-10 18:24:15,350 DEBUG > > > > > > >>>>> >> > > >> >> >> org.apache.hadoop.hbase.zookeeper.ZKAssign: > > > > > > >>>>> >> > > >> >> >> >> > master:60000-0x12e10d0e31e0000 > Creating > > > (or > > > > > > >>>>> updating) > > > > > > >>>>> >> > > unassigned > > > > > > >>>>> >> > > >> >> node > > > > > > >>>>> >> > > >> >> >> for > > > > > > >>>>> >> > > >> >> >> >> > 1028785192 with OFFLINE state > > > > > > >>>>> >> > > >> >> >> >> > > > > > > > >>>>> >> > > >> >> >> >> > I am attaching region server (which > > didn't > > > > > > respond > > > > > > >>>>> to > > > > > > >>>>> >> > > >> >> stop-hbase.sh) > > > > > > >>>>> >> > > >> >> >> >> jstack. > > > > > > >>>>> >> > > >> >> >> >> > > > > > > > >>>>> >> > > >> >> >> >> > FYI > > > > > > >>>>> >> > > >> >> >> >> > > > > > > > >>>>> >> > > >> >> >> >> > On Thu, Feb 10, 2011 at 10:10 AM, > Stack > > > > > > >>>>> >> > > >> >> >> >> > <[email protected]> > > > > > > >>>>> >> > > >> wrote: > > > > > > >>>>> >> > > >> >> >> >> >> > > > > > > >>>>> >> > > >> >> >> >> >> Thats probably enough Ted. The > 0.90.1 > > > > > > >>>>> >> > > >> >> >> >> >> hbase-default.xml > > > > > > >>>>> >> > has > > > > > > >>>>> >> > > an > > > > > > >>>>> >> > > >> >> extra > > > > > > >>>>> >> > > >> >> >> >> >> config. to enable the experimental > > > > HBASE-3455 > > > > > > >>>>> feature > > > > > > >>>>> >> > > >> >> >> >> >> but > > > > > > >>>>> >> > > you > > > > > > >>>>> >> > > >> can > > > > > > >>>>> >> > > >> >> >> copy > > > > > > >>>>> >> > > >> >> >> >> >> that over if you want to try playing > > with > > > > it > > > > > > (it > > > > > > >>>>> >> > > >> >> >> >> >> defaults > > > > > > >>>>> >> > > off > > > > > > >>>>> >> > > >> so > > > > > > >>>>> >> > > >> >> >> you'd > > > > > > >>>>> >> > > >> >> >> >> >> copy over the config. if you wanted > to > > > set > > > > it > > > > > > to > > > > > > >>>>> true). > > > > > > >>>>> >> > > >> >> >> >> >> > > > > > > >>>>> >> > > >> >> >> >> >> St.Ack > > > > > > >>>>> >> > > >> >> >> >> > > > > > > > >>>>> >> > > >> >> >> >> > > > > > > > >>>>> >> > > >> >> >> >> > > > > > > >>>>> >> > > >> >> >> > > > > > > > >>>>> >> > > >> >> >> > > > > > > >>>>> >> > > >> >> > > > > > > > >>>>> >> > > >> >> > > > > > > >>>>> >> > > >> > > > > > > > >>>>> >> > > >> > > > > > > >>>>> >> > > > > > > > > > >>>>> >> > > > > > > > > >>>>> >> > > > > > > > >>>>> >> > > > > > > >>>>> >> > > > > > > >>>>> >> > > > > > > >>>>> >> -- > > > > > > >>>>> >> Todd Lipcon > > > > > > >>>>> >> Software Engineer, Cloudera > > > > > > >>>>> > > > > > > > >>>>> > > > > > > > >>>>> > > > > > > >>>> > > > > > > >>>> > > > > > > >>> > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > Todd Lipcon > > > > > Software Engineer, Cloudera > > > > > > > > > > > > > > > > > > > > > -- > > > Todd Lipcon > > > Software Engineer, Cloudera > > > > > > > > > -- > Todd Lipcon > Software Engineer, Cloudera >
