I see this in the log, following the previous snippet:
2010-09-24 11:21:43,799 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block null bad datanode[0] nodes == null
2010-09-24 11:21:43,799 WARN org.apache.hadoop.hdfs.DFSClient: Could not get block locations. Source file "/hbase/.logs/sjc9-flash-grid02.carrieriq.com,60020,1285347585107/hlog.dat.1285351187512" - Aborting...
2010-09-24 11:21:45,417 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: Unable to close log in abort
org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /hbase/.logs/sjc9-flash-grid02.carrieriq.com,60020,1285347585107/hlog.dat.1285351187512 File does not exist. Holder DFSClient_302121899 does not have any open files.
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1378)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1369)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1290)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:469)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:451)
    at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:512)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:968)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:964)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:962)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:94)
    at org.apache.hadoop.hbase.RemoteExceptionHandler.checkThrowable(RemoteExceptionHandler.java:48)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:613)
    at java.lang.Thread.run(Thread.java:619)
Would a failure from hlog.close() lead to data loss?
Thanks
On Fri, Sep 24, 2010 at 8:48 PM, Stack <[email protected]> wrote:
> When aborting, it skips flushing the memstore.
>
> We're aborting because we lost our session with ZooKeeper. When
> aborting we try to exit swiftly, without doing anything that could
> interfere with the recovery process that has probably already started.
> Since the recovery process will recover our memstore edits from the WAL,
> there isn't much point in completing a flush.
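>
> To make that concrete, here is a minimal sketch (my illustration, not
> the actual HRegionServer code; closeUserRegions is a made-up helper,
> though HRegion.close(boolean abort) is the real method it calls):
>
>   // Clean close: flush the memstore so nothing is left to replay.
>   // Abort: skip the flush and let WAL splitting replay the edits.
>   void closeUserRegions(Iterable<HRegion> onlineRegions,
>       boolean abortRequested) throws IOException {
>     for (HRegion region : onlineRegions) {
>       region.close(abortRequested); // true == skip internalFlushcache()
>     }
>   }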
>
> St.Ack
>
> On Fri, Sep 24, 2010 at 5:13 PM, Ted Yu <[email protected]> wrote:
> > I think that explains why the memstore wasn't flushed.
> > See this snippet from the region server log:
> >
> > 2010-09-24 11:21:18,603 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=6.5512085MB (6869440), Free=791.7863MB (830248128), Max=798.3375MB (837117568), Counts: Blocks=1, Access=384, Hit=383, Miss=1, Evictions=0, Evicted=0, Ratios: Hit Ratio=99.73958134651184%, Miss Ratio=0.2604166744276881%, Evicted/Run=NaN
> > 2010-09-24 11:21:20,459 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x12b44afa1f50001 to sun.nio.ch.selectionkeyi...@3d3cdaa
> > java.io.IOException: TIMED OUT
> >     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:906)
> > 2010-09-24 11:21:21,512 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event, state: Disconnected, type: None, path: null
> > 2010-09-24 11:21:22,849 INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server sjc9-flash-grid01.carrieriq.com/10.32.56.155:2181
> > 2010-09-24 11:21:22,849 INFO org.apache.zookeeper.ClientCnxn: Priming connection to java.nio.channels.SocketChannel[connected local=/10.32.56.156:45455 remote=sjc9-flash-grid01.carrieriq.com/10.32.56.155:2181]
> > 2010-09-24 11:21:22,873 INFO org.apache.zookeeper.ClientCnxn: Server connection successful
> > 2010-09-24 11:21:23,065 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x12b44afa1f50001 to sun.nio.ch.selectionkeyi...@383de71c
> > java.io.IOException: Session Expired
> >     at org.apache.zookeeper.ClientCnxn$SendThread.readConnectResult(ClientCnxn.java:589)
> >     at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:709)
> >     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:945)
> > 2010-09-24 11:21:23,065 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event, state: Expired, type: None, path: null
> > 2010-09-24 11:21:23,065 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: ZooKeeper session expired
> > 2010-09-24 11:21:23,101 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: request=0.0, regions=45, stores=119, storefiles=5, storefileIndexSize=0, memstoreSize=7, compactionQueueSize=0, usedHeap=61, maxHeap=3991, blockCacheSize=6869440, blockCacheFree=830248128, blockCacheCount=1, blockCacheHitRatio=99, fsReadLatency=0, fsWriteLatency=0, fsSyncLatency=0
> > 2010-09-24 11:21:24,457 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 60020
> > Obviously this method in HRegionServer triggered abort():
> > public void process(WatchedEvent event) {
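> >   // (sketch of the likely body, inferred from the log above rather
> >   // than quoted from the actual 0.20 source)
> >   if (event.getState() == KeeperState.Expired) {
> >     LOG.error("ZooKeeper session expired");
> >     abort();  // set abortRequested so the close paths skip the flush
> >   }
> > }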
> >
> > In this situation, why should internalFlushcache() be governed by abort?
> >
> > Thanks
> >
> > On Fri, Sep 24, 2010 at 4:54 PM, Jean-Daniel Cryans <[email protected]> wrote:
> >
> >> Like that:
> >>
> >> http://hbase.apache.org/docs/r0.89.20100726/xref/org/apache/hadoop/hbase/regionserver/HRegion.html#519
> >>
> >> (this is a link to the 0.89 code, but 0.20 works exactly the same way)
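> >>
> >> In outline, that close path does something like the following (my
> >> paraphrase, not a verbatim copy; closeAllStores is a stand-in name):
> >>
> >>   List<StoreFile> close(boolean abort) throws IOException {
> >>     if (!abort) {
> >>       internalFlushcache();   // persist memstore edits to store files
> >>     }
> >>     return closeAllStores(); // on abort, WAL replay recovers the edits
> >>   }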
> >>
> >> J-D
> >>
> >> On Fri, Sep 24, 2010 at 4:49 PM, Ted Yu <[email protected]> wrote:
> >> > When the HBase cluster shuts down, how is memstore flushing triggered?
> >> >
> >> > Thanks
> >> >
> >> > On Fri, Sep 24, 2010 at 4:01 PM, Jean-Daniel Cryans <[email protected]> wrote:
> >> >
> >> >> > StripedHBaseTable is just a wrapper around HTable.
> >> >>
> >> >> Ok.
> >> >>
> >> >> > QA installed another application which queries HBase, but QA didn't
> >> >> > see meaningful data from the application.
> >> >> > Why didn't HBase serve the data from the memstore?
> >> >>
> >> >> I cannot answer that; as far as I know, 0.20.6 doesn't have a
> >> >> bug that prevents you from writing data (and many people use it in
> >> >> production environments). My guess is the data never got into HBase,
> >> >> or it was written with wrong timestamps and is hidden by delete
> >> >> markers or something "time-travelly" like that. If they do play with
> >> >> timestamps, ask them not to, or verify they do the right thing.
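> >> >>
> >> >> For example, here's a sketch of that "time-travelly" failure mode
> >> >> using the 0.20-era client API (table, row, family, qualifier and
> >> >> value are assumed to be defined):
> >> >>
> >> >>   Delete d = new Delete(row);             // marker stamped "now"
> >> >>   table.delete(d);
> >> >>   Put p = new Put(row);
> >> >>   p.add(family, qualifier, 1234L, value); // explicit old timestamp
> >> >>   table.put(p);
> >> >>   // A Get returns nothing: the cell's timestamp (1234) is older
> >> >>   // than the delete marker, so it stays hidden until a major
> >> >>   // compaction drops the marker.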
> >> >>
> >> >> > QA used stop-hbase.sh to stop HBase. Would memstore contents be
> >> >> > flushed to a store file at that moment?
> >> >>
> >> >> Yep. If you don't see any store files, then the data never made it
> >> >> into HBase.
> >> >>
> >> >> >
> >> >> > QA servers were running JVM version 1.6.0_18
> >> >>
> >> >> From
> >> >> http://hbase.apache.org/docs/r0.20.6/api/overview-summary.html#requirements
> >> >>
> >> >> "Java 1.6.x, preferably from Sun. Use the latest version available
> >> >> except u18 (u19 is fine)."
> >> >>
> >> >> J-D
> >> >>
> >> >
> >>
> >
>