[
https://issues.apache.org/jira/browse/HBASE-10?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12589725#action_12589725
]
Jim Kellerman commented on HBASE-10:
------------------------------------
It would be *nice* if, when bugs are entered, the environment field were filled
in with at least the Hadoop version number.
For this issue, the Hadoop version was 0.16.1 or earlier. This can be
determined from the message:
{code}
2008-01-02 20:39:20,552 INFO org.apache.hadoop.fs.DFSClient: Could not obtain
block blk_5124700261538503923 from any node: java.io.IOException: No live nodes
contain current block
{code}
Note how DFSClient misidentifies itself as 'org.apache.hadoop.fs.DFSClient'.
DFSClient has lived in org.apache.hadoop.dfs in every release since at least
Q4 2007. It misidentifies itself because of the way it obtains its logger in
0.16.1 and earlier:
{code}
public static final Log LOG =
    LogFactory.getLog("org.apache.hadoop.fs.DFSClient");
{code}
In 0.16.2, this was changed to:
{code}
public static final Log LOG = LogFactory.getLog(DFSClient.class);
{code}
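The difference matters because a class-derived logger name can never drift out of sync with the class's real package, while a hard-coded string can. A minimal sketch of the same idea, using java.util.logging as a self-contained stand-in for Commons Logging (the class name LoggerNaming is hypothetical):

{code}
import java.util.logging.Logger;

public class LoggerNaming {
    // Hard-coded name: the logger reports whatever string was typed,
    // even after the class moves to a different package.
    static final Logger HARDCODED =
        Logger.getLogger("org.apache.hadoop.fs.DFSClient");

    // Class-derived name: always matches the class's actual location,
    // which is what the 0.16.2 fix accomplishes.
    static final Logger DERIVED =
        Logger.getLogger(LoggerNaming.class.getName());

    public static void main(String[] args) {
        System.out.println(HARDCODED.getName());
        System.out.println(DERIVED.getName());
    }
}
{code}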
The message itself originates from DFSClient$DFSInputStream.chooseDataNode,
which makes up to 3 attempts to obtain the block. It is called from both
DFSClient$DFSInputStream.blockSeekTo and
DFSClient$DFSInputStream.fetchBlockByteRange.
blockSeekTo does not retry on its own; however, it is called by
DFSClient$DFSInputStream.read, which makes up to 3 attempts at calling
blockSeekTo, so a single call to read will generate 9 messages if the block
cannot be found.
fetchBlockByteRange retries chooseDataNode once for each known block
location. With a replication factor of 3, it would normally call
chooseDataNode 3 times. But if the namenode happened to be re-replicating the
block, the number of known block locations could exceed three. So at a
minimum, chooseDataNode will generate 9 messages when called from
fetchBlockByteRange, plus another 3 messages for each additional entry in the
block location list.
If multiple threads were doing reads, the number of messages generated could be
seemingly endless.
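The message multiplication described above can be sketched as simple arithmetic. The retry counts below are assumptions taken from this comment, not read out of the Hadoop source, and the class and method names are hypothetical:

{code}
public class RetryMessageCount {
    // Assumed from the analysis above: chooseDataNode tries 3 times,
    // and read() makes up to 3 attempts at blockSeekTo.
    static final int CHOOSE_DATANODE_ATTEMPTS = 3;
    static final int READ_RETRIES = 3;

    // Messages via the blockSeekTo path: read() retries x chooseDataNode
    // attempts = 3 x 3 = 9 per failed read.
    static int viaBlockSeekTo() {
        return READ_RETRIES * CHOOSE_DATANODE_ATTEMPTS;
    }

    // Messages via fetchBlockByteRange: one chooseDataNode call (3 attempts)
    // per known block location.
    static int viaFetchBlockByteRange(int blockLocations) {
        return blockLocations * CHOOSE_DATANODE_ATTEMPTS;
    }

    public static void main(String[] args) {
        System.out.println(viaBlockSeekTo());            // 9
        System.out.println(viaFetchBlockByteRange(3));   // 9 at replication factor 3
        System.out.println(viaFetchBlockByteRange(5));   // 15 if the namenode is re-replicating
    }
}
{code}

With several reader threads each looping like this, the log volume compounds accordingly.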
> HRegionServer hangs upon exit due to DFSClient Exception
> --------------------------------------------------------
>
> Key: HBASE-10
> URL: https://issues.apache.org/jira/browse/HBASE-10
> Project: Hadoop HBase
> Issue Type: Bug
> Components: regionserver
> Environment: CentOS 5
> Reporter: Chris Kline
> Assignee: Jim Kellerman
> Priority: Minor
> Fix For: 0.2.0
>
>
> Several HRegionServers hang around indefinitely well after the HMaster has
> exited. This was triggered executing $HBASE_HOME/bin/stop-hbase.sh. The
> HMaster exits fine, but here is what happens on one of the HRegionServers:
> 2008-01-02 18:54:01,907 INFO org.apache.hadoop.hbase.HRegionServer: Got
> regionserver stop message
> 2008-01-02 18:54:01,907 INFO org.apache.hadoop.hbase.Leases:
> regionserver/0.0.0.0:60020 closing leases
> 2008-01-02 18:54:01,907 INFO org.apache.hadoop.hbase.Leases$LeaseMonitor:
> regionserver/0.0.0.0:60020.leaseChecker exiting
> 2008-01-02 18:54:01,908 INFO org.apache.hadoop.hbase.Leases:
> regionserver/0.0.0.0:60020 closed leases
> 2008-01-02 18:54:01,908 INFO org.apache.hadoop.ipc.Server: Stopping server on
> 60020
> 2008-01-02 18:54:01,908 INFO org.apache.hadoop.ipc.Server: IPC Server handler
> 2 on 60020: exiting
> 2008-01-02 18:54:01,908 INFO org.apache.hadoop.ipc.Server: IPC Server handler
> 0 on 60020: exiting
> 2008-01-02 18:54:01,908 INFO org.apache.hadoop.ipc.Server: IPC Server handler
> 7 on 60020: exiting
> 2008-01-02 18:54:01,908 INFO org.apache.hadoop.ipc.Server: IPC Server handler
> 3 on 60020: exiting
> 2008-01-02 18:54:01,908 INFO org.apache.hadoop.ipc.Server: IPC Server handler
> 5 on 60020: exiting
> 2008-01-02 18:54:01,908 INFO org.apache.hadoop.ipc.Server: IPC Server handler
> 9 on 60020: exiting
> 2008-01-02 18:54:01,908 INFO org.apache.hadoop.ipc.Server: IPC Server handler
> 6 on 60020: exiting
> 2008-01-02 18:54:01,908 INFO org.apache.hadoop.ipc.Server: IPC Server handler
> 4 on 60020: exiting
> 2008-01-02 18:54:01,908 INFO org.apache.hadoop.ipc.Server: IPC Server handler
> 1 on 60020: exiting
> 2008-01-02 18:54:01,909 INFO org.apache.hadoop.ipc.Server: Stopping IPC
> Server listener on 60020
> 2008-01-02 18:54:01,908 INFO org.apache.hadoop.ipc.Server: IPC Server handler
> 8 on 60020: exiting
> 2008-01-02 18:54:01,909 INFO org.apache.hadoop.hbase.HRegionServer: Stopping
> infoServer
> 2008-01-02 18:54:01,909 DEBUG org.mortbay.util.Container: Stopping [EMAIL
> PROTECTED]
> 2008-01-02 18:54:01,909 DEBUG org.mortbay.util.ThreadedServer: closing
> ServerSocket[addr=0.0.0.0/0.0.0.0,port=0,localport=60030]
> 2008-01-02 18:54:01,909 DEBUG org.mortbay.util.ThreadedServer: IGNORED
> java.net.SocketException: Socket closed
> at java.net.PlainSocketImpl.socketAccept(Native Method)
> at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:384)
> at java.net.ServerSocket.implAccept(ServerSocket.java:453)
> at java.net.ServerSocket.accept(ServerSocket.java:421)
> at
> org.mortbay.util.ThreadedServer.acceptSocket(ThreadedServer.java:432)
> at
> org.mortbay.util.ThreadedServer$Acceptor.run(ThreadedServer.java:631)
> 2008-01-02 18:54:01,910 INFO org.mortbay.util.ThreadedServer: Stopping
> Acceptor ServerSocket[addr=0.0.0.0/0.0.0.0,port=0,localport=60030]
> 2008-01-02 18:54:01,910 DEBUG org.mortbay.util.ThreadedServer: Self connect
> to close listener /127.0.0.1:60030
> 2008-01-02 18:54:01,911 DEBUG org.mortbay.util.ThreadedServer: problem
> stopping acceptor /127.0.0.1:
> 2008-01-02 18:54:01,911 DEBUG org.mortbay.util.ThreadedServer: problem
> stopping acceptor /127.0.0.1:
> java.net.ConnectException: Connection refused
> at java.net.PlainSocketImpl.socketConnect(Native Method)
> at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
> at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
> at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
> at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
> at java.net.Socket.connect(Socket.java:519)
> at java.net.Socket.connect(Socket.java:469)
> at java.net.Socket.<init>(Socket.java:366)
> at java.net.Socket.<init>(Socket.java:209)
> at
> org.mortbay.util.ThreadedServer$Acceptor.forceStop(ThreadedServer.java:682)
> at org.mortbay.util.ThreadedServer.stop(ThreadedServer.java:557)
> at org.mortbay.http.SocketListener.stop(SocketListener.java:211)
> at org.mortbay.http.HttpServer.doStop(HttpServer.java:781)
> at org.mortbay.util.Container.stop(Container.java:154)
> at org.apache.hadoop.hbase.util.InfoServer.stop(InfoServer.java:237)
> at org.apache.hadoop.hbase.HRegionServer.run(HRegionServer.java:835)
> at java.lang.Thread.run(Thread.java:619)
> 2008-01-02 18:54:01,911 INFO org.mortbay.http.SocketListener: Stopped
> SocketListener on 0.0.0.0:60030
> 2008-01-02 18:54:01,912 DEBUG org.mortbay.util.Container: Stopping
> HttpContext[/static,/static]
> 2008-01-02 18:54:01,912 DEBUG org.mortbay.http.handler.AbstractHttpHandler:
> Stopped org.mortbay.http.handler.ResourceHandler in
> HttpContext[/static,/static]
> 2008-01-02 18:54:02,039 INFO org.mortbay.util.Container: Stopped
> HttpContext[/static,/static]
> 2008-01-02 18:54:02,039 DEBUG org.mortbay.util.Container: Stopping
> HttpContext[/logs,/logs]
> 2008-01-02 18:54:02,039 DEBUG org.mortbay.http.handler.AbstractHttpHandler:
> Stopped org.mortbay.http.handler.ResourceHandler in HttpContext[/logs,/logs]
> 2008-01-02 18:54:02,154 INFO org.mortbay.util.Container: Stopped
> HttpContext[/logs,/logs]
> 2008-01-02 18:54:02,154 DEBUG org.mortbay.util.Container: Stopping
> WebApplicationContext[/,/]
> 2008-01-02 18:54:02,154 DEBUG org.mortbay.util.Container: Stopping [EMAIL
> PROTECTED]
> 2008-01-02 18:54:02,155 INFO org.mortbay.util.Container: Stopped [EMAIL
> PROTECTED]
> 2008-01-02 18:54:02,277 DEBUG
> org.mortbay.jetty.servlet.AbstractSessionManager: Session scavenger exited
> 2008-01-02 18:54:02,278 DEBUG org.mortbay.util.Container: remove component:
> [EMAIL PROTECTED]
> 2008-01-02 18:54:02,278 INFO org.mortbay.util.Container: Stopped
> WebApplicationContext[/,/]
> 2008-01-02 18:54:02,278 INFO org.mortbay.util.Container: Stopped [EMAIL
> PROTECTED]
> 2008-01-02 18:54:02,278 DEBUG org.apache.hadoop.hbase.HRegionServer: closing
> region spider_pages,10_131455761,1198140179439
> 2008-01-02 18:54:02,278 INFO org.apache.hadoop.hbase.HRegionServer:
> regionserver/0.0.0.0:60020.cacheFlusher exiting
> 2008-01-02 18:54:02,278 INFO org.apache.hadoop.hbase.HRegionServer:
> regionserver/0.0.0.0:60020.compactor exiting
> 2008-01-02 18:54:02,278 INFO org.apache.hadoop.hbase.HRegionServer:
> regionserver/0.0.0.0:60020.splitter exiting
> 2008-01-02 18:54:02,279 DEBUG org.apache.hadoop.hbase.HStore: closed
> spider_pages,10_131455761,1198140179439/search (1501227429/search)
> 2008-01-02 18:54:02,279 DEBUG org.apache.hadoop.hbase.HStore: closed
> spider_pages,10_131455761,1198140179439/profile (1501227429/profile)
> 2008-01-02 18:54:02,279 DEBUG org.apache.hadoop.hbase.HStore: closed
> spider_pages,10_131455761,1198140179439/meta (1501227429/meta)
> 2008-01-02 18:54:02,279 INFO org.apache.hadoop.hbase.HRegion: closed
> spider_pages,10_131455761,1198140179439
> 2008-01-02 18:54:02,279 DEBUG org.apache.hadoop.hbase.HRegionServer: closing
> region spider_pages,10_486594261,1198319654267
> 2008-01-02 18:54:02,280 DEBUG org.apache.hadoop.hbase.HStore: closed
> spider_pages,10_486594261,1198319654267/search (364081590/search)
> 2008-01-02 18:54:02,280 DEBUG org.apache.hadoop.hbase.HStore: closed
> spider_pages,10_486594261,1198319654267/profile (364081590/profile)
> 2008-01-02 18:54:02,280 DEBUG org.apache.hadoop.hbase.HStore: closed
> spider_pages,10_486594261,1198319654267/meta (364081590/meta)
> 2008-01-02 18:54:02,280 INFO org.apache.hadoop.hbase.HRegion: closed
> spider_pages,10_486594261,1198319654267
> ...
> ... this closing of regions goes on for a while
> ...
> ... the following continues until a kill -9
> ...
> 2008-01-02 20:39:20,552 INFO org.apache.hadoop.fs.DFSClient: Could not obtain
> block blk_5124700261538503923 from any node: java.io.IOException: No live
> nodes contain current block
> 2008-01-02 20:40:23,556 INFO org.apache.hadoop.fs.DFSClient: Could not obtain
> block blk_5124700261538503923 from any node: java.io.IOException: No live
> nodes contain current block
> 2008-01-02 20:41:26,560 INFO org.apache.hadoop.fs.DFSClient: Could not obtain
> block blk_5124700261538503923 from any node: java.io.IOException: No live
> nodes contain current block
> 2008-01-02 20:42:29,566 INFO org.apache.hadoop.fs.DFSClient: Could not obtain
> block blk_5124700261538503923 from any node: java.io.IOException: No live
> nodes contain current block
> 2008-01-02 20:43:32,571 INFO org.apache.hadoop.fs.DFSClient: Could not obtain
> block blk_5124700261538503923 from any node: java.io.IOException: No live
> nodes contain current block
> ...
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.