[
https://issues.apache.org/jira/browse/HBASE-16212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15383156#comment-15383156
]
stack commented on HBASE-16212:
-------------------------------
Tell us more, [~dengzh]? I think I get it: the thread local often held a
reference to a header from another file altogether, and that was causing all
the logging you were seeing?
Looking at the patch, you are making substantial changes: removing the thread
local that caches the last header read by a thread and instead doing the
caching on the fsreaderimpl. That is better in some ways, but now we have a
synchronization bottleneck for all threads to pass through. What are you
thinking here? Are you thinking it will be rare for more than one thread to be
going against the same file? Have you run with this patch?
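To make the trade-off concrete, here is a minimal sketch of keeping the cached header on the reader without making every thread take a lock: a single atomic slot per reader, keyed by block offset. The class and method names (PerReaderHeaderCache, PrefetchedHeader, cacheNextBlockHeader, getCachedHeader) are hypothetical and are not taken from the attached patch or from HBase; this only illustrates the idea under that assumption.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for the per-file reader's header cache; not the real HBase class. */
final class PerReaderHeaderCache {

    /** Immutable pair: offset of the block the header belongs to, plus the header bytes. */
    static final class PrefetchedHeader {
        final long offset;
        final byte[] header;

        PrefetchedHeader(long offset, byte[] header) {
            this.offset = offset;
            this.header = header.clone(); // defensive copy keeps the pair immutable
        }
    }

    // One slot per reader (that is, per file), so a header read from file A
    // can never be mistaken for a header of file B. Readers and writers swap
    // the whole reference atomically instead of entering a synchronized block.
    private final AtomicReference<PrefetchedHeader> lastHeader = new AtomicReference<>();

    /** Remember the header of the block that starts at the given offset. */
    void cacheNextBlockHeader(long offset, byte[] header) {
        lastHeader.set(new PrefetchedHeader(offset, header));
    }

    /** Return a copy of the cached header if it belongs to the block at offset, else null. */
    byte[] getCachedHeader(long offset) {
        PrefetchedHeader ph = lastHeader.get();
        return (ph != null && ph.offset == offset) ? ph.header.clone() : null;
    }
}
```

With a single slot, two threads scanning the same file at different offsets will keep overwriting each other's entry, so the hit rate drops under contention, but a lookup can only return a header for the offset that was asked for and no thread blocks another.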
Is this patch for branch-1.1? Does master still have the same issue? (It has
the same basic form, but a bunch of refactoring has gone on in here.)
This patch looks like a nice one. Thanks.
> Many connections to datanode are created when doing a large scan
> -----------------------------------------------------------------
>
> Key: HBASE-16212
> URL: https://issues.apache.org/jira/browse/HBASE-16212
> Project: HBase
> Issue Type: Improvement
> Affects Versions: 1.1.2
> Reporter: Zhihua Deng
> Attachments: HBASE-16212.patch, HBASE-16212.v2.patch,
> regionserver-dfsinputstream.log
>
>
> As described in https://issues.apache.org/jira/browse/HDFS-8659, the datanode
> suffers from logging the same message repeatedly. After adding a log
> statement to DFSInputStream, it outputs the following:
> 2016-07-10 21:31:42,147 INFO
> [B.defaultRpcServer.handler=22,queue=1,port=16020] hdfs.DFSClient:
> DFSClient_NONMAPREDUCE_1984924661_1 seek
> DatanodeInfoWithStorage[10.130.1.29:50010,DS-086bc494-d862-470c-86e8-9cb7929985c6,DISK]
> for BP-360285305-10.130.1.11-1444619256876:blk_1109360829_35627143. pos:
> 111506876, targetPos: 111506843
> ...
> Because the pos of this input stream is larger than targetPos (the position
> being sought), a new connection to the datanode is created and the older one
> is closed as a consequence. When there are many such wrong seeks, the
> datanode's block scanner info messages spam the logs, and many connections
> to the same datanode are created.
> hadoop version: 2.7.1
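The seek behaviour described in the issue can be sketched as a toy model. This is a simplified, hypothetical illustration and not the actual DFSInputStream code: SeekModel, maxSkip, and connectionsOpened are names invented for the example, and the real client's rule for when it can keep the existing connection is more involved.

```java
/** Simplified, hypothetical model of a stream positioned inside one remote block. */
final class SeekModel {
    private long pos;              // current position of the open connection
    private int connectionsOpened; // how many connections have been opened so far
    private final long maxSkip;    // largest forward gap we read through instead of reconnecting

    SeekModel(long startPos, long maxSkip) {
        this.pos = startPos;
        this.maxSkip = maxSkip;
        this.connectionsOpened = 1; // the initial connection
    }

    /** Move to targetPos, reconnecting whenever the existing connection cannot get there. */
    void seek(long targetPos) {
        long gap = targetPos - pos;
        boolean canSkipForward = gap >= 0 && gap <= maxSkip;
        if (!canSkipForward) {
            // A backward seek (pos > targetPos) or a long forward jump drops the
            // current connection and opens a new one at targetPos.
            connectionsOpened++;
        }
        pos = targetPos;
    }

    int connectionsOpened() {
        return connectionsOpened;
    }

    long position() {
        return pos;
    }
}
```

Feeding the model the values from the log line above (pos 111506876, targetPos 111506843) gives a backward seek of 33 bytes, which costs a reconnect even though the distance is tiny. A stale cached header that sends the reader slightly backward on every block would therefore open one connection per seek.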