[ 
https://issues.apache.org/jira/browse/HBASE-16212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15383565#comment-15383565
 ] 

Zhihua Deng edited comment on HBASE-16212 at 7/19/16 6:24 AM:
--------------------------------------------------------------

Changing from a ThreadLocal to synchronization does introduce a potential 
synchronization bottleneck, but that is still cheaper than an extra I/O 
operation. So the question here is: how often will the connection be 
recreated for seeking + reading? 
The original ThreadLocal is declared as a non-static private field here, 
which means that the created FSReaderImpl instance will be reused later on; 
an input stream is also opened when the FSReaderImpl is created. 
In the case described in the attached log, synchronization does better than 
the ThreadLocal when the access pattern is a sequential read.
What about the concurrent case? The worst case is: 
Thread1.readBlockInternal -> Thread2.readBlockInternal -> 
Thread3.readBlockInternal -> Thread1.readBlockInternal -> ....
In this case, synchronization is equal to the ThreadLocal in terms of how 
many connections will be created.
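
The two cases above can be illustrated with a small model. This is only a 
sketch under several assumptions that are not stated in the comment: the 
ThreadLocal is taken to cache the header of the next block per thread, all 
threads share the single input stream of one FSReaderImpl, and every 
reposition of that stream is counted as a new datanode connection (the real 
DFSInputStream may serve some short forward seeks without reconnecting). The 
names HeaderCacheSketch, reconnects and BLOCK are hypothetical; HEADER = 33 
is taken from pos - targetPos in the attached log line.

```java
import java.util.HashMap;
import java.util.Map;

class HeaderCacheSketch {
    static final int HEADER = 33;   // pos - targetPos in the log: 111506876 - 111506843
    static final int BLOCK = 1000;  // hypothetical on-disk block size, header included

    /**
     * Counts how often the single shared stream must be repositioned for a
     * sequence of block reads. threads[i] reads the block at offsets[i].
     * sharedCache = true models the synchronized common field,
     * sharedCache = false models the per-thread (ThreadLocal-like) cache.
     */
    static int reconnects(int[] threads, long[] offsets, boolean sharedCache) {
        long pos = -1;                                   // -1: stream not opened yet
        Map<Integer, Long> cachedHeaderOffset = new HashMap<>();
        int count = 0;
        for (int i = 0; i < threads.length; i++) {
            int key = sharedCache ? 0 : threads[i];      // one slot, or one slot per thread
            Long cached = cachedHeaderOffset.get(key);
            boolean headerCached = cached != null && cached == offsets[i];
            // With a cached header the read can start right after the header.
            long target = headerCached ? offsets[i] + HEADER : offsets[i];
            if (pos != target) {
                count++;                                 // modelled as a new connection
            }
            // The read consumes the block plus the header of the next block.
            pos = offsets[i] + BLOCK + HEADER;
            cachedHeaderOffset.put(key, offsets[i] + BLOCK);
        }
        return count;
    }

    public static void main(String[] args) {
        int[] threads = {0, 1, 2, 0, 1, 2};
        // Sequential scan whose calls are served by rotating handler threads.
        long[] sequential = {0, 1000, 2000, 3000, 4000, 5000};
        // Worst case: three threads, each scanning its own region, interleaved.
        long[] interleaved = {0, 100000, 200000, 1000, 101000, 201000};

        System.out.println("sequential,  per-thread: " + reconnects(threads, sequential, false));
        System.out.println("sequential,  shared:     " + reconnects(threads, sequential, true));
        System.out.println("interleaved, per-thread: " + reconnects(threads, interleaved, false));
        System.out.println("interleaved, shared:     " + reconnects(threads, interleaved, true));
    }
}
```

Under these assumptions the sequential scan costs 6 repositions with the 
per-thread cache and 1 with the shared one, while the interleaved worst case 
costs 6 either way, which is the "equal" outcome argued above.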





was (Author: dengzh):
Changing from a ThreadLocal to a common field does introduce a potential 
synchronization bottleneck, but that is still cheaper than an extra I/O 
operation. So the question here is: how often will the connection be 
recreated for seeking + reading? 
The original ThreadLocal is declared as a non-static private field here, 
which means that the created FSReaderImpl instance will be reused later on; 
an input stream is also opened when the FSReaderImpl is created. 
In the case described in the attached log, synchronization does better than 
the ThreadLocal when the access pattern is a sequential read.
What about the concurrent case? The worst case is: 
Thread1.readBlockInternal -> Thread2.readBlockInternal -> 
Thread3.readBlockInternal -> Thread1.readBlockInternal -> ....
In this case, synchronization is equal to the ThreadLocal in terms of how 
many connections will be created.




> Many connections to datanode are created when doing a large scan 
> -----------------------------------------------------------------
>
>                 Key: HBASE-16212
>                 URL: https://issues.apache.org/jira/browse/HBASE-16212
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 1.1.2
>            Reporter: Zhihua Deng
>         Attachments: HBASE-16212.patch, HBASE-16212.v2.patch, 
> regionserver-dfsinputstream.log
>
>
> As described in https://issues.apache.org/jira/browse/HDFS-8659, the 
> datanode keeps logging the same message repeatedly. After adding a log 
> statement to DFSInputStream, it outputs the following:
> 2016-07-10 21:31:42,147 INFO  
> [B.defaultRpcServer.handler=22,queue=1,port=16020] hdfs.DFSClient: 
> DFSClient_NONMAPREDUCE_1984924661_1 seek 
> DatanodeInfoWithStorage[10.130.1.29:50010,DS-086bc494-d862-470c-86e8-9cb7929985c6,DISK]
>  for BP-360285305-10.130.1.11-1444619256876:blk_1109360829_35627143. pos: 
> 111506876, targetPos: 111506843
>  ...
> Because the pos of this input stream is larger than targetPos (the 
> position being sought), a new connection to the datanode is created and 
> the older one is closed as a consequence. When the number of such backward 
> seeks is large, the datanode's block scanner info message spams the logs, 
> and many connections to the same datanode are created.
> Hadoop version: 2.7.1



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
