[ 
https://issues.apache.org/jira/browse/HDFS-6607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abdullah Alamoudi updated HDFS-6607:
------------------------------------

    Priority: Major  (was: Minor)

> DFSInputStream Seek performance improvement
> -------------------------------------------
>
>                 Key: HDFS-6607
>                 URL: https://issues.apache.org/jira/browse/HDFS-6607
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client, performance
>    Affects Versions: 2.4.1
>            Reporter: Abdullah Alamoudi
>
> When a DFSInputStream is open and a seek targets a position in the same 
> block, the seek is cheap if the target position is already in the TCP 
> buffer: the stream simply reads and discards the intervening data. See line 
> 1368 in the file: 
> hadoop-common/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java.
> However, if the position is in the same block but beyond the TCP buffer, the 
> input stream closes the current block reader, locates the block again, 
> selects a datanode, and creates a new block reader. Many objects are created 
> along the way, which makes this path very inefficient for users with random 
> access needs (e.g., index access).
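
To make the two paths concrete, here is a minimal sketch of the seek decision described above. It is a simplified model, not the actual DFSInputStream code: the class SeekSketch, its fields, and the fixed "window" standing in for the TCP buffer are all hypothetical names chosen for illustration.

```java
// Hypothetical model of the seek decision; not the real DFSInputStream.
// All names (SeekSketch, window, reopens) are assumptions for illustration.
class SeekSketch {
    private long pos;             // current read position in the file
    private final long blockEnd;  // last byte offset of the current block
    private final long window;    // assumed size of the buffered region
    private long bufferedAhead;   // bytes assumed available without reconnecting
    private int reopens = 0;      // how many times the slow path was taken

    SeekSketch(long pos, long blockEnd, long window) {
        this.pos = pos;
        this.blockEnd = blockEnd;
        this.window = window;
        this.bufferedAhead = window;
    }

    void seek(long target) {
        long diff = target - pos;
        boolean forwardInBlock = diff >= 0 && target <= blockEnd;
        if (forwardInBlock && diff <= bufferedAhead) {
            // Fast path: consume the intervening bytes from the buffer.
            bufferedAhead -= diff;
        } else {
            // Slow path: stands in for closing the block reader, locating
            // the block, choosing a datanode, and creating a new reader.
            reopens++;
            bufferedAhead = window;
        }
        pos = target;
    }

    long position() { return pos; }
    int reopens() { return reopens; }
}
```

In this model, a forward seek within the window costs nothing extra, while a seek to a later offset in the same block still counts as a reopen. That second case is the one this issue proposes to improve.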



--
This message was sent by Atlassian JIRA
(v6.2#6252)
