Abdullah Alamoudi created HDFS-6607:
---------------------------------------
Summary: DFSInputStream Seek performance improvement
Key: HDFS-6607
URL: https://issues.apache.org/jira/browse/HDFS-6607
Project: Hadoop HDFS
Issue Type: Improvement
Components: hdfs-client, performance
Affects Versions: 2.4.1
Reporter: Abdullah Alamoudi
Priority: Minor
When a DFSInputStream is open and the caller seeks to a position within the
current block, the seek is cheap if the target position is already in the TCP
buffer: the stream simply reads and discards the intervening data. See line 1368
in the file:
hadoop-common/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java.
However, if the target position is in the same block but beyond the data in the
TCP buffer, the input stream closes the current block reader, locates the block
again, selects a datanode, and creates a new block reader. This allocates many
objects and is very inefficient for users with random access needs (e.g., index
access).
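
To make the two paths concrete, here is a minimal toy model in Java. It is only
a sketch of the behaviour described above, not the actual DFSInputStream code:
the class SeekSketch, its fields, and the fixed "buffered bytes" window are all
hypothetical names and simplifications introduced for illustration. It assumes
that a forward seek within the same block and within the buffered window is
served by skipping, and that any other seek creates a new block reader.

```java
/**
 * Toy model of the seek behaviour described in this report.
 * Hypothetical illustration only; it does not use or mirror the real
 * DFSInputStream API.
 */
class SeekSketch {
    private final long blockSize;     // bytes per block (assumed fixed)
    private final long bufferedBytes; // bytes assumed to be in the TCP buffer ahead of pos
    private long pos = 0;             // current stream position
    private int readersCreated = 1;   // one block reader is open for the initial position
    private long bytesSkipped = 0;    // bytes read and discarded on the cheap path

    SeekSketch(long blockSize, long bufferedBytes) {
        this.blockSize = blockSize;
        this.bufferedBytes = bufferedBytes;
    }

    void seek(long target) {
        long diff = target - pos;
        boolean sameBlock = (target / blockSize) == (pos / blockSize);
        if (sameBlock && diff >= 0 && diff <= bufferedBytes) {
            // Cheap path: consume the intervening bytes, keep the current reader.
            bytesSkipped += diff;
        } else {
            // Expensive path: stands in for closing the reader, locating the
            // block, choosing a datanode, and creating a new block reader.
            readersCreated++;
        }
        pos = target;
    }

    int readersCreated() { return readersCreated; }
    long bytesSkipped() { return bytesSkipped; }
    long position() { return pos; }

    public static void main(String[] args) {
        // Sizes are arbitrary example values, not Hadoop defaults.
        SeekSketch s = new SeekSketch(128L * 1024 * 1024, 64L * 1024);
        s.seek(4096);                // same block, inside the buffered window
        s.seek(10L * 1024 * 1024);   // same block, beyond the buffered window
        System.out.println("readers created: " + s.readersCreated());
        System.out.println("bytes skipped:   " + s.bytesSkipped());
    }
}
```

In this model the second seek stays inside the same block yet still pays for a
new reader, which is the case this issue proposes to improve.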
--
This message was sent by Atlassian JIRA
(v6.2#6252)