[
https://issues.apache.org/jira/browse/HDFS-8555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
amit sehga updated HDFS-8555:
-----------------------------
Summary: Random read support on HDFS files using Indexed Namenode feature
(was: Random access support on HDFS files using Indexed Namenode feature)
> Random read support on HDFS files using Indexed Namenode feature
> ----------------------------------------------------------------
>
> Key: HDFS-8555
> URL: https://issues.apache.org/jira/browse/HDFS-8555
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: HDFS, hdfs-client, namenode
> Affects Versions: 2.5.2
> Environment: Linux
> Reporter: amit sehga
> Assignee: amit sehga
> Fix For: 3.0.0
>
> Original Estimate: 720h
> Remaining Estimate: 720h
>
> Currently the Namenode does not provide support for random reads. Many
> tools built on top of HDFS address the Exploratory BI use case and
> provide SQL over HDFS, so there is a pressing need to reduce the number of
> blocks read for a random read.
> E.g. extracting, say, 10 lines' worth of information out of a 100GB file
> should read only those blocks that can potentially contain those 10 lines.
> This can be achieved by adding a per-block tagging feature in the Namenode:
> each block written to HDFS will have tags associated with it, stored in an
> index. When accessed via the indexing feature, the Namenode will use this
> native index to reduce the number of blocks returned to the client.
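
The proposal above can be illustrated with a small toy model. This is only a sketch of the idea, not HDFS code and not the implementation proposed in the issue: the class `BlockTagIndex`, its methods, the file path, and the `region=...` tags are all hypothetical, and the 128 MB block size is an assumption (it is the Hadoop 2.x default). The sketch shows an index that maps tags to blocks so that a tagged lookup returns only the candidate blocks instead of every block of the file.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class BlockTagIndex:
    """Toy stand-in for a per-block tag index (hypothetical, not an HDFS API)."""
    # file path -> block ids in write order
    blocks: dict = field(default_factory=lambda: defaultdict(list))
    # (file path, tag) -> set of block ids carrying that tag
    tagged: dict = field(default_factory=lambda: defaultdict(set))

    def add_block(self, path, block_id, tags):
        """Record a newly written block together with its tags."""
        self.blocks[path].append(block_id)
        for tag in tags:
            self.tagged[(path, tag)].add(block_id)

    def all_blocks(self, path):
        """Untagged lookup: every block of the file."""
        return list(self.blocks[path])

    def blocks_for_tag(self, path, tag):
        """Tagged lookup: only blocks that may hold matching data, in file order."""
        wanted = self.tagged.get((path, tag), set())
        return [b for b in self.blocks[path] if b in wanted]


# Assumed sizes: a 100 GB file split into 128 MB blocks.
BLOCK_SIZE_MB = 128
FILE_SIZE_MB = 100 * 1024
num_blocks = FILE_SIZE_MB // BLOCK_SIZE_MB

index = BlockTagIndex()
path = "/data/events.log"  # hypothetical file
for i in range(num_blocks):
    # Pretend only blocks 3 and 417 contain the rows of interest.
    tags = {"region=eu"} if i in (3, 417) else {"region=us"}
    index.add_block(path, i, tags)

candidates = index.blocks_for_tag(path, "region=eu")
print(len(index.all_blocks(path)), candidates)  # prints: 800 [3, 417]
```

Under these assumptions a client would read 2 blocks rather than 800. How tags are derived at write time and how the index is kept in Namenode memory are left open here, as they are in the description.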