amit sehga created HDFS-8555:
--------------------------------
Summary: Random access support on HDFS files using Indexed
Namenode feature
Key: HDFS-8555
URL: https://issues.apache.org/jira/browse/HDFS-8555
Project: Hadoop HDFS
Issue Type: Improvement
Components: HDFS, hdfs-client, namenode
Affects Versions: 2.5.2
Environment: Linux
Reporter: amit sehga
Assignee: amit sehga
Fix For: 3.0.0
Currently the NameNode does not provide support for random reads. With so many
tools built on top of HDFS for exploratory BI and SQL over HDFS, there is a
pressing need to reduce the number of blocks read for a random read.
E.g. extracting, say, 10 lines of information from a 100 GB file should read
only those blocks that can potentially contain those 10 lines.
This can be achieved by adding a per-block tagging feature to the NameNode:
each block written to HDFS will have tags associated with it, stored in an
index. When accessed via the indexing feature, the NameNode will use this
native index to reduce the number of blocks returned to the client.
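
A minimal sketch of what such a per-block tag index might look like, assuming
an in-memory map from file path to tag to block IDs. The class name
BlockTagIndex and the methods tagBlock and lookup are hypothetical and only
illustrate the idea; they are not existing HDFS or NameNode APIs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical per-block tag index; not an existing HDFS class. */
final class BlockTagIndex {
    // file path -> tag -> ordered IDs of the blocks carrying that tag
    private final Map<String, Map<String, TreeSet<Long>>> index = new HashMap<>();

    /** Records the tags associated with a block when it is written. */
    void tagBlock(String path, long blockId, String... tags) {
        Map<String, TreeSet<Long>> byTag =
            index.computeIfAbsent(path, p -> new HashMap<>());
        for (String tag : tags) {
            byTag.computeIfAbsent(tag, t -> new TreeSet<>()).add(blockId);
        }
    }

    /** Returns only the blocks of the file that may contain the given tag. */
    List<Long> lookup(String path, String tag) {
        Map<String, TreeSet<Long>> byTag =
            index.getOrDefault(path, Collections.emptyMap());
        TreeSet<Long> ids = byTag.get(tag);
        // An unknown file or tag yields an empty list: no blocks need reading.
        return ids == null ? new ArrayList<>() : new ArrayList<>(ids);
    }
}
```

With this sketch, a client asking for one tag in a file of many blocks would
receive only the matching block IDs instead of the full block list. How tags
are derived from block contents and how the index is persisted are left open
here.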