[ https://issues.apache.org/jira/browse/HDFS-8555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15047286#comment-15047286 ]

Mingliang Liu commented on HDFS-8555:
-------------------------------------

Would you kindly explain what you mean by "... only those blocks which belong to
Nicholas out of a given large block."?
And in the _Description_ example, why are seek and positional read unable to
return "those blocks that potentially have those 10 lines"?
Thanks.

> Random read support on HDFS files using Indexed Namenode feature
> ----------------------------------------------------------------
>
>                 Key: HDFS-8555
>                 URL: https://issues.apache.org/jira/browse/HDFS-8555
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client, namenode
>    Affects Versions: 2.5.2
>         Environment: Linux
>            Reporter: amit sehgal
>            Assignee: amit sehgal
>             Fix For: 3.0.0
>
>   Original Estimate: 720h
>  Remaining Estimate: 720h
>
> Currently the Namenode does not support random reads. Many tools built on top
> of HDFS now address the Exploratory BI use case and provide SQL over HDFS, so
> the need of the hour is to reduce the number of blocks read for a random read.
> E.g. extracting, say, 10 lines' worth of information out of a 100GB file should
> read only those blocks which can potentially contain those 10 lines.
> This can be achieved by adding a per-block tagging feature to the Namenode:
> each block written to HDFS will have tags associated with it, stored in an
> index. When accessed via the indexing feature, the Namenode will use this
> native index to reduce the number of blocks returned to the client.
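The per-block tagging idea in the description can be illustrated with a minimal sketch of an inverted index from tag to block ids. This is not HDFS code and not the proposal's design: `BlockTagIndex`, `add_block`, `blocks_for` and the `user=...` tags are hypothetical names, and the 128 MB block size is an assumption. Tags are recorded as each block is written, and a lookup returns only the blocks carrying a tag instead of every block of the file.

```python
from collections import defaultdict
from typing import Iterable, List

# Assumption: 128 MiB blocks (illustrative, not taken from the issue).
BLOCK_SIZE = 128 * 1024 * 1024


class BlockTagIndex:
    """Hypothetical per-file index mapping each tag to the blocks that carry it."""

    def __init__(self) -> None:
        self.num_blocks = 0
        self._by_tag = defaultdict(set)  # tag -> set of block ids

    def add_block(self, block_id: int, tags: Iterable[str]) -> None:
        # Called once per block as it is written; records every tag for it.
        self.num_blocks += 1
        for tag in tags:
            self._by_tag[tag].add(block_id)

    def blocks_for(self, tag: str) -> List[int]:
        # Only blocks that *may* hold matching records, in file order.
        # .get() avoids creating empty entries for unknown tags.
        return sorted(self._by_tag.get(tag, set()))


def demo():
    file_size = 100 * 1024**3                 # the 100GB file from the example
    n_blocks = -(-file_size // BLOCK_SIZE)    # ceiling division -> 800 blocks
    index = BlockTagIndex()
    for b in range(n_blocks):
        tags = {"user=alice"}
        if b in (17, 512):                    # hypothetical blocks holding the wanted lines
            tags.add("user=nicholas")
        index.add_block(b, tags)
    return n_blocks, index.blocks_for("user=nicholas")


if __name__ == "__main__":
    total, hits = demo()
    print(f"{len(hits)} of {total} blocks need to be read: {hits}")
```

Under these assumptions the client would be handed 2 block locations out of 800 rather than all of them; the tags act only as a coarse filter, so the returned blocks still have to be scanned for the actual lines.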



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
