[ https://issues.apache.org/jira/browse/HDFS-8555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15559026#comment-15559026 ]

Kai Zheng commented on HDFS-8555:
---------------------------------

What is the benefit of building and maintaining the indexes inside the NameNode,
in the HDFS layer, instead of on top of it? You mentioned SQL over HDFS, but
SQL on Hadoop is currently provided mainly by frameworks such as Hive, Spark
and Impala, using data formats that carry their own indexes, such as Parquet,
ORC and CarbonData, all of which are built on HDFS.
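To illustrate how indexing can live above HDFS without NameNode support, here is a minimal sketch. It assumes a file footer that records per-chunk min/max statistics for one key column, similar in spirit to the column statistics Parquet keeps per row group and ORC keeps per stripe. `ChunkStats` and `chunks_to_read` are hypothetical names, not part of the Parquet, ORC or HDFS APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkStats:
    """Hypothetical footer entry: min/max of one column for one chunk of a file."""
    offset: int   # byte offset of the chunk within the HDFS file
    length: int   # chunk length in bytes
    min_key: int  # smallest key value stored in the chunk
    max_key: int  # largest key value stored in the chunk


def chunks_to_read(stats, lo, hi):
    """Return the chunks whose [min_key, max_key] range overlaps the query [lo, hi].

    Only these byte ranges need to be fetched; the pruning decision is made
    entirely by the reader, from metadata stored inside the file itself.
    """
    return [s for s in stats if s.max_key >= lo and s.min_key <= hi]


footer = [
    ChunkStats(0, 128, 1, 100),
    ChunkStats(128, 128, 101, 200),
    ChunkStats(256, 128, 201, 300),
]
print([s.offset for s in chunks_to_read(footer, 150, 160)])  # [128]
```

Because the statistics travel with the file, any HDFS client can skip irrelevant chunks using ordinary positional reads, which is the argument for keeping such indexes in the data format layer.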

> Random read support on HDFS files using Indexed Namenode feature
> ----------------------------------------------------------------
>
>                 Key: HDFS-8555
>                 URL: https://issues.apache.org/jira/browse/HDFS-8555
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client, namenode
>    Affects Versions: 2.5.2
>         Environment: Linux
>            Reporter: amit sehgal
>            Assignee: Afzal Saan
>   Original Estimate: 720h
>  Remaining Estimate: 720h
>
> Currently the NameNode does not support random reads. Many tools built on top
> of HDFS address the Exploratory BI use case and provide SQL over HDFS, so the
> need of the hour is to reduce the number of blocks read for a random read.
> For example, extracting roughly 10 lines of information out of 100 GB files
> should read only those blocks that can potentially contain those 10 lines.
> This can be achieved by adding a per-block tagging feature to the NameNode:
> each block written to HDFS will have tags associated with it, stored in an
> index. When accessed via the indexing feature, the NameNode will use this
> native index to reduce the number of blocks returned to the client.
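The proposed mechanism can be pictured with a minimal sketch, assuming each block carries a set of opaque string tags supplied by the writer and that a tagged lookup returns every block carrying at least one requested tag. `BlockTagIndex`, `add_block`, `all_blocks` and `blocks_for_tags` are hypothetical names for illustration only; they are not part of the HDFS NameNode or client API, and the proposal does not specify tag semantics.

```python
from collections import defaultdict


class BlockTagIndex:
    """Hypothetical per-file index from tag to block IDs (not an HDFS API)."""

    def __init__(self):
        # path -> ordered list of every block ID (what a plain open would return)
        self._blocks = defaultdict(list)
        # path -> tag -> set of block IDs carrying that tag
        self._tags = defaultdict(lambda: defaultdict(set))

    def add_block(self, path, block_id, tags):
        """Register a newly written block together with its writer-supplied tags."""
        self._blocks[path].append(block_id)
        for tag in tags:
            self._tags[path][tag].add(block_id)

    def all_blocks(self, path):
        """Full block list, as returned without the indexing feature."""
        return list(self._blocks.get(path, []))

    def blocks_for_tags(self, path, tags):
        """Blocks carrying at least one requested tag, kept in file order."""
        file_tags = self._tags.get(path, {})
        matching = set()
        for tag in tags:
            matching |= file_tags.get(tag, set())
        return [b for b in self._blocks.get(path, []) if b in matching]


idx = BlockTagIndex()
idx.add_block("/data/events.log", 1001, {"region=eu"})
idx.add_block("/data/events.log", 1002, {"region=us"})
idx.add_block("/data/events.log", 1003, {"region=eu", "region=apac"})
print(idx.blocks_for_tags("/data/events.log", {"region=eu"}))  # [1001, 1003]
```

A real NameNode-side index would also have to be persisted and kept consistent with block deletion and file truncation, which this sketch ignores.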



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
