[ 
https://issues.apache.org/jira/browse/HDFS-5389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861940#comment-13861940
 ] 

Sanjay Radia commented on HDFS-5389:
------------------------------------

Lin will shortly post a link to her prototype code on GitHub. Her prototype is 
based on HDFS 0.23.

> A Namenode that keeps only a part of the namespace in memory
> ------------------------------------------------------------
>
>                 Key: HDFS-5389
>                 URL: https://issues.apache.org/jira/browse/HDFS-5389
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>    Affects Versions: 0.23.1
>            Reporter: Lin Xiao
>            Priority: Minor
>
> *Background:*
> Currently, the NN keeps its entire namespace in memory. This has the benefit 
> that the NN code is very simple and, more importantly, helps the NN scale to 
> over 4.5K machines with 60K to 100K concurrent tasks. The HDFS namespace can 
> currently be scaled by adding more RAM to the NN and/or by using Federation, 
> which scales both namespace and performance. The current Federation 
> implementation does not allow renames across volumes without copying data, 
> but there are proposals to remove that limitation.
> *Motivation:*
> Hadoop lets customers store huge amounts of data at very economical prices 
> and hence allows them to keep their data for several years. While most 
> customers perform analytics on recent data (the last hour, day, week, month, 
> quarter, or year), the ability to have five-year-old data online for 
> analytics is very attractive for many businesses. Although one can use more 
> RAM in the NN and/or use Federation, it is not really necessary to store the 
> entire namespace in memory, since only the recent data is typically heavily 
> accessed. 
> *Proposed Solution:*
> Store a portion of the NN's namespace in memory: the "working set" of the 
> applications that are currently operating. Log-structured merge (LSM) data 
> structures are quite appropriate for maintaining the full namespace in 
> memory. One choice is Google's open-source LevelDB implementation.
> *Benefits:*
>  * Store larger namespaces without resorting to Federated namespace volumes.
>  * Complementary to NN Federated namespace volumes; indeed, it will allow a 
> single NN to easily store multiple larger volumes.
>  * Faster cold startup: the NN does not have to read its full namespace 
> before responding to clients.
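
The "working set" idea in the proposed solution can be illustrated with a 
minimal sketch. It is not taken from the prototype mentioned above. The class 
name WorkingSetNamespace, its methods, and the use of plain strings as inode 
records are all hypothetical, and a java.util.TreeMap stands in for a 
persistent LSM store such as LevelDB so that the example runs on its own. The 
sketch assumes a simple LRU policy for deciding which entries stay in memory.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: a bounded in-memory working set in front of a full namespace store. */
public class WorkingSetNamespace {
    // Stand-in for a persistent LSM store (for example LevelDB). Assumption:
    // a sorted in-memory map is used here only to keep the sketch runnable.
    private final Map<String, String> fullNamespace = new TreeMap<>();

    // LRU cache of recently used paths; accessOrder=true moves an entry to
    // the end of the iteration order each time it is read or written.
    private final LinkedHashMap<String, String> workingSet;

    // Counts how often a lookup had to fall through to the backing store.
    private int storeReads = 0;

    public WorkingSetNamespace(final int capacity) {
        this.workingSet = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                // Evict the least recently used entry once the bound is exceeded.
                return size() > capacity;
            }
        };
    }

    /** Write-through: the store always holds the full namespace. */
    public void create(String path, String inode) {
        fullNamespace.put(path, inode);
        workingSet.put(path, inode);
    }

    /** Serve from memory when possible; otherwise read the store and cache the result. */
    public String lookup(String path) {
        String inode = workingSet.get(path);
        if (inode != null) {
            return inode;
        }
        storeReads++;
        inode = fullNamespace.get(path);
        if (inode != null) {
            workingSet.put(path, inode);
        }
        return inode;
    }

    /** True if the path is currently held in memory; does not change LRU order. */
    public boolean isResident(String path) {
        return workingSet.containsKey(path);
    }

    public int storeReads() {
        return storeReads;
    }
}
```

With a capacity of 2, creating /a, /b and /c leaves only /b and /c resident. 
A later lookup of /a is served from the backing store and brings /a back into 
memory at the expense of the least recently used entry. Because the store 
always holds the full namespace, a cold start in this sketch would not need to 
load every entry before answering the first lookup.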



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
