[ https://issues.apache.org/jira/browse/HDFS-5389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sanjay Radia updated HDFS-5389:
-------------------------------
Issue Type: Sub-task (was: Improvement)
Parent: HDFS-2362
> A Namenode that keeps only a part of the namespace in memory
> ------------------------------------------------------------
>
> Key: HDFS-5389
> URL: https://issues.apache.org/jira/browse/HDFS-5389
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: namenode
> Affects Versions: 0.23.1
> Reporter: Lin Xiao
> Priority: Minor
>
> *Background:*
> Currently, the NN keeps all its namespace in memory. This has had the benefit
> that the NN code is very simple and, more importantly, helps the NN scale to
> over 4.5K machines with 60K to 100K concurrent tasks. The HDFS namespace can
> currently be scaled using more RAM on the NN and/or using Federation, which
> scales both namespace and performance. The current federation implementation
> does not allow renames across volumes without data copying but there are
> proposals to remove that limitation.
> *Motivation:*
> Hadoop lets customers store huge amounts of data at very economical prices
> and hence allows customers to store their data for several years. While most
> customers perform analytics on recent data (last hour, day, week, months,
> quarter, year), the ability to have five year old data online for analytics
> is very attractive for many businesses. Although one can use larger RAM in a
> NN and/or use Federation, it is not really necessary to store the entire
> namespace in memory since only the recent data is typically heavily accessed.
> *Proposed Solution:*
> Store a portion of the NN's namespace in memory: the "working set" of the
> applications that are currently operating. LSM (log-structured merge) data
> structures are quite appropriate for maintaining the full namespace on disk
> while the working set stays in memory. One choice is
> Google's LevelDB open-source implementation.
> *Benefits:*
> * Store larger namespaces without resorting to Federated namespace volumes.
> * Complementary to NN Federated namespace volumes, indeed will allow a
> single NN to easily store multiple larger volumes.
> * Faster cold startup - the NN does not have to read its full namespace before
> responding to clients.
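The "working set" idea in the proposed solution could be sketched roughly as
follows. This is only an illustration under stated assumptions, not the design
attached to this issue: the class PartialNamespace and its methods are
hypothetical, inode metadata is reduced to a String, and a TreeMap stands in
for the on-disk LSM store (for example LevelDB) so that the sketch runs with
the JDK alone. Eviction uses a plain LRU policy, which is one possible choice.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: a bounded in-memory working set over a full namespace store. */
class PartialNamespace {
    // Stand-in for an LSM store such as LevelDB (assumption: a real store would be on disk).
    private final Map<String, String> backingStore = new TreeMap<>();
    // Bounded, LRU-ordered map holding only the recently used entries.
    private final Map<String, String> workingSet;
    private long misses = 0;

    PartialNamespace(final int capacity) {
        // accessOrder=true keeps entries ordered from least to most recently used.
        this.workingSet = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > capacity; // evict the least recently used entry
            }
        };
    }

    /** Write-through: the full namespace always lives in the backing store. */
    void put(String path, String inodeMeta) {
        backingStore.put(path, inodeMeta);
        workingSet.put(path, inodeMeta);
    }

    /** Serve from memory when possible; otherwise load from the backing store. */
    String get(String path) {
        String meta = workingSet.get(path);
        if (meta == null) {
            misses++;
            meta = backingStore.get(path);
            if (meta != null) {
                workingSet.put(path, meta); // bring the entry into the working set
            }
        }
        return meta;
    }

    /** True if the path is currently held in memory; does not change LRU order. */
    boolean isResident(String path) {
        return workingSet.containsKey(path);
    }

    long misses() {
        return misses;
    }
}
```

With a capacity of 2, writing three paths leaves only the two most recent in
memory, and a later read of the oldest path is served from the backing store
and pulled back into the working set. Nothing needs to be loaded into memory
at startup in this sketch, which is the property the cold-startup benefit
relies on.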