[ 
https://issues.apache.org/jira/browse/HDFS-5386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13803459#comment-13803459
 ] 

Chris Nauroth commented on HDFS-5386:
-------------------------------------

Thanks again, Colin.  Here is my feedback.

Architecture:
* Can we include a high-level diagram in the architecture section?  Lifting the 
diagram out of the design doc is probably sufficient.
* Mention that multiple cache directives may cover the same path, that the 
highest replication setting is applied, and that uncaching does not occur 
unless all directives covering the path have been removed.
* Mention that we do not cache through symlinks.
* Mention that corrupt replicas will not be cached.
* In addition to the periodic namespace scan, mention that relevant user 
actions like adding/removing a directive or removing a pool automatically 
trigger a scan.
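
To make the overlapping-directive rule concrete, here is a minimal sketch in Python. It is only an illustration, not the NameNode implementation: the {{Directive}} class and {{effective_cache_replication}} function are hypothetical names, and a directive is assumed to cover a path only on an exact match (directory handling is left out).

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Directive:
    """Hypothetical stand-in for a cache directive; not an HDFS class."""
    directive_id: int
    path: str
    replication: int


def effective_cache_replication(path: str, directives: Iterable[Directive]) -> int:
    """Return the cache replication that applies to `path`.

    The highest replication among the directives covering the path wins.
    A result of 0 means no directive covers the path any more, which is
    the only case in which the path becomes eligible for uncaching.
    Assumption: "covers" is simplified to an exact path match.
    """
    covering = [d.replication for d in directives if d.path == path]
    return max(covering, default=0)
```

With directives of replication 1 and 3 on the same path, the sketch yields 3. Removing the replication-3 directive drops the result to 1 rather than uncaching the path, and only removing both yields 0.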

Interface:
* Replace all "hdfs cacheAdmin" with "hdfs cacheadmin".  The command is 
case-sensitive.
* Mention that cache directives may be removed one at a time with "hdfs 
cacheadmin -removeDirective", or all at once for a given path with "hdfs 
cacheadmin -removeDirectives".
* Let's enumerate all shell commands related to the feature, similar to the 
snapshot documentation: 
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsSnapshots.html
* Can we mention programmatic access through the API?  I think it's probably 
sufficient to link to the JavaDocs for {{DistributedFileSystem}} and mention 
the relevant methods.  No need to repeat everything here.
* I haven't reviewed the metrics patch yet, but can we fold in a list now 
describing each metric we expose?

Configuration:
* When discussing the requirement for native libraries, can you please link to 
the page that discusses the native libraries? 
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/NativeLibraries.html
* In addition to the required configuration properties, let's also discuss the 
optional properties that are exposed for tuning.  We can list and describe 
every configuration property related to the feature:
** {{dfs.namenode.caching.enabled}}
** {{dfs.datanode.max.locked.memory}}
** {{dfs.namenode.path.based.cache.refresh.interval.ms}}
** {{dfs.datanode.fsdatasetcache.max.threads.per.volume}}
** {{dfs.cachereport.intervalMsec}}
* In the discussion of ulimit, there is a potential point of confusion in that 
{{dfs.datanode.max.locked.memory}} is specified in bytes, but the output of 
{{ulimit -l}} is in KB.  Can we warn users about the difference in units?
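
A short worked example of the unit mismatch might help the docs here. The Python sketch below is only an illustration: the helper names are hypothetical, and it assumes that {{ulimit -l}} reports units of 1024 bytes and prints "unlimited" when no limit is set.

```python
from typing import Optional

BYTES_PER_KB = 1024  # assumption: `ulimit -l` reports units of 1024 bytes


def ulimit_kb_to_bytes(ulimit_output: str) -> Optional[int]:
    """Convert the text printed by `ulimit -l` to bytes.

    Returns None when the limit is reported as "unlimited".
    """
    text = ulimit_output.strip()
    if text == "unlimited":
        return None
    return int(text) * BYTES_PER_KB


def fits_under_ulimit(max_locked_memory_bytes: int, ulimit_output: str) -> bool:
    """Check a dfs.datanode.max.locked.memory value (bytes) against `ulimit -l` (KB)."""
    limit_bytes = ulimit_kb_to_bytes(ulimit_output)
    return limit_bytes is None or max_locked_memory_bytes <= limit_bytes
```

For example, a {{ulimit -l}} output of 64 corresponds to 65536 bytes, so a {{dfs.datanode.max.locked.memory}} value of 65536 fits under that limit but 65537 does not.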

Table of Contents:
I think the hierarchy is a little off.  I don't think all of those topics were 
meant to go under Use Cases.  I suggest the following:
* Background
* Use Cases
* Architecture
* Configuration
** Native Libraries
** Configuration Properties
** OS Limits
* Interface
** hdfs cacheadmin shell
** DistributedFileSystem API

General:
* Text containing {{<tt>}} tags seemed to be garbled in the mvn site output.


> Add feature documentation for datanode caching.
> -----------------------------------------------
>
>                 Key: HDFS-5386
>                 URL: https://issues.apache.org/jira/browse/HDFS-5386
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: documentation
>    Affects Versions: HDFS-4949
>            Reporter: Chris Nauroth
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-5386-caching.001.patch
>
>
> Write feature documentation for datanode caching, covering all of the 
> following:
> * high-level architecture
> * OS/native code requirements
> * OS configuration (ulimit -l)
> * new configuration properties for namenode and datanode
> * cache admin CLI commands
> * pointers to API for programmatic control of caching directives



--
This message was sent by Atlassian JIRA
(v6.1#6144)
