[ https://issues.apache.org/jira/browse/HDFS-5386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13803459#comment-13803459 ]
Chris Nauroth commented on HDFS-5386:
-------------------------------------
Thanks again, Colin. Here is my feedback.
Architecture:
* Can we include a high-level diagram in the architecture section? Lifting the
diagram out of the design doc is probably sufficient.
* Mention that multiple cache directives may cover the same path, that the
highest replication setting is applied, and that uncaching does not occur
unless all directives covering the path have been removed.
* Mention that we do not cache through symlinks.
* Mention that corrupt replicas will not be cached.
* In addition to the periodic namespace scan, mention that relevant user
actions like adding/removing a directive or removing a pool automatically
trigger a scan.
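A short example in the doc might help readers with the overlapping-directive rule above. Here is a toy Python model of it, offered only as a sketch: this is not HDFS code, the class and method names are hypothetical, and it assumes directives match by exact path only.

```python
class DirectiveTable:
    """Toy model of overlapping cache directives (illustrative, not HDFS code)."""

    def __init__(self):
        self._next_id = 1
        self._by_id = {}  # directive id -> (path, replication)

    def add(self, path, replication):
        """Register a directive and return its id."""
        directive_id = self._next_id
        self._next_id += 1
        self._by_id[directive_id] = (path, replication)
        return directive_id

    def remove(self, directive_id):
        """Remove a single directive by id."""
        del self._by_id[directive_id]

    def effective_replication(self, path):
        """Highest replication among directives on this path; 0 means uncached."""
        return max(
            (repl for p, repl in self._by_id.values() if p == path),
            default=0,
        )
```

In this model, adding directives with replication 1 and 3 on the same path yields an effective replication of 3, and the path only drops to 0 (uncached) once both directives have been removed.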
Interface:
* Replace all "hdfs cacheAdmin" with "hdfs cacheadmin". The command is
case-sensitive.
* Mention that cache directives may be removed one at a time with "hdfs
cacheadmin -removeDirective", or all at once for a given path with "hdfs
cacheadmin -removeDirectives".
* Let's enumerate all shell commands related to the feature, similar to the
snapshot documentation:
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsSnapshots.html
* Can we mention programmatic access through the API? I think it's probably
sufficient to link to the JavaDocs for {{DistributedFileSystem}} and mention
the relevant methods. No need to repeat everything here.
* I haven't reviewed the metrics patch yet, but can we fold in a list now
describing each metric we expose?
Configuration:
* When discussing the requirement for native libraries, can you please link to
the page that discusses the native libraries?
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/NativeLibraries.html
* In addition to the required configuration properties, let's also discuss the
optional properties that are exposed for tuning. We can list and describe
every configuration property related to the feature:
** {{dfs.namenode.caching.enabled}}
** {{dfs.datanode.max.locked.memory}}
** {{dfs.namenode.path.based.cache.refresh.interval.ms}}
** {{dfs.datanode.fsdatasetcache.max.threads.per.volume}}
** {{dfs.cachereport.intervalMsec}}
* In the discussion of ulimit, there is a potential point of confusion:
{{dfs.datanode.max.locked.memory}} is specified in bytes, but the output of
{{ulimit -l}} is in KB. Can we warn users about the difference in units?
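To illustrate the units mismatch, the doc could show a small conversion check. The sketch below is only an illustration: the function names are hypothetical, and it assumes 1 KB means 1024 bytes and that an unrestricted limit is printed as "unlimited".

```python
def ulimit_kb_to_bytes(ulimit_output):
    """Convert the text printed by `ulimit -l` (KB) to bytes.

    Returns None when the limit is reported as "unlimited".
    Assumes 1 KB = 1024 bytes.
    """
    text = ulimit_output.strip()
    if text == "unlimited":
        return None
    return int(text) * 1024


def fits_within_ulimit(max_locked_memory_bytes, ulimit_output):
    """Check a dfs.datanode.max.locked.memory value (bytes) against `ulimit -l`."""
    limit_bytes = ulimit_kb_to_bytes(ulimit_output)
    return limit_bytes is None or max_locked_memory_bytes <= limit_bytes
```

For example, a `ulimit -l` value of 64 corresponds to 65536 bytes, so a {{dfs.datanode.max.locked.memory}} setting of 65536 fits under that limit while 65537 does not.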
Table of Contents:
I think the hierarchy is a little off. I don't think all of those topics were
meant to go under Use Cases. I suggest the following:
* Background
* Use Cases
* Architecture
* Configuration
** Native Libraries
** Configuration Properties
** OS Limits
* Interface
** hdfs cacheadmin shell
** DistributedFileSystem API
General:
* Text containing {{<tt>}} tags seemed to be garbled in the mvn site output.
> Add feature documentation for datanode caching.
> -----------------------------------------------
>
> Key: HDFS-5386
> URL: https://issues.apache.org/jira/browse/HDFS-5386
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: documentation
> Affects Versions: HDFS-4949
> Reporter: Chris Nauroth
> Assignee: Colin Patrick McCabe
> Attachments: HDFS-5386-caching.001.patch
>
>
> Write feature documentation for datanode caching, covering all of the
> following:
> * high-level architecture
> * OS/native code requirements
> * OS configuration (ulimit -l)
> * new configuration properties for namenode and datanode
> * cache admin CLI commands
> * pointers to API for programmatic control of caching directives