[ https://issues.apache.org/jira/browse/HDFS-7164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14583996#comment-14583996 ]

Xiaoyu Yao commented on HDFS-7164:
----------------------------------

Thanks [~arpitagarwal] for adding documentation for this. It will be very 
helpful for people trying to understand and use this feature. 
The patch looks good to me overall. In addition to [~brahmareddy]'s comments, I 
have just a few minor issues.

*CentralizedCacheManagement.md*
{code}
"The DataNodes" -> The DataNode
{code}

*MemoryStorage.md*
{code}
1. "Rare data loss is possible in the event of a node restart or a network 
partition."
-> Rare data loss is possible in the event of a data node restart *before 
replicas are persisted to disk*. 
"Network partition" is not very clear here; maybe we can remove it.

2. "Applications that use Lazy Persist writes"
-> "Lazy Persist Writes" should be used consistently.

3. "Applications that use Lazy Persist writes will work when there is 
insufficient memory or when the feature is not configured by the administrator. 
HDFS will transparently use hard disk storage for writes when memory is 
unavailable or not configured."
-> Can you elaborate on "when the feature is not configured by the 
administrator"? Or we could say that this feature will continue to work by 
automatically falling back to DISK storage for writes if memory is 
insufficient, unavailable, or not configured. 

4. ",the "locked-in-memory size" ulimit (`ulimit -l`) of the Data Node user 
also needs to be increased to match this parameter (see the related section on 
[OS Limits]"
-> "Data Node user": do you mean the hdfs super user or the Data Node?

5. "Using more than one `tmpfs` partition per Data Node for Lazy Persist writes 
is not recommended."
-> Lazy Persist Writes

6. "This step is crucial. You will lose data if a `tmpfs` mount is not 
correctly tagged as RAM_DISK."
-> Can you elaborate on this? For example, "Without the RAM_DISK tag, the 
volatile storage will be treated as DISK without lazy persist. Data will be 
lost upon Data Node restart."
{code}
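
For reference on item 6, the tagging in question could look roughly like the sketch below. The mount point /mnt/dn-tmpfs and the /grid/N directories are placeholders assumed for illustration, not values taken from the patch.

{code:xml}
<!-- hdfs-site.xml: illustrative sketch only; all paths are hypothetical -->
<property>
  <name>dfs.datanode.data.dir</name>
  <!-- The [RAM_DISK] prefix marks the tmpfs mount as memory storage.
       Directories without a storage-type tag are treated as DISK. -->
  <value>/grid/0,/grid/1,/grid/2,[RAM_DISK]/mnt/dn-tmpfs</value>
</property>
{code}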

I tried to apply the patch and check the newly rendered site, but git apply 
--binary always failed. Am I missing anything?

{code}
$ git apply --binary HDFS-7164.04.patch
error: cannot apply binary patch to 'hadoop-hdfs-project/hadoop-hdfs/src/site/resources/images/LazyPersistWrites.png' without full index line
error: hadoop-hdfs-project/hadoop-hdfs/src/site/resources/images/LazyPersistWrites.png: patch does not apply
{code}


> Feature documentation for HDFS-6581
> -----------------------------------
>
>                 Key: HDFS-7164
>                 URL: https://issues.apache.org/jira/browse/HDFS-7164
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: documentation
>    Affects Versions: 2.7.0, HDFS-6581
>            Reporter: Arpit Agarwal
>            Assignee: Arpit Agarwal
>         Attachments: HDFS-7164.01.patch, HDFS-7164.02.patch, 
> HDFS-7164.03.patch, HDFS-7164.04.patch, LazyPersistWrites.png
>
>
> Add feature documentation explaining use cases, how to configure RAM_DISK and 
> API updates.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
