[
https://issues.apache.org/jira/browse/HDFS-15714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17357118#comment-17357118
]
Feilong He edited comment on HDFS-15714 at 6/4/21, 7:57 AM:
------------------------------------------------------------
Hi [~bpatel], sorry for the late reply.
The relevant code path is shown below.
{code:java}
ReadMountManager: FSMountAttrOp.addRemotePaths -> FSMountAttrOp: w.addToEdits
-> MountEditLogWriter: createFile{code}
In {{MountEditLogWriter#createFile}}, we can see that an {{HdfsFileStatus}} is
created based on the {{remoteStatus}} obtained from remote storage, which is like
creating a normal HDFS file except that the data is stored outside HDFS.
*However, the remote file's own modification time is not used or kept in HDFS*.
My previous reply may have been ambiguous.
I ran a simple test to verify this: I compared a file (object)'s modification
time in S3 with its modification time in HDFS after the S3 bucket containing
that file was mounted to HDFS. The two values differ, which is consistent with
the code analysis. The modification time of that file in HDFS is the time HDFS
generates when responding to the user's mount request.
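To illustrate the observation above, here is a minimal, self-contained sketch.
It is a simplified model and not the actual {{MountEditLogWriter}} code: the
classes {{RemoteStatus}} and {{MountedStatus}}, the {{createFile}} signature,
and the sample path and timestamps are all hypothetical, and copying the path
and length from the remote status is an assumption made only for illustration.

```java
import java.time.Instant;

public class MountMtimeSketch {

    /** Hypothetical stand-in for the status reported by the remote store (e.g. an S3 object). */
    public static final class RemoteStatus {
        public final String path;
        public final long length;
        public final long modificationTime; // epoch milliseconds

        public RemoteStatus(String path, long length, long modificationTime) {
            this.path = path;
            this.length = length;
            this.modificationTime = modificationTime;
        }
    }

    /** Hypothetical stand-in for the file status recorded on the HDFS side. */
    public static final class MountedStatus {
        public final String path;
        public final long length;
        public final long modificationTime; // epoch milliseconds

        public MountedStatus(String path, long length, long modificationTime) {
            this.path = path;
            this.length = length;
            this.modificationTime = modificationTime;
        }
    }

    /**
     * Models the described behaviour: path and length are taken from the remote
     * status (an assumption), while the modification time is the time at which
     * the mount request is handled, not the remote file's own modification time.
     */
    public static MountedStatus createFile(RemoteStatus remote, long mountTimeMillis) {
        return new MountedStatus(remote.path, remote.length, mountTimeMillis);
    }

    public static void main(String[] args) {
        // Made-up timestamps: the object was last modified well before the mount.
        long remoteMtime = Instant.parse("2021-05-01T00:00:00Z").toEpochMilli();
        long mountTime = Instant.parse("2021-06-04T00:00:00Z").toEpochMilli();

        RemoteStatus remote = new RemoteStatus("/bucket/data.csv", 1024L, remoteMtime);
        MountedStatus mounted = createFile(remote, mountTime);

        System.out.println("remote mtime: " + remote.modificationTime);
        System.out.println("HDFS mtime:   " + mounted.modificationTime);
        System.out.println("same value:   " + (remote.modificationTime == mounted.modificationTime));
    }
}
```

In this model the two modification times differ whenever the mount happens
after the remote file was last written, which matches what the test showed.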
For {{readOnly}} mount mode, mounted data cannot be changed from the HDFS side,
so its modification time stays unchanged in HDFS and is the same as its
creation time.
I think that, generally, many applications built on top of HDFS don't care
about data modification time, so the inconsistency in modification time may not
cause issues. If you have any thoughts, or a case I have overlooked, please
kindly point it out.
Thanks a lot for your comment! As always, any discussion is welcome!
was (Author: philohe):
Hi [~bpatel], sorry for the late reply.
The relevant code path is shown below.
{code:java}
ReadMountManager: FSMountAttrOp.addRemotePaths -> FSMountAttrOp: w.addToEdits
-> MountEditLogWriter: createFile{code}
In {{MountEditLogWriter#createFile}}, we can see that an {{HdfsFileStatus}} is
created based on the {{remoteStatus}} obtained from remote storage, which is like
creating a normal HDFS file except that the data is stored outside HDFS.
*However, the modification time of the remote file is not used or kept in HDFS*.
My previous reply may have been ambiguous.
I ran a simple test to verify this: I compared a file (object)'s modification
time in S3 with its modification time in HDFS after the S3 bucket containing
that file was mounted to HDFS. The two values differ, which is consistent with
the code analysis. The modification time of that file in HDFS is the time when
the above {{#createFile}} is triggered in response to the user's mount request.
For {{readOnly}} mount mode, mounted data cannot be changed from the HDFS side,
so its modification time stays unchanged in HDFS.
I think that, generally, applications built on top of HDFS don't care about
data modification time, so the inconsistency in modification time may not cause
issues. If you have any thoughts, or a case I have overlooked, please kindly
point it out.
Thanks a lot for your comment! As always, any discussion is welcome!
> HDFS Provided Storage Read/Write Mount Support On-the-fly
> ---------------------------------------------------------
>
> Key: HDFS-15714
> URL: https://issues.apache.org/jira/browse/HDFS-15714
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: datanode, namenode
> Affects Versions: 3.4.0
> Reporter: Feilong He
> Assignee: Feilong He
> Priority: Major
> Labels: pull-request-available
> Attachments: HDFS-15714-01.patch,
> HDFS_Provided_Storage_Design-V1.pdf, HDFS_Provided_Storage_Performance-V1.pdf
>
> Time Spent: 2.5h
> Remaining Estimate: 0h
>
> HDFS Provided Storage (PS) is a feature to tier HDFS over other file systems.
> In HDFS-9806, the PROVIDED storage type was introduced to HDFS. By
> configuring external storage with the PROVIDED tag for a DataNode, users can
> enable applications to access externally stored data from the HDFS side.
> However, there are two issues that need to be addressed. Firstly, mounting
> external storage on-the-fly, namely dynamic mount, is lacking. It needs to be
> supported in order to flexibly combine HDFS with external storage at runtime.
> Secondly, PS write is not supported by current HDFS. But in real
> applications, it is common to transfer data bi-directionally for read/write
> between HDFS and external storage.
> Through this JIRA, we are presenting our work on PS write support and
> dynamic mount support for both read & write. Please note that several JIRAs
> have already been filed in the community for these topics. Our work is based
> on this previous community work, with a new design & implementation to
> support the so-called writeBack mount and to enable admins to add any mount
> on-the-fly. We appreciate those folks in the community for their great
> contributions! See their pending JIRAs: HDFS-14805 & HDFS-12090.