[
https://issues.apache.org/jira/browse/HDFS-13209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16387507#comment-16387507
]
Rakesh R commented on HDFS-13209:
---------------------------------
{quote}However, sometimes we might need to keep all files in the same directory
(consistency constraint) but might want some of them on SSD (small, in my case)
until they are processed and merged/removed. Then they will go on the default
policy.
{quote}
A user can set a StoragePolicy on either a directory or a file using
[fs#setStoragePolicy|https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/fs/FileSystem.html#setStoragePolicy(org.apache.hadoop.fs.Path,%20java.lang.String)].
I agree with you: presently there is no option to pass a storage policy during
file creation. A newly created file inherits the storage policy of its parent
directory, and its blocks are written using that policy. I'm not against this
new API proposal, but I think the same behavior can be achieved today at the
cost of one additional FileSystem API call.
How about changing the storage policy on the file before writing any contents
to it? I have tried to describe the steps below; please go through them and let
me know if I missed anything.
{code:java}
Step-1) Assume the parent directory "/myparent" is configured with the ALL_SSD
policy.
Step-2) Create a file "/myparent/myfile" under the "/myparent" dir. It
inherits the ALL_SSD policy from its parent.
Step-3) Change the storage policy of "/myparent/myfile" to the "COLD" storage
policy, which uses the ARCHIVE storage type.
Step-4) Write data to the file. The data blocks will be written to the
ARCHIVE storage type.
{code}
{code:java}
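// Sample code. Assumes "dfs" is an initialized DistributedFileSystem and
// "replicationFactor" is a short.
String fileName = "/myparent/myfile";
final FSDataOutputStream out = dfs.create(new Path(fileName),
    replicationFactor);
// Override the inherited policy before any data is written.
dfs.setStoragePolicy(new Path(fileName), "COLD");
for (int i = 0; i < 1024; i++) {
  out.write(i);
}
out.close();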
{code}
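To make the reasoning behind the steps easier to follow, here is a toy in-memory model of the behavior I am relying on: a file with no policy of its own falls back to its nearest ancestor's policy, and the policy is looked up when a block is written, not when the file is created. This is only a sketch and not HDFS code. The class StoragePolicyModel and all of its methods are made up for illustration, and the policy-to-storage-type mapping is reduced to the three cases discussed here (ALL_SSD, COLD, and the default HOT on DISK).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of storage policy inheritance (hypothetical, not an HDFS class). */
class StoragePolicyModel {
    // Policies explicitly set on a path (directory or file).
    private final Map<String, String> explicitPolicy = new HashMap<>();
    // Storage type recorded for each block written to a file.
    private final Map<String, List<String>> blocks = new HashMap<>();

    void setStoragePolicy(String path, String policy) {
        explicitPolicy.put(path, policy);
    }

    void create(String path) {
        blocks.put(path, new ArrayList<>());
    }

    /** Returns the policy set on the path or on its nearest ancestor, else HOT. */
    String effectivePolicy(String path) {
        for (String p = path; p != null; p = parent(p)) {
            String policy = explicitPolicy.get(p);
            if (policy != null) {
                return policy;
            }
        }
        return "HOT";
    }

    /** Records one block, placed according to the policy in effect right now. */
    void writeBlock(String path) {
        blocks.get(path).add(storageTypeFor(effectivePolicy(path)));
    }

    List<String> blockStorageTypes(String path) {
        return blocks.get(path);
    }

    // Simplified mapping: only the policies mentioned in this thread.
    private static String storageTypeFor(String policy) {
        switch (policy) {
            case "ALL_SSD": return "SSD";
            case "COLD":    return "ARCHIVE";
            case "HOT":     return "DISK";
            default: throw new IllegalArgumentException("unmodelled policy: " + policy);
        }
    }

    private static String parent(String path) {
        if (path.equals("/")) {
            return null;
        }
        int slash = path.lastIndexOf('/');
        return slash <= 0 ? "/" : path.substring(0, slash);
    }

    public static void main(String[] args) {
        StoragePolicyModel ns = new StoragePolicyModel();
        ns.setStoragePolicy("/myparent", "ALL_SSD");                 // Step-1
        ns.create("/myparent/myfile");                               // Step-2
        System.out.println(ns.effectivePolicy("/myparent/myfile"));  // ALL_SSD
        ns.setStoragePolicy("/myparent/myfile", "COLD");             // Step-3
        ns.writeBlock("/myparent/myfile");                           // Step-4
        System.out.println(ns.blockStorageTypes("/myparent/myfile")); // [ARCHIVE]
    }
}
```

In this model, any other file created under "/myparent" without its own policy would still have its blocks placed on SSD, which matches the use case of keeping all files in one directory while treating some of them differently.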
> DistributedFileSystem.create should allow an option to provide StoragePolicy
> ----------------------------------------------------------------------------
>
> Key: HDFS-13209
> URL: https://issues.apache.org/jira/browse/HDFS-13209
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: hdfs
> Affects Versions: 3.0.0
> Reporter: Jean-Marc Spaggiari
> Priority: Major
>
> DistributedFileSystem.create returns an FSDataOutputStream. The stored
> file and its blocks will use the directory-based StoragePolicy.
>
> However, sometimes we might need to keep all files in the same directory
> (consistency constraint) but might want some of them on SSD (small, in my
> case) until they are processed and merged/removed. Then they will go on the
> default policy.
>
> When creating a file, it will be useful to have an option to specify a
> different StoragePolicy...