[
https://issues.apache.org/jira/browse/HDFS-15829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yang Yun updated HDFS-15829:
----------------------------
Description:
h3. Overview
HDFS TTL is implemented using the xattr mechanism provided by HDFS. When a
user sets a TTL on a file or directory, HDFS creates an xattr named "ttl" for
the file or directory and stores the user-supplied value in this xattr. A
service called TtlService runs on the HDFS Standby or Observer NameNode
(Observer is recommended). It scans the in-memory inode map regularly, reads
the value of the "ttl" xattr from each INode, and calculates whether the TTL
has expired. If so, it gets the full file path from the INode and adds it to
the expired file list. After the scan, it creates a DFSClient and deletes the
files in the expired file list in a batch. Another option is to trigger a YARN
job to delete them in parallel.
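The scan step described above can be sketched roughly as follows. This is a minimal illustration, not the code in the attached patch: the class TtlScanSketch, the method findExpired, and the plain map standing in for the NameNode inode map are all hypothetical, and it assumes the stored value is an absolute expiry time in minutes since the epoch and that a file counts as expired once that time is reached.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the TtlService scan; not the patch's actual code. */
class TtlScanSketch {

    /**
     * Collects the paths whose TTL has expired.
     * Assumption: each map value is an absolute expiry time in minutes since the epoch.
     */
    static List<String> findExpired(Map<String, Long> expiryMinutesByPath, long nowMinutes) {
        List<String> expired = new ArrayList<>();
        for (Map.Entry<String, Long> entry : expiryMinutesByPath.entrySet()) {
            // Assumed boundary rule: expired once the expiry minute is reached.
            if (entry.getValue() <= nowMinutes) {
                expired.add(entry.getKey());
            }
        }
        return expired;
    }

    public static void main(String[] args) {
        // A plain map stands in for the in-memory inode map and its "ttl" xattrs.
        Map<String, Long> inodes = new LinkedHashMap<>();
        inodes.put("/tmp/old.log", 100L);
        inodes.put("/tmp/new.log", 500L);
        // The real service would hand this list to a DFSClient for batch deletion.
        System.out.println(findExpired(inodes, 200L)); // prints [/tmp/old.log]
    }
}
```

Collecting the paths first and deleting afterwards mirrors the two-phase flow in the description: the scan itself only reads NameNode state, and the deletes go through a normal client.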
h3. Protocol
Add two xattrs:
"user.ttl": the TTL value in minutes, identifying the time at which the file
or folder will expire.
"user.ttlproperty": the TTL type flags, including:
* SINCELASTWRITE = 0x1 # calculate the TTL from the last write.
* KEEPEMPTYDIR = 0x2 # keep the directory itself when it becomes empty.
* KEEPEMPTYSUBDIR = 0x4 # keep empty subdirectories.
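Because the three property values are distinct powers of two, they can be combined and tested as bit flags. The sketch below illustrates that, under assumptions: the class TtlPropertySketch and its methods are hypothetical, and the rule that a value without SINCELASTWRITE is an absolute expiry time in minutes is inferred from the Example section of this description rather than confirmed by the patch.

```java
/** Hypothetical helper illustrating the "user.ttlproperty" bit flags. */
class TtlPropertySketch {
    static final int SINCELASTWRITE = 0x1;
    static final int KEEPEMPTYDIR = 0x2;
    static final int KEEPEMPTYSUBDIR = 0x4;

    /** True if the given flag bit is set in the property value. */
    static boolean has(int property, int flag) {
        return (property & flag) != 0;
    }

    /**
     * Computes the expiry time in minutes since the epoch.
     * Assumption: with SINCELASTWRITE the value is relative to the last write;
     * otherwise the value is already an absolute expiry time.
     */
    static long expiryMinutes(int value, int property, long lastWriteMinutes) {
        return has(property, SINCELASTWRITE) ? lastWriteMinutes + value : value;
    }

    public static void main(String[] args) {
        int property = SINCELASTWRITE | KEEPEMPTYDIR;
        System.out.println(has(property, KEEPEMPTYDIR));        // prints true
        System.out.println(has(property, KEEPEMPTYSUBDIR));     // prints false
        System.out.println(expiryMinutes(60, property, 1000L)); // prints 1060
    }
}
```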
*Nested TTL*
TTL can be set on each directory and file along a path; a setting on a
lower-level subdirectory or file takes precedence over that of its ancestors.
If a directory or file does not have a TTL of its own, it inherits the
setting of its nearest ancestor directory. The following is an illustrative
example. Suppose there is a directory tree like this:
{code:java}
/A/B/E
/A/C
/A/D {code}
That is, B, C and D are under directory A, and file E is under directory B.
Suppose the user sets the TTL of A to 2 days, the TTL of B to 3 days, and the
TTL of E to 1 day, and does not set a TTL on C or D. Then file E will be
cleared after 1 day. After 2 days, C and D will be cleared, using the setting
inherited from directory A. Note that at this point directory A will not be
cleared because it is not empty. After 3 days, B will be cleared because its
own setting expires. After B is cleared, A’s setting has already expired and A
has become an empty directory, so it will also be cleared.
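The inheritance rule in this example can be sketched as a walk up the path until a TTL is found. This is only an illustration: NestedTtlSketch and effectiveTtlDays are hypothetical names, a plain map stands in for the per-inode xattrs, and the sketch does not look for a TTL on the root directory itself.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of nearest-ancestor TTL resolution. */
class NestedTtlSketch {

    /** Returns the TTL set on the path itself, else on its nearest ancestor, else null. */
    static Integer effectiveTtlDays(String path, Map<String, Integer> ttlDaysByPath) {
        String current = path;
        while (current != null && !current.isEmpty()) {
            Integer ttl = ttlDaysByPath.get(current);
            if (ttl != null) {
                return ttl;
            }
            // Move to the parent directory; stop once the root is reached.
            int slash = current.lastIndexOf('/');
            current = slash > 0 ? current.substring(0, slash) : null;
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, Integer> ttlDays = new HashMap<>();
        ttlDays.put("/A", 2);
        ttlDays.put("/A/B", 3);
        ttlDays.put("/A/B/E", 1);
        System.out.println(effectiveTtlDays("/A/B/E", ttlDays)); // prints 1 (own setting)
        System.out.println(effectiveTtlDays("/A/B", ttlDays));   // prints 3 (own setting)
        System.out.println(effectiveTtlDays("/A/C", ttlDays));   // prints 2 (inherited from /A)
        System.out.println(effectiveTtlDays("/A/D", ttlDays));   // prints 2 (inherited from /A)
    }
}
```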
h3. Usage
For the first version, an API is provided to set the TTL; a command-line
interface will be added later.
{code:java}
/**
* Set the TTL on a file.
* @param fs the file system.
* @param path the target file to set the TTL on.
* @param value the TTL value, in minutes.
* @param property the type of TTL.
* @throws IOException
*/
public static void setTTl(FileSystem fs, Path path, int value, int property)
{code}
h3. Example
{code:java}
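// The file will expire in 60 minutes (the value is an absolute time in minutes).
// The cast is needed because the expression is a long and the parameter is an int.
TtlInfo.setTTl(fs, file, (int) (System.currentTimeMillis() / 1000 / 60 + 60), 0);
// The file will expire 60 minutes after the last write.
TtlInfo.setTTl(fs, file, 60, TtlInfo.SINCELASTWRITE);
{code}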
> Use xattr to support HDFS TTL on Observer namenode
> --------------------------------------------------
>
> Key: HDFS-15829
> URL: https://issues.apache.org/jira/browse/HDFS-15829
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: dfsclient, namenode
> Reporter: Yang Yun
> Assignee: Yang Yun
> Priority: Minor
> Attachments: HDFS-15829.patch