Anoop Sam John commented on HDFS-10285:

HBase can benefit from this feature. The scenario is as follows.
HBase allows the WAL files to be kept on low-latency devices using the HSM 
feature (ALL_SSD, ONE_SSD, etc.). There is a directory for keeping all active 
WALs, and we configure the policy on it. After a certain time, a WAL file 
becomes inactive, as all the data in it eventually gets flushed into 
HFiles. We then archive it. There is an archive directory, and the 
archive operation is done via a rename to a file under the archive dir. The 
archive dir does not have any policy configured. By default we keep the WAL 
files under the archive dir for a few more minutes and then delete them. If the 
WAL gets deleted, it is fine even if the blocks of the WAL file stay on the 
low-latency device until then. But there are some features and scenarios under 
which the deletion of a WAL from the archive can get delayed. A few examples:
- Cross-cluster replication is in place and the peer replica is slow or down. 
  HBase does inter-cluster replication by reading the WAL. Until the WAL cells 
  have been read and passed to the other cluster, we cannot delete the WAL.
- The backup feature is in use and the backup refers to WAL files (the same 
  goes for the snapshot feature).
- Incremental backup is enabled. Until an incremental backup is taken, WALs in 
  that time range cannot be deleted.
The same applies to HFiles. After compaction, the compacted-away files are 
archived, and if they are referred to by active snapshots, we may not be able 
to delete them.
So it makes sense to use this feature to move the file blocks off the 
low-latency devices and free up space on them.
Once this feature is GA in a release, we can open a JIRA to make use of it.
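
The archive scenario above can be illustrated with a small toy model. This is 
only a sketch in plain Java, not HDFS or HBase code: the class, its maps, the 
paths /hbase/wals and /hbase/archive, and the simplified policy-to-storage 
mapping are all assumptions made up for illustration. It shows a file created 
under an ALL_SSD directory, renamed into a directory with no policy, and left 
with its blocks on SSD until something explicitly moves them.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model (hypothetical, not HDFS code) of a rename leaving blocks on the old tier. */
public class RenamePolicySketch {
    // Directory path -> storage policy set on it (stand-in for Namenode metadata).
    static final Map<String, String> dirPolicy = new HashMap<>();
    // File path -> storage type its blocks physically sit on.
    static final Map<String, String> blockStorage = new HashMap<>();

    static String parentOf(String path) {
        return path.substring(0, path.lastIndexOf('/'));
    }

    /** Policy is inherited from the parent directory; "DEFAULT" when none is set. */
    static String effectivePolicy(String file) {
        return dirPolicy.getOrDefault(parentOf(file), "DEFAULT");
    }

    /** Simplified assumption: ALL_SSD means SSD, anything else means DISK. */
    static String expectedStorage(String policy) {
        return policy.equals("ALL_SSD") ? "SSD" : "DISK";
    }

    /** A new file's blocks are placed according to the policy in effect at write time. */
    static void create(String file) {
        blockStorage.put(file, expectedStorage(effectivePolicy(file)));
    }

    /** Rename is metadata only: the path changes, the block location does not. */
    static void rename(String src, String dst) {
        blockStorage.put(dst, blockStorage.remove(src));
    }

    /** True when the blocks sit where the currently effective policy wants them. */
    static boolean satisfied(String file) {
        return blockStorage.get(file).equals(expectedStorage(effectivePolicy(file)));
    }

    /** What a satisfier would achieve: blocks moved to match the effective policy. */
    static void satisfy(String file) {
        blockStorage.put(file, expectedStorage(effectivePolicy(file)));
    }

    public static void main(String[] args) {
        dirPolicy.put("/hbase/wals", "ALL_SSD");            // active WAL dir has a policy
        create("/hbase/wals/wal.1");                         // blocks land on SSD
        rename("/hbase/wals/wal.1", "/hbase/archive/wal.1"); // archive dir has no policy
        System.out.println("after rename:  " + blockStorage.get("/hbase/archive/wal.1")
                + ", satisfied=" + satisfied("/hbase/archive/wal.1"));
        satisfy("/hbase/archive/wal.1");                     // explicit trigger moves blocks
        System.out.println("after satisfy: " + blockStorage.get("/hbase/archive/wal.1")
                + ", satisfied=" + satisfied("/hbase/archive/wal.1"));
    }
}
```

In this model the archived file keeps occupying SSD for as long as deletion is 
delayed, which is the space we would like to reclaim by triggering the 
satisfier on the archive path.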

> Storage Policy Satisfier in Namenode
> ------------------------------------
>                 Key: HDFS-10285
>                 URL: https://issues.apache.org/jira/browse/HDFS-10285
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: datanode, namenode
>    Affects Versions: HDFS-10285
>            Reporter: Uma Maheswara Rao G
>            Assignee: Uma Maheswara Rao G
>         Attachments: HDFS-10285-consolidated-merge-patch-00.patch, 
> HDFS-10285-consolidated-merge-patch-01.patch, 
> HDFS-SPS-TestReport-20170708.pdf, 
> Storage-Policy-Satisfier-in-HDFS-June-20-2017.pdf, 
> Storage-Policy-Satisfier-in-HDFS-May10.pdf
>
> Heterogeneous storage in HDFS introduced the concept of storage policy. These 
> policies can be set on a directory or file to specify the user's preference 
> for where to store the physical blocks. When the user sets the storage policy 
> before writing data, the blocks can take advantage of the policy and are 
> stored accordingly. 
> If the user sets the storage policy after the file has been written and 
> completed, the blocks will already have been written with the default storage 
> policy (that is, DISK). The user then has to run the ‘Mover tool’ explicitly, 
> specifying all such file names as a list. In some distributed-system 
> scenarios (e.g. HBase) it would be difficult to collect all the files and run 
> the tool, as different nodes can write files separately and the files can 
> have different paths.
> Another scenario: when the user renames a file from a directory affected by 
> one storage policy (policy inherited from the parent directory) to a 
> directory affected by another storage policy, the inherited storage policy is 
> not copied from the source. The file therefore takes its policy from the 
> destination file/dir parent. The rename operation is just a metadata change 
> in the Namenode; the physical blocks still remain placed according to the 
> source storage policy.
> So, tracking all such files, whose names depend on business logic, across 
> distributed nodes (e.g. region servers) and running the Mover tool on them 
> could be difficult for admins. The proposal here is to provide an API in the 
> Namenode itself to trigger storage policy satisfaction. A daemon thread 
> inside the Namenode should track such calls and send them to the DNs as 
> movement commands. 
> Will post the detailed design thoughts document soon. 
