[jira] [Commented] (HDFS-4672) Support tiered storage policies

Suresh Srinivas (JIRA) Fri, 12 Jul 2013 16:22:26 -0700

    [ https://issues.apache.org/jira/browse/HDFS-4672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13707530#comment-13707530 ]


Suresh Srinivas commented on HDFS-4672:
---------------------------------------

bq. Today the scope of HDFS-2832 was widened to duplicate this issue. Since the 
issues are linked, that was not necessary. 
I disagree. Here is the brief comment I had posted on that jira - 
https://issues.apache.org/jira/secure/EditComment!default.jspa?id=12539644&commentId=13192326
{quote}
# Support for heterogeneous storages:
#* DN could support, along with disks, other types of storage such as flash.
#* Suitable storage can be chosen based on client preference such as need for 
random reads etc.
# Block report scaling: instead of a single monolithic block report, a smaller 
block report per storage becomes possible. This is important with the growth in 
disk capacity and number of disks per datanode.
# Better granularity of storage failure handling:
#* DN could just indicate loss of storage and namenode can handle it better 
since it knows the list of blocks belonging to a storage. 
#* DN could locally handle storage failures or provide decommissioning of a 
storage by marking a storage as ReadOnly.
# Hot pluggability of disks/storages: adding and deleting a storage to a node 
is simplified.
# Other flexibility: includes future enhancements to balance storages within a 
datanode, balancing the load (number of transceivers) per storage, etc., and 
better block placement strategies.
{quote}
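
To make the block report and storage failure points above concrete, here is a minimal Java sketch. It is not HDFS code: `StorageReportSketch`, `Replica`, `groupByStorage` and `blocksLostWith` are hypothetical names chosen for illustration, and a real datanode tracks far more state. It only shows how keying blocks by storage allows one small report per storage, and lets the namenode look up the blocks affected when a single storage is lost.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not HDFS code: every name here is illustrative.
final class StorageReportSketch {

    // One replica as a datanode might track it: a block id plus the id of
    // the storage (disk, flash device, ...) that holds it.
    static final class Replica {
        final long blockId;
        final String storageId;

        Replica(long blockId, String storageId) {
            this.blockId = blockId;
            this.storageId = storageId;
        }
    }

    // Split one monolithic replica list into a smaller report per storage.
    static Map<String, List<Long>> groupByStorage(List<Replica> replicas) {
        Map<String, List<Long>> reports = new LinkedHashMap<>();
        for (Replica r : replicas) {
            reports.computeIfAbsent(r.storageId, id -> new ArrayList<>()).add(r.blockId);
        }
        return reports;
    }

    // Namenode-side view: the blocks that belonged to a storage reported as lost.
    static List<Long> blocksLostWith(Map<String, List<Long>> reports, String failedStorageId) {
        return reports.getOrDefault(failedStorageId, Collections.<Long>emptyList());
    }
}
```

With per-storage reports, a datanode that loses one storage only needs to name that storage; the namenode can find the affected blocks without waiting for a full report from the node.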

It briefly mentions the following, all of which are duplicated in this jira:
# Client preference for writing to storages - automatically means that block 
placement must consider storage type etc.
# Support for different storage types in datanode and block reports based on 
that.
# Awareness of those storage types at the namenode (not just for block 
placement, but with various other benefits)
# Affinity of replicas to a storage type.

Certainly you have elaborated on these points and added more implementation 
details. That does not mean it is a different jira.
                
> Support tiered storage policies
> -------------------------------
>
>                 Key: HDFS-4672
>                 URL: https://issues.apache.org/jira/browse/HDFS-4672
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: datanode, hdfs-client, libhdfs, namenode
>            Reporter: Andrew Purtell
>
> We would like to be able to create certain files on certain storage device 
> classes (e.g. spinning media, solid state devices, RAM disk, non-volatile 
> memory). HDFS-2832 enables heterogeneous storage at the DataNode, so the 
> NameNode can gain awareness of what different storage options are available 
> in the pool and where they are located, but no API is provided for clients or 
> block placement plugins to perform device aware block placement. We would 
> like to propose a set of extensions that also have broad applicability to use 
> cases where storage device affinity is important:
>  
> - Add an enum of generic storage device classes, borrowing from current 
> taxonomy of the storage industry
>  
> - Augment DataNode volume metadata in storage reports with this enum
>  
> - Extend the namespace so pluggable block policies can be specified on a 
> directory and storage device class can be tracked in the Inode. Perhaps this 
> could be a larger discussion on adding support for extended attributes in the 
> HDFS namespace. The Inode should track both the storage device class hint and 
> the current actual storage device class. FileStatus should expose this 
> information (or xattrs in general) to clients.
>  
> - Extend the pluggable block policy framework so policies can also consider, 
> and specify, affinity for a particular storage device class
>  
> - Extend the file creation API to accept a storage device class affinity 
> hint. Such a hint can be supplied directly as a parameter, or, if we are 
> considering extended attribute support, then instead as one of a set of 
> xattrs. The hint would be stored in the namespace and also used by the client 
> to indicate to the NameNode/block placement policy/DataNode constraints on 
> block placement. Furthermore, if xattrs or device storage class affinity 
> hints are associated with directories, then the NameNode should provide the 
> storage device affinity hint to the client in the create API response, so the 
> client can provide the appropriate hint to DataNodes when writing new blocks.
>  
> - The list of candidate DataNodes for new blocks supplied by the NameNode to 
> clients should be weighted/sorted by availability of the desired storage 
> device class. 
>  
> - Block replication should consider storage device affinity hints. If a 
> client move()s a file from a location under a path with affinity hint X to 
> under a path with affinity hint Y, then all blocks currently residing on 
> media X should be eventually replicated onto media Y with the then excess 
> replicas on media X deleted.
>  
> - Introduce the concept of degraded path: a path can be degraded if a block 
> placement policy is forced to abandon a constraint in order to persist the 
> block, when there may not be available space on the desired device class, or 
> to maintain the minimum necessary replication factor. This concept is 
> distinct from the corrupt path, where one or more blocks are missing. Paths 
> in degraded state should be periodically reevaluated for re-replication.
>  
> - The FSShell should be extended with commands for changing the storage 
> device class hint for a directory or file. 
>  
> - Clients like DistCP which compare metadata should be extended to be aware 
> of the storage device class hint. For DistCP specifically, there should be an 
> option to ignore the storage device class hints, enabled by default.
>  
> Suggested semantics:
>  
> - The default storage device class should be the null class, or simply the 
> “default class”, for all cases where a hint is not available. This should be 
> configurable. hdfs-defaults.xml could provide the default as spinning media.
>  
> - A storage device class hint should be provided (and is necessary) only when 
> the default is not sufficient.
>  
> - For backwards compatibility, any FSImage or edit log entry lacking a  
> storage device class hint is interpreted as having affinity for the null 
> class.
>  
> - All blocks for a given file share the same storage device class. If the 
> replication factor for this file is increased the replicas should all be 
> placed on the same storage device class.
>  
> - If one or more blocks for a given file cannot be placed on the required 
> device class, then the file is marked as degraded. Files in degraded state 
> should be periodically reevaluated for re-replication. 
>  
> - A directory or file path can have only one storage device affinity hint. 
> If the file inode specifies a hint, it is used; otherwise we walk up the 
> path until a hint is found and use that one; if none is found, the default 
> storage class is used.
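
The hint lookup and degraded-file rules in the suggested semantics above can be sketched in a few lines of Java. This is an illustration under stated assumptions, not a proposed HDFS API: `StorageHintSketch`, `StorageDeviceClass` and its values, `setHint`, `resolve` and `isDegraded` are all hypothetical names, and paths are modelled as plain strings instead of inodes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not HDFS code: names and enum values are illustrative.
final class StorageHintSketch {

    // Illustrative device classes, taken from the examples in the description.
    enum StorageDeviceClass { SPINNING_MEDIA, SOLID_STATE, RAM_DISK, NON_VOLATILE_MEMORY }

    // At most one hint per path (file or directory).
    private final Map<String, StorageDeviceClass> hints = new HashMap<>();
    private final StorageDeviceClass defaultClass;

    StorageHintSketch(StorageDeviceClass defaultClass) {
        this.defaultClass = defaultClass;
    }

    void setHint(String path, StorageDeviceClass hint) {
        hints.put(path, hint);
    }

    // Use the path's own hint if present, otherwise walk up to the nearest
    // ancestor with a hint, otherwise fall back to the configured default.
    StorageDeviceClass resolve(String path) {
        String current = path;
        while (true) {
            StorageDeviceClass hint = hints.get(current);
            if (hint != null) {
                return hint;
            }
            if (current.equals("/")) {
                return defaultClass;
            }
            int slash = current.lastIndexOf('/');
            current = (slash <= 0) ? "/" : current.substring(0, slash);
        }
    }

    // A file is degraded if any of its blocks sits on a device class other
    // than the required one.
    static boolean isDegraded(StorageDeviceClass required,
                              Iterable<StorageDeviceClass> actualPerBlock) {
        for (StorageDeviceClass actual : actualPerBlock) {
            if (actual != required) {
                return true;
            }
        }
        return false;
    }
}
```

For example, with a hint of `SOLID_STATE` on `/hot` and no hint on `/hot/cache/other`, `resolve("/hot/cache/other")` walks up to `/hot` and returns `SOLID_STATE`, while a path with no hinted ancestor returns the default passed to the constructor.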

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
