[
https://issues.apache.org/jira/browse/HDFS-4672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13707530#comment-13707530
]
Suresh Srinivas commented on HDFS-4672:
---------------------------------------
bq. Today the scope of HDFS-2832 was widened to duplicate this issue. Since the
issues are linked, that was not necessary.
I disagree. Here is the brief comment I had posted on that jira -
https://issues.apache.org/jira/secure/EditComment!default.jspa?id=12539644&commentId=13192326
{quote}
# Support for heterogeneous storages:
#* Along with disks, a DN could support other types of storage, such as flash.
#* Suitable storage can be chosen based on client preference, such as the need
for random reads.
# Block report scaling: instead of a single monolithic block report, a smaller
block report per storage becomes possible. This is important with the growth in
disk capacity and number of disks per datanode.
# Better granularity of storage failure handling:
#* DN could just indicate loss of storage and namenode can handle it better
since it knows the list of blocks belonging to a storage.
#* DN could locally handle storage failures or provide decommissioning of a
storage by marking a storage as ReadOnly.
# Hot pluggability of disks/storages: adding and deleting a storage to a node
is simplified.
# Other flexibility: includes future enhancements such as balancing storages
within a datanode, balancing the load (number of transceivers) per storage, and
better block placement strategies.
{quote}
It briefly mentions the following, all of which are duplicated in this jira:
# Client preference for writing to storages - automatically means that block
placement must consider storage type etc.
# Support for different storage types in datanode and block reports based on
that.
# Awareness of those storage types at the namenode (not just for block
placement, but with various other benefits).
# Affinity of replicas to a storage type.
Certainly you have elaborated on these points and added more implementation
details. That does not mean it is a different jira.
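To make the per-storage block report and storage-loss points from the HDFS-2832 summary quoted above more concrete, here is a minimal Java sketch. Every name in it (PerStorageReportSketch, Storage, StorageType, processReport, storageFailed) is hypothetical and chosen for illustration only; it is not the HDFS API or the design adopted in either jira.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a namenode-side view that tracks blocks per storage.
class PerStorageReportSketch {
    // Illustrative storage types only; the real set is not defined in this thread.
    enum StorageType { DISK, FLASH }

    // One storage (for example a single disk) attached to a datanode.
    static final class Storage {
        final String id;
        final StorageType type;
        final List<Long> blockIds = new ArrayList<>();

        Storage(String id, StorageType type) {
            this.id = id;
            this.type = type;
        }
    }

    // Storage id -> block ids last reported for that storage.
    private final Map<String, List<Long>> blocksByStorage = new HashMap<>();

    // Each report covers one storage, so it replaces only that storage's entry.
    void processReport(Storage storage) {
        blocksByStorage.put(storage.id, new ArrayList<>(storage.blockIds));
    }

    // The datanode only names the lost storage; the affected blocks are
    // looked up from the per-storage map and returned for re-replication.
    List<Long> storageFailed(String storageId) {
        List<Long> lost = blocksByStorage.remove(storageId);
        return lost == null ? new ArrayList<>() : lost;
    }

    public static void main(String[] args) {
        PerStorageReportSketch namenode = new PerStorageReportSketch();
        Storage disk = new Storage("disk-0", StorageType.DISK);
        disk.blockIds.add(1L);
        disk.blockIds.add(2L);
        Storage flash = new Storage("flash-0", StorageType.FLASH);
        flash.blockIds.add(3L);
        namenode.processReport(disk);
        namenode.processReport(flash);
        System.out.println("Blocks to re-replicate: " + namenode.storageFailed("disk-0"));
    }
}
```

In this sketch the loss of one storage touches only that storage's entry, which is the granularity argument made in the quoted summary.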
> Support tiered storage policies
> -------------------------------
>
> Key: HDFS-4672
> URL: https://issues.apache.org/jira/browse/HDFS-4672
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: datanode, hdfs-client, libhdfs, namenode
> Reporter: Andrew Purtell
>
> We would like to be able to create certain files on certain storage device
> classes (e.g. spinning media, solid state devices, RAM disk, non-volatile
> memory). HDFS-2832 enables heterogeneous storage at the DataNode, so the
> NameNode can gain awareness of what different storage options are available
> in the pool and where they are located, but no API is provided for clients or
> block placement plugins to perform device aware block placement. We would
> like to propose a set of extensions that also have broad applicability to use
> cases where storage device affinity is important:
>
> - Add an enum of generic storage device classes, borrowing from current
> taxonomy of the storage industry
>
> - Augment DataNode volume metadata in storage reports with this enum
>
> - Extend the namespace so pluggable block policies can be specified on a
> directory and storage device class can be tracked in the Inode. Perhaps this
> could be a larger discussion on adding support for extended attributes in the
> HDFS namespace. The Inode should track both the storage device class hint and
> the current actual storage device class. FileStatus should expose this
> information (or xattrs in general) to clients.
>
> - Extend the pluggable block policy framework so policies can also consider,
> and specify, affinity for a particular storage device class
>
> - Extend the file creation API to accept a storage device class affinity
> hint. Such a hint can be supplied directly as a parameter, or, if we are
> considering extended attribute support, then instead as one of a set of
> xattrs. The hint would be stored in the namespace and also used by the client
> to indicate to the NameNode/block placement policy/DataNode constraints on
> block placement. Furthermore, if xattrs or device storage class affinity
> hints are associated with directories, then the NameNode should provide the
> storage device affinity hint to the client in the create API response, so the
> client can provide the appropriate hint to DataNodes when writing new blocks.
>
> - The list of candidate DataNodes for new blocks supplied by the NameNode to
> clients should be weighted/sorted by availability of the desired storage
> device class.
>
> - Block replication should consider storage device affinity hints. If a
> client move()s a file from a location under a path with affinity hint X to
> under a path with affinity hint Y, then all blocks currently residing on
> media X should eventually be replicated onto media Y, and the then-excess
> replicas on media X deleted.
>
> - Introduce the concept of degraded path: a path can be degraded if a block
> placement policy is forced to abandon a constraint in order to persist the
> block, when there may not be available space on the desired device class, or
> to maintain the minimum necessary replication factor. This concept is
> distinct from the corrupt path, where one or more blocks are missing. Paths
> in degraded state should be periodically reevaluated for re-replication.
>
> - The FSShell should be extended with commands for changing the storage
> device class hint for a directory or file.
>
> - Clients like DistCp, which compare metadata, should be extended to be aware
> of the storage device class hint. For DistCp specifically, there should be an
> option to ignore the storage device class hints, enabled by default.
>
> Suggested semantics:
>
> - The default storage device class should be the null class, or simply the
> “default class”, for all cases where a hint is not available. This should be
> configurable. hdfs-default.xml could provide the default as spinning media.
>
> - A storage device class hint should be provided (and is necessary) only when
> the default is not sufficient.
>
> - For backwards compatibility, any FSImage or edit log entry lacking a
> storage device class hint is interpreted as having affinity for the null
> class.
>
> - All blocks for a given file share the same storage device class. If the
> replication factor for this file is increased, the replicas should all be
> placed on the same storage device class.
>
> - If one or more blocks for a given file cannot be placed on the required
> device class, then the file is marked as degraded. Files in degraded state
> should be periodically reevaluated for re-replication.
>
> - A directory or path can have only one storage device affinity hint. If the
> file inode specifies a hint, it is used; otherwise we walk up the path
> until a hint is found and use that one. If no hint is found, the default
> storage class is used.
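
The hint lookup and degraded-file semantics suggested in the quoted description can be sketched roughly as follows. This is only an illustration under stated assumptions: the class, enum members, and method names are hypothetical, paths are assumed to be normalized absolute paths, and treating the default class as "no constraint" in the degraded check is an assumption of this sketch rather than something the proposal specifies.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names exist in HDFS.
class StorageHintSketch {
    // Illustrative device classes; the proposal does not fix the enum's members.
    enum StorageClass { DEFAULT, SPINNING, SSD, RAM_DISK, NVM }

    // Explicit hints keyed by normalized absolute path (stand-in for inode metadata).
    private final Map<String, StorageClass> hints = new HashMap<>();

    void setHint(String path, StorageClass hint) {
        hints.put(path, hint);
    }

    // Use the path's own hint if set, else the nearest ancestor's, else the default class.
    StorageClass resolve(String path) {
        String p = path;
        while (true) {
            StorageClass hint = hints.get(p);
            if (hint != null) {
                return hint;
            }
            if (p.equals("/")) {
                return StorageClass.DEFAULT;
            }
            int slash = p.lastIndexOf('/');
            p = (slash <= 0) ? "/" : p.substring(0, slash);
        }
    }

    // A file counts as degraded if any block sits on a class other than the resolved one.
    // Assumption of this sketch: DEFAULT imposes no constraint, so it never degrades.
    boolean isDegraded(String path, List<StorageClass> blockClasses) {
        StorageClass wanted = resolve(path);
        if (wanted == StorageClass.DEFAULT) {
            return false;
        }
        for (StorageClass actual : blockClasses) {
            if (actual != wanted) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        StorageHintSketch ns = new StorageHintSketch();
        ns.setHint("/hbase", StorageClass.SSD);
        System.out.println(ns.resolve("/hbase/wal/log1"));
        System.out.println(ns.resolve("/tmp/scratch"));
        System.out.println(ns.isDegraded("/hbase/wal/log1",
                Arrays.asList(StorageClass.SSD, StorageClass.SPINNING)));
    }
}
```

A hint set directly on a file takes precedence over one inherited from a directory, because the lookup checks the full path before any ancestor.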