[
https://issues.apache.org/jira/browse/HDFS-13310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437367#comment-16437367
]
Daryn Sharp commented on HDFS-13310:
------------------------------------
{quote}The current BlockCommand protocol treats all blocks independently *since
DNs don't really have a concept of a file*; only blocks. This is the reason we
want a new command.
{quote}
The bolded (my emphasis) statement accurately captures the crux of the
issue: is it a deficiency of the DN to only understand the concept of a block?
The DN currently has a simple and elegant design. It stores blocks. It
moves blocks. It deletes blocks. That's the design abstraction I implied will
become leaky.
That simplicity, which I believe is an excellent design strength, is at odds
with the design of this S3 upload feature. The DN must know the file id,
offset/length of the replica within the file, and block locations for an
unknown reason.
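To make the contrast concrete, here is a minimal sketch of the two command shapes. Every class, field, and method name below is hypothetical and chosen only for illustration; none of it is taken from the patch or from Hadoop's actual DatanodeProtocol classes. The sub-command rule follows the issue description quoted below (PUT_FILE when the file is <=1 block in size, otherwise MULTIPART_PUT_PART), with the block size passed in as an assumed parameter.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: hypothetical types, NOT Hadoop's real classes.
public class CommandSketch {

    /** Block-only view: the DN needs nothing beyond the block's identity. */
    static final class BlockOnlyCommand {
        final long blockId;

        BlockOnlyCommand(long blockId) {
            this.blockId = blockId;
        }
    }

    /** File-aware view: carries the extra file context listed in the comment. */
    static final class FileAwareBackupCommand {
        final long blockId;
        final long fileId;                  // which file the replica belongs to
        final long offsetInFile;            // where the replica starts in the file
        final long length;                  // how many bytes the replica covers
        final List<String> blockLocations;  // assumed here to be DN identifiers

        FileAwareBackupCommand(long blockId, long fileId, long offsetInFile,
                               long length, List<String> blockLocations) {
            this.blockId = blockId;
            this.fileId = fileId;
            this.offsetInFile = offsetInFile;
            this.length = length;
            this.blockLocations = blockLocations;
        }

        String describe() {
            return "blockId=" + blockId + ", fileId=" + fileId
                + ", offset=" + offsetInFile + ", length=" + length
                + ", locations=" + blockLocations;
        }
    }

    /** Sub-command choice as stated in the issue description. */
    static String subCommand(long fileLength, long blockSize) {
        return fileLength <= blockSize ? "PUT_FILE" : "MULTIPART_PUT_PART";
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024; // assumed block size for the demo
        BlockOnlyCommand delete = new BlockOnlyCommand(1001L);
        FileAwareBackupCommand backup = new FileAwareBackupCommand(
            1001L, 42L, 0L, blockSize, Arrays.asList("dn1", "dn2"));

        System.out.println("block-only fields: blockId=" + delete.blockId);
        System.out.println("file-aware fields: " + backup.describe());
        System.out.println("sub-command: " + subCommand(3 * blockSize, blockSize));
    }
}
```

In this sketch the block-only command has a single field, while the backup command cannot be built without file-level context. That extra context is what would make the DN "file aware".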
Here are my general concerns:
* Should the DN effectively become "file aware"? Perhaps it might be ok if
only for backup and only in the provided storage type.
* Will subsequent patches extend this file-awareness to more of the DN? If
yes, I have serious reservations.
* How will this functionality be managed? Do you intend to add the control
service directly into the NN?
* How will the feature interact with replication operations and the balancer?
Before debating the fine points, please help me understand the overall feature:
Is the intent that an admin must explicitly issue a "backup" operation? If
yes, what are the pros/cons over using a (modified) distcp?
> [PROVIDED Phase 2] The DatanodeProtocol should be have DNA_BACKUP to backup
> blocks
> ----------------------------------------------------------------------------------
>
> Key: HDFS-13310
> URL: https://issues.apache.org/jira/browse/HDFS-13310
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Ewan Higgs
> Assignee: Ewan Higgs
> Priority: Major
> Attachments: HDFS-13310-HDFS-12090.001.patch,
> HDFS-13310-HDFS-12090.002.patch
>
>
> As part of HDFS-12090, Datanodes should be able to receive DatanodeCommands
> in the heartbeat response that instruct them to back up a block.
> This should take the form of two sub commands: PUT_FILE (when the file is <=1
> block in size) and MULTIPART_PUT_PART when part of a Multipart Upload (see
> HDFS-13186).