Daryn Sharp commented on HDFS-13310:

{quote}The current BlockCommand protocol treats all blocks independently *since 
DNs don't really have a concept of a file*; only blocks. This is the reason we 
want a new command.
{quote}
The bolded (my emphasis) statement accurately captures the crux of the 
issue: is it a deficiency of the DN to only understand the concept of a block?  
The DN currently has a simple and elegant design.  It stores blocks.  It 
moves blocks.  It deletes blocks.  That's the design abstraction I implied will 
become leaky.

That simplicity, which I believe is an excellent design strength, is at odds 
with the design of this S3 upload feature.  The DN must know the file id, 
offset/length of the replica within the file, and block locations for an 
unknown reason.
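
To make the contrast concrete, here is a minimal sketch of the two command shapes. All class and field names are hypothetical and simplified for illustration; they are not the real Hadoop classes, nor the ones in the attached patches. The point is only how much file-level state the second shape carries compared to the first.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified types for illustration only; these are not the
// real Hadoop classes or the classes in the attached patches.

/** A block-only command: the DN can act on the block ID alone. */
final class BlockOnlyCommand {
    final long blockId;

    BlockOnlyCommand(long blockId) {
        this.blockId = blockId;
    }
}

/** A file-aware backup command: the extra fields are file-level state. */
final class FileAwareBackupCommand {
    final long blockId;
    final long fileId;            // which file the replica belongs to
    final long offsetInFile;      // where the replica starts within that file
    final long length;            // how many bytes of the file it covers
    final List<String> locations; // block locations (made-up host names)

    FileAwareBackupCommand(long blockId, long fileId, long offsetInFile,
                           long length, List<String> locations) {
        this.blockId = blockId;
        this.fileId = fileId;
        this.offsetInFile = offsetInFile;
        this.length = length;
        this.locations = locations;
    }

    /** Exclusive end offset of this replica within the file. */
    long endOffsetInFile() {
        return offsetInFile + length;
    }
}

final class CommandShapeDemo {
    public static void main(String[] args) {
        BlockOnlyCommand delete = new BlockOnlyCommand(1001L);
        FileAwareBackupCommand backup = new FileAwareBackupCommand(
            1001L, 42L, 128L * 1024 * 1024, 64L * 1024 * 1024,
            Arrays.asList("dn1", "dn2"));

        System.out.println("block-only command: block " + delete.blockId);
        System.out.println("file-aware command: block " + backup.blockId
            + " covers bytes [" + backup.offsetInFile + ", "
            + backup.endOffsetInFile() + ") of file " + backup.fileId);
    }
}
```

Everything beyond `blockId` in the second class is state the DN has never needed for storing, moving, or deleting blocks.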

Here are my general concerns:
 * Should the DN effectively become "file aware"?  Perhaps it might be ok if 
only for backup and only in the provided storage type.
 * Will subsequent patches extend this file-awareness to more of the DN?  If 
yes, I have serious reservations.
 * How will this functionality be managed?  Do you intend to add the control 
service directly into the NN?
 * How will the feature interact with replication operations and the balancer?

Before debating the fine points, please help me understand the overall feature: 
 Is the intent that an admin must explicitly issue a "backup" operation?  If 
yes, what are the pros/cons over using a (modified) distcp?

> [PROVIDED Phase 2] The DatanodeProtocol should have DNA_BACKUP to back up 
> blocks
> ----------------------------------------------------------------------------------
>                 Key: HDFS-13310
>                 URL: https://issues.apache.org/jira/browse/HDFS-13310
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Ewan Higgs
>            Assignee: Ewan Higgs
>            Priority: Major
>         Attachments: HDFS-13310-HDFS-12090.001.patch, 
> HDFS-13310-HDFS-12090.002.patch
> As part of HDFS-12090, Datanodes should be able to receive DatanodeCommands 
> in the heartbeat response that instruct them to back up a block.
> This should take the form of two sub commands: PUT_FILE (when the file is <=1 
> block in size) and MULTIPART_PUT_PART when part of a Multipart Upload (see 
> HDFS-13186).
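
The choice between the two sub commands described in the issue can be sketched as follows. This is one reading of the description, not code from the patches: the enum, the class, and the `choose` method are hypothetical names, and it assumes the decision depends only on the file length and the block size.

```java
// Hypothetical sketch; the type and method names are assumptions and do not
// come from the attached patches or from Hadoop itself.
enum BackupSubCommand {
    PUT_FILE,           // whole file fits in a single block
    MULTIPART_PUT_PART  // block is one part of a multipart upload
}

final class BackupSubCommandChooser {
    private BackupSubCommandChooser() {
    }

    /**
     * Picks the sub command for a file, assuming the only inputs are the
     * file length and the block size, both in bytes.
     */
    static BackupSubCommand choose(long fileLength, long blockSize) {
        if (fileLength < 0 || blockSize <= 0) {
            throw new IllegalArgumentException(
                "fileLength must be >= 0 and blockSize must be > 0");
        }
        // "<= 1 block in size" from the issue description.
        return fileLength <= blockSize
            ? BackupSubCommand.PUT_FILE
            : BackupSubCommand.MULTIPART_PUT_PART;
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024;
        System.out.println(choose(10L, blockSize));
        System.out.println(choose(blockSize + 1, blockSize));
    }
}
```

Note that even this small decision takes the file length as input, which is file-level information rather than block-level information.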
