[ 
https://issues.apache.org/jira/browse/HDFS-12090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16516465#comment-16516465
 ] 

Ewan Higgs commented on HDFS-12090:
-----------------------------------

Given the feedback from the meetings, we have taken another look at the 
existing design of the write path for Provided Storage.

As the design of the SPS still does not appear to be settled, we propose that 
the SyncService be implemented as an external service that runs like the 
Mover and Balancer and is not tied to the SPS. Optionally running the 
SyncService outside the NN has been a goal for HDFS-12090, so this is not a big 
shift in thought. Simplifying the design so that the SyncService runs only 
outside the NN is a good idea.

Attached is a sequence diagram of a potential workflow for writing files (). In 
this design, the SyncService asks the NN to make snapshots and then make diffs. 
The SyncService then coordinates the writes via the datanodes to the external 
synchronization endpoint. All metadata operations (create dir, delete files, 
update aliasmap) are performed through the SyncService. Writing from a Datanode 
to the external synchronization endpoint will introduce two new calls in the 
ClientDatanodeProtocol (provisionally):
{code:java}
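// Proposed call 1: ask a Datanode to upload one part of a multipart upload.
// The parameter name "blocks" is a placeholder; the original omitted it.
void multipartPutPart(String sessionId, UploadHandle uploadHandle,
    int partNumber, List<LocatedBlock> blocks, String uri, int offset,
    long length);

// Protobuf messages returned by the second call.
message CompletedPutPartsProto {
  required bytes uploadHandle = 1;
  // The field name "results" is a placeholder; the original omitted it.
  repeated PutPartExecutionResultProto results = 2;
}

message PutPartExecutionResultProto {
  optional bytes partHandle = 1;
  optional int64 numberOfBytes = 2;
}

// Proposed call 2: poll a Datanode for the parts it has completed.
List<CompletedPutPartsProto> getCompletedPutParts(String sessionId);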
{code}
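To illustrate how the two calls might be used together, here is a minimal 
in-memory sketch of the polling loop. Everything in it is hypothetical: 
FakeDatanode, SyncPoller, finishOnePart and the simplified signatures are 
stand-ins invented for illustration and are not the proposed protocol or any 
existing Hadoop class.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.TreeSet;

// Hypothetical stand-in for a Datanode exposing the two proposed calls.
// The session id is accepted but ignored to keep the sketch short.
class FakeDatanode {
  private final Deque<Integer> pending = new ArrayDeque<>();
  private final List<Integer> completed = new ArrayList<>();

  // Mirrors multipartPutPart: queues the part and returns immediately.
  void multipartPutPart(String sessionId, int partNumber) {
    pending.add(partNumber);
  }

  // Simulation only: pretend the upload of one queued part has finished.
  void finishOnePart() {
    Integer part = pending.poll();
    if (part != null) {
      completed.add(part);
    }
  }

  // Mirrors getCompletedPutParts: returns and clears the finished parts.
  List<Integer> getCompletedPutParts(String sessionId) {
    List<Integer> drained = new ArrayList<>(completed);
    completed.clear();
    return drained;
  }
}

class SyncPoller {
  // Polls until all expected parts are reported or maxPolls is reached.
  // Returns the completed part numbers in ascending order.
  static List<Integer> pollUntilDone(FakeDatanode dn, String sessionId,
      int expectedParts, int maxPolls) {
    TreeSet<Integer> done = new TreeSet<>();
    for (int i = 0; i < maxPolls && done.size() < expectedParts; i++) {
      done.addAll(dn.getCompletedPutParts(sessionId));
      // A real service would sleep here; the fake advances the upload.
      dn.finishOnePart();
    }
    return new ArrayList<>(done);
  }
}
```

In this sketch the caller never blocks on an upload: it submits the parts, 
then checks back for completed part numbers until all have been seen.
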
Upon completion of work, the SyncService will offer the Namenode a 
ProvidedBlockReport through a new namenode protocol (new socket, etc.) that will 
inform the Namenode when the blocks in the report have PROVIDED replicas. At 
this point, the NN can update the ProvidedBlockMap and, e.g., ask the 
datanodes to delete extraneous DISK replicas (if the storage policy calls for 
it, i.e. [DISK, DISK, DISK, PROVIDED] vs [DISK, PROVIDED]).
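
As a rough illustration of the replica trimming described above, the sketch 
below counts how many DISK replicas would be extraneous for a block once it has 
a PROVIDED replica. StorageKind and ReplicaTrimmer are hypothetical names made 
up for this example, and the rule itself (current DISK count minus the DISK 
count the policy asks for) is an assumption, not the actual Namenode logic.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical enum for this sketch; not Hadoop's own storage type enum.
enum StorageKind { DISK, PROVIDED }

class ReplicaTrimmer {
  // Returns how many DISK replicas exceed what the target policy asks for.
  // Nothing is trimmed until the block actually has a PROVIDED replica.
  static int extraneousDiskReplicas(List<StorageKind> current,
      List<StorageKind> target) {
    if (!current.contains(StorageKind.PROVIDED)) {
      return 0;
    }
    int have = Collections.frequency(current, StorageKind.DISK);
    int want = Collections.frequency(target, StorageKind.DISK);
    return Math.max(0, have - want);
  }
}
```

With replicas on [DISK, DISK, DISK, PROVIDED] and a target of 
[DISK, PROVIDED], this yields two DISK replicas to delete; with a target of 
[DISK, DISK, DISK, PROVIDED] it yields none.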

The SyncService is then able to remove the old snapshot.
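
The overall cycle described above (snapshot, diff, copy, report, drop the old 
snapshot) could be sketched as follows. NamenodeOps, SyncCycle and 
RecordingNamenode are hypothetical names used only for illustration; the report 
is simplified to a list of paths rather than blocks, and the copy through the 
datanodes is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical view of the Namenode operations the SyncService would need.
interface NamenodeOps {
  void createSnapshot(String name);
  List<String> diff(String fromSnapshot, String toSnapshot);
  void offerProvidedBlockReport(List<String> syncedPaths);
  void deleteSnapshot(String name);
}

class SyncCycle {
  // Runs one cycle and returns the snapshot to diff against next time.
  static String runOnce(NamenodeOps nn, String previous, String next) {
    nn.createSnapshot(next);
    List<String> changed = nn.diff(previous, next);
    // Copying the changed data via the datanodes is omitted in this sketch.
    nn.offerProvidedBlockReport(changed);
    // The old snapshot is removed only after the report has been offered.
    nn.deleteSnapshot(previous);
    return next;
  }
}

// Test double that records the order of calls instead of doing real work.
class RecordingNamenode implements NamenodeOps {
  final List<String> calls = new ArrayList<>();

  @Override public void createSnapshot(String name) {
    calls.add("createSnapshot " + name);
  }

  @Override public List<String> diff(String from, String to) {
    calls.add("diff " + from + " " + to);
    return Arrays.asList("/data/part-0");
  }

  @Override public void offerProvidedBlockReport(List<String> syncedPaths) {
    calls.add("report " + syncedPaths);
  }

  @Override public void deleteSnapshot(String name) {
    calls.add("deleteSnapshot " + name);
  }
}
```

If the process running this loop died partway through, the only thing left 
behind on the Namenode side in this sketch would be a snapshot.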

This design does not affect the functional specification for HDFS-12090.

Some things that are particularly nice about this design:
 * The Namenode does not originate any RPC calls.
 * If the SyncService somehow dies, then the Namenode is not hanging onto any 
state aside from a snapshot.
 * Polling the Datanodes for completed part handles means that the SyncService 
does not need to wait on sockets for potentially long IO operations.
 * If no writes are planned to happen in an area for Provided Storage, 
administrators can even shut off the SyncService.

 

> Handling writes from HDFS to Provided storages
> ----------------------------------------------
>
>                 Key: HDFS-12090
>                 URL: https://issues.apache.org/jira/browse/HDFS-12090
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Virajith Jalaparti
>            Priority: Major
>         Attachments: HDFS-12090-Functional-Specification.001.pdf, 
> HDFS-12090-Functional-Specification.002.pdf, 
> HDFS-12090-Functional-Specification.003.pdf, HDFS-12090-design.001.pdf, 
> HDFS-12090.0000.patch, HDFS-12090.0001.patch
>
>
> HDFS-9806 introduces the concept of {{PROVIDED}} storage, which makes data in 
> external storage systems accessible through HDFS. However, HDFS-9806 is 
> limited to data being read through HDFS. This JIRA will deal with how data 
> can be written to such {{PROVIDED}} storages from HDFS.



