[ 
https://issues.apache.org/jira/browse/HDFS-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866545#comment-13866545
 ] 

Colin Patrick McCabe commented on HDFS-5182:
--------------------------------------------

A few notes about the planned implementation here:

The main idea here is to have a shared memory segment which the DFSClient and 
datanode can both read and write.  Before each read, the DFSClient will look at 
this shared memory segment to see if it can be "anchored."  A segment will be 
anchorable if the datanode has mlocked it.  If the segment can be anchored, the 
DFSClient will increment the anchor count.  Then, the client can read without 
validating the checksum.  When the client is done reading, it will decrement 
the anchor count.  These are just memory operations, so they will be fast.
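
To make the anchor protocol above concrete, here is a minimal sketch.  It is 
not the planned implementation: the AnchorSlot class, its method names, and 
the bit layout (one flag bit plus a 31-bit count) are assumptions made for 
illustration, and an in-process AtomicInteger stands in for a word in the 
shared memory segment.  Having the datanode wait for the count to reach zero 
before unlocking is also an assumption.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical slot; a real one would be a word in the shared memory segment. */
class AnchorSlot {
    // Assumed layout: bit 31 = anchorable flag, low 31 bits = anchor count.
    private static final int ANCHORABLE = 1 << 31;
    private static final int COUNT_MASK = ~ANCHORABLE;
    private final AtomicInteger word = new AtomicInteger(0);

    /** Datanode side: called once the data has been mlocked. */
    void makeAnchorable() {
        word.getAndUpdate(v -> v | ANCHORABLE);
    }

    /**
     * Datanode side: refuse new anchors. Returns true if no anchors remain,
     * which this sketch treats as the point where munlock would be safe.
     */
    boolean makeUnanchorable() {
        return word.updateAndGet(v -> v & COUNT_MASK) == 0;
    }

    /** Client side: increment the anchor count only if the flag is set. */
    boolean tryAnchor() {
        while (true) {
            int v = word.get();
            if ((v & ANCHORABLE) == 0 || (v & COUNT_MASK) == COUNT_MASK) {
                return false; // not mlocked, or count saturated
            }
            if (word.compareAndSet(v, v + 1)) {
                return true;
            }
        }
    }

    /** Client side: decrement the anchor count after the read finishes. */
    void unanchor() {
        while (true) {
            int v = word.get();
            if ((v & COUNT_MASK) == 0) {
                throw new IllegalStateException("not anchored");
            }
            if (word.compareAndSet(v, v - 1)) {
                return;
            }
        }
    }

    int anchorCount() {
        return word.get() & COUNT_MASK;
    }
}
```

A client would call tryAnchor() before a read, skip checksum validation only 
if it returned true, and call unanchor() afterwards.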

Similarly, when the client tries to do a zero-copy read, it will check to see 
if the segment is anchorable, and increment the anchor count before performing 
the mmap.  The anchor count will stay incremented until the mmap is closed.  
One exception is if the client passes the ReadOption.SKIP_CHECKSUMS flag.  In 
that case, we do not need to consult the anchor flag because we are willing to 
tolerate bad data being returned or SIGBUS.
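
The zero-copy case differs mainly in how long the anchor is held.  A rough 
sketch follows, assuming hypothetical Slot and Mapping classes and a boolean 
standing in for ReadOption.SKIP_CHECKSUMS.  No real mmap is performed; the 
point is only that the anchor is released when the mapping is closed, and 
that the skip-checksums path never touches the slot.

```java
import java.util.concurrent.atomic.AtomicInteger;

class ZeroCopySketch {
    /** Hypothetical in-process stand-in for a shared-memory slot. */
    static final class Slot {
        private boolean anchorable;
        private int anchors;

        synchronized void setAnchorable(boolean value) { anchorable = value; }

        synchronized boolean tryAnchor() {
            if (!anchorable) {
                return false;
            }
            anchors++;
            return true;
        }

        synchronized void unanchor() { anchors--; }

        synchronized int anchors() { return anchors; }
    }

    /** Stand-in for an mmap'ed region; holds the anchor until closed. */
    static final class Mapping implements AutoCloseable {
        private final Slot slot; // null when no anchor was taken
        private boolean closed;

        Mapping(Slot slot) { this.slot = slot; }

        boolean anchored() { return slot != null; }

        @Override
        public void close() {
            if (closed) {
                return;
            }
            closed = true;
            if (slot != null) {
                slot.unanchor(); // anchor is dropped only when the mapping closes
            }
        }
    }

    /**
     * Returns a mapping, or null if the caller must fall back to a normal
     * checksummed read because the slot could not be anchored.
     */
    static Mapping open(Slot slot, boolean skipChecksums) {
        if (skipChecksums) {
            return new Mapping(null); // caller accepted bad data or SIGBUS
        }
        return slot.tryAnchor() ? new Mapping(slot) : null;
    }
}
```

Returning null for the fallback case is just one possible convention; the 
real client could equally throw or return an empty result.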

Shared memory segments will have a fixed size and contain a series of 
fixed-size slots.  The client will request a shared memory segment via the 
REQUEST_SHORT_CIRCUIT_FDS operation.  Of course, not every 
REQUEST_SHORT_CIRCUIT_FDS operation needs to get a new shared memory segment, 
since each segment can hold multiple slots.  The client caches these segments 
and only requests a new one when it needs it.  Segments will be closed when no 
more slots in them are in use.
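
The client-side bookkeeping described above might look roughly like the 
following.  This is a sketch under assumptions: SegmentCache, Segment, 
SlotRef, and the slot count of 4 are all hypothetical, and creating a Segment 
object stands in for the REQUEST_SHORT_CIRCUIT_FDS round trip that would 
actually fetch one from the datanode.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

class SegmentCache {
    static final int SLOTS_PER_SEGMENT = 4; // hypothetical fixed slot count

    static final class Segment {
        final int id;
        final BitSet used = new BitSet(SLOTS_PER_SEGMENT);

        Segment(int id) { this.id = id; }

        boolean isFull() { return used.cardinality() == SLOTS_PER_SEGMENT; }

        boolean isEmpty() { return used.isEmpty(); }
    }

    static final class SlotRef {
        final Segment segment;
        final int index;

        SlotRef(Segment segment, int index) {
            this.segment = segment;
            this.index = index;
        }
    }

    private final List<Segment> segments = new ArrayList<>();
    private int segmentsRequested = 0;

    /** Reuse a cached segment with a free slot; request a new one only if none has room. */
    SlotRef allocate() {
        for (Segment s : segments) {
            if (!s.isFull()) {
                int i = s.used.nextClearBit(0);
                s.used.set(i);
                return new SlotRef(s, i);
            }
        }
        // Stands in for asking the datanode for a new segment.
        Segment fresh = new Segment(segmentsRequested++);
        segments.add(fresh);
        fresh.used.set(0);
        return new SlotRef(fresh, 0);
    }

    /** Release a slot; drop ("close") the segment once none of its slots are in use. */
    void free(SlotRef ref) {
        ref.segment.used.clear(ref.index);
        if (ref.segment.isEmpty()) {
            segments.remove(ref.segment);
        }
    }

    int openSegments() { return segments.size(); }

    int segmentsRequested() { return segmentsRequested; }
}
```

With four slots per segment, five allocations trigger two segment requests, 
and a segment disappears from the cache as soon as its last slot is freed.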

One issue with the shared memory segments discussed here is that when a client 
terminates, the datanode receives no notification that the shared memory 
segment it created is no longer needed.  For this reason, each shared memory 
segment will have a domain socket associated with it.  The only function of 
this socket is to cause a close notification to be sent to the datanode when 
the client closes (or vice versa).  (When a UNIX domain socket closes, the 
remote end gets a close notification.)  The socket which is used will be the 
same socket on which the REQUEST_SHORT_CIRCUIT_FDS that fetched the segment was 
performed.  We simply don't put it back into the peer cache.
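
The close-notification behavior this relies on can be demonstrated with a 
small sketch.  Note the assumption: to keep the example runnable on any JDK, 
it uses a loopback TCP socket as a stand-in for the UNIX domain socket.  What 
it shows is only that the remote end sees end-of-stream once its peer closes; 
the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

class CloseNotificationSketch {
    /**
     * Returns true if the "datanode" end observed end-of-stream after the
     * "client" end closed. Loopback TCP stands in for a UNIX domain socket.
     */
    static boolean peerSeesClose() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket listener = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, listener.getLocalPort());
             Socket datanodeSide = listener.accept()) {
            // The client goes away without sending anything, as it would on exit.
            client.close();
            // read() returns -1 once the peer has closed its end; this is the
            // point at which the datanode could free the shared memory segment.
            return datanodeSide.getInputStream().read() == -1;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Since the socket carries no further traffic, a blocking read (or a poll for 
readability) on the datanode side is enough to detect that the client has 
gone away.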

> BlockReaderLocal must allow zero-copy reads only when the DN believes it's 
> valid
> ---------------------------------------------------------------------------------
>
>                 Key: HDFS-5182
>                 URL: https://issues.apache.org/jira/browse/HDFS-5182
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs-client
>    Affects Versions: 3.0.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>
> BlockReaderLocal must allow zero-copy reads only when the DN believes it's 
> valid.  This implies adding a new field to the response to 
> REQUEST_SHORT_CIRCUIT_FDS.  We also need some kind of heartbeat from the 
> client to the DN, so that the DN can inform the client when the mapped region 
> is no longer locked into memory.



