[ https://issues.apache.org/jira/browse/HDFS-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13838283#comment-13838283 ]

Colin Patrick McCabe commented on HDFS-5182:
--------------------------------------------

bq. Can you explain further what you mean by "edge-triggered notification" here?

It's edge-triggered in the sense that both the DN and the client send 
notifications when something changes.  Closing the socket (because the client 
or DN died) is just another notification.  Remember that with UNIX domain 
sockets, there is no network to worry about, so a socket close is a pretty good 
sign that the other process is dead.

There is no per-read notification.  In fact, there might be no notifications 
at all.  Notifications happen only when something changes, remember.  They are 
asynchronous with respect to reading.

We have to have the concept of block holds (I am avoiding the word "lease" 
here).  The DFSClient puts a hold on the block when it mmaps it.  In order to 
munlock, the DN must first break the hold.  Otherwise, the DFSClient might read 
data that is no longer in memory without checksumming it.  Similarly, we can 
skip checksums entirely even in the non-mmap read path when reading from a 
block that is mlocked, but this also requires a hold.

Notifications sent from the DFSClient to the DN:
* request block hold (we do this if we're trying to read from the block and we 
believe it's mlocked)
* release block hold (if we're done with the block, or if the DN requested a 
release; a socket close is also interpreted as a release)

Notifications sent from the DN to the DFSClient:
* block mlocked
* allow the block hold that was just requested (if the block is mlocked)
* deny the block hold that was just requested (if it's not actually mlocked)
* request release of a block hold (a socket close is also interpreted as a 
release)
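To make the hold bookkeeping concrete, here is a minimal sketch of the DN-side state for one block.  All names here (BlockHoldSketch, BlockHoldTracker, the enum constants) are illustrative, not actual HDFS identifiers, and real code would need per-client tracking and synchronization.

```java
// Hypothetical sketch of the hold protocol described above; names are
// illustrative, not actual HDFS identifiers.
public class BlockHoldSketch {

    // Notifications the DFSClient sends to the DN.
    enum ClientToDn {
        REQUEST_HOLD,  // we're about to read from a block we believe is mlocked
        RELEASE_HOLD   // done with the block, or honoring a DN release request;
                       // a socket close counts as a release too
    }

    // Notifications the DN sends to the DFSClient.
    enum DnToClient {
        BLOCK_MLOCKED,   // the block just became mlocked
        ALLOW_HOLD,      // the requested hold is granted (block is mlocked)
        DENY_HOLD,       // the requested hold is denied (block is not mlocked)
        REQUEST_RELEASE  // the DN wants to munlock; client should release
    }

    // DN-side bookkeeping for one block: the DN may munlock only after
    // every outstanding hold has been released.
    static class BlockHoldTracker {
        private boolean mlocked;
        private int holds;

        void setMlocked(boolean m) { mlocked = m; }

        // Handle a REQUEST_HOLD notification from a client.
        DnToClient handleRequestHold() {
            if (!mlocked) {
                return DnToClient.DENY_HOLD;
            }
            holds++;
            return DnToClient.ALLOW_HOLD;
        }

        // Handle a RELEASE_HOLD notification, or a client socket close.
        void handleReleaseHold() {
            if (holds > 0) {
                holds--;
            }
        }

        boolean canMunlock() { return holds == 0; }
    }

    public static void main(String[] args) {
        BlockHoldTracker tracker = new BlockHoldTracker();
        System.out.println(tracker.handleRequestHold()); // DENY_HOLD: not mlocked yet
        tracker.setMlocked(true);
        System.out.println(tracker.handleRequestHold()); // ALLOW_HOLD
        System.out.println(tracker.canMunlock());        // false: hold outstanding
        tracker.handleReleaseHold();                     // client done, or socket closed
        System.out.println(tracker.canMunlock());        // true: safe to munlock
    }
}
```

The key invariant is in canMunlock(): skipping checksums is only safe while the DN has promised, via an unreleased hold, to keep the block mlocked.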

bq. It's also worth considering that some clients (eg HBase) tend to open all 
of their blocks at startup, and never close/reopen files except on errors. It 
would still be nice if our caching mechanism could transition between zero-copy 
and one-copy if we want to migrate those files in and out of cache while the 
client keeps them open. (eg we know that we're about to run an intensive job on 
some HBase table, so we ask HDFS to cache its blocks for an hour or two in the 
middle of the night, and then drop it back out of cache after the batch job is 
done)

Yeah.  Of course, it does require HBase to actually use the ZCR API, which it 
doesn't now.

> BlockReaderLocal must allow zero-copy reads only when the DN believes it's 
> valid
> ---------------------------------------------------------------------------------
>
>                 Key: HDFS-5182
>                 URL: https://issues.apache.org/jira/browse/HDFS-5182
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs-client
>    Affects Versions: 3.0.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>
> BlockReaderLocal must allow zero-copy reads only when the DN believes it's 
> valid.  This implies adding a new field to the response to 
> REQUEST_SHORT_CIRCUIT_FDS.  We also need some kind of heartbeat from the 
> client to the DN, so that the DN can inform the client when the mapped region 
> is no longer locked into memory.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
