[
https://issues.apache.org/jira/browse/HDFS-16864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18044551#comment-18044551
]
ASF GitHub Bot commented on HDFS-16864:
---------------------------------------
github-actions[bot] commented on PR #5204:
URL: https://github.com/apache/hadoop/pull/5204#issuecomment-3644368789
We're closing this stale PR because it has been open for 100 days with no
activity. This isn't a judgement on the merit of the PR in any way. It's just a
way of keeping the PR queue manageable.
If you feel like this was a mistake, or you would like to continue working
on it, please feel free to re-open it and ask for a committer to remove the
stale tag and review again.
Thanks all for your contribution.
> HDFS advisory caching should drop cache behind block when block closed
> ----------------------------------------------------------------------
>
> Key: HDFS-16864
> URL: https://issues.apache.org/jira/browse/HDFS-16864
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs
> Affects Versions: 3.3.4
> Reporter: Dave Marion
> Priority: Minor
> Labels: pull-request-available
>
> One of the comments in HDFS-4817 describes the behavior in
> BlockReceiver.manageWriterOsCache:
> "The general idea is that there isn't much point in calling
> {{sync_file_pages}} twice on the same offsets, since the sync process has
> presumably already begun. On the other hand, calling
> {{fadvise(FADV_DONTNEED)}} again and again will tend to purge more and more
> bytes from the cache. The reason is because dirty pages (those containing
> un-written-out-data) cannot be purged using {{{}FADV_DONTNEED{}}}. And we
> can't know exactly when the pages we wrote will be flushed to disk. But we do
> know that calling {{FADV_DONTNEED}} on very recently written bytes is a
> waste, since they will almost certainly not have been written out to disk.
> That is why it purges between 0 and {{{}lastCacheManagementOffset -
> CACHE_WINDOW_SIZE{}}}, rather than simply 0 to pos."
> Looking at the code, I'm wondering if at least the last 8 MB (the size of
> CACHE_WINDOW_SIZE) of a block might be left without an associated
> fadvise(FADV_DONTNEED) call. We're having a
> [discussion|https://the-asf.slack.com/archives/CERNB8NDC/p1669399302264189]
> in #accumulo about the file caching feature, and I found some interesting
> [results|https://gist.github.com/dlmarion/1835f387b0fa8fb9dbf849a0c87b6d04]
> in a test that we wrote. Specifically, for a multi-block file written using
> setDropBehind with either hsync or CreateFlag.SYNC_BLOCK, parts of every
> block remained in the cache, rather than only parts of the last block.
> I'm wondering if there is a reason not to call fadvise(FADV_DONTNEED) on the
> entire block at close time
> [here|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java#L371]
> when dropCacheBehindWrites is true.
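The windowed drop-behind arithmetic quoted above can be sketched in a few lines. This is a minimal, hypothetical model of the described behavior, not the actual BlockReceiver code: the constant name CACHE_WINDOW_SIZE is taken from the quoted comment, and the method names are assumptions made for illustration.

```java
// Hypothetical model of the windowed drop-behind logic described in
// HDFS-4817: while a block is being written, pages are only purged in
// [0, lastCacheManagementOffset - CACHE_WINDOW_SIZE), because very
// recently written pages are still dirty and cannot be dropped.
public class DropBehindWindow {
    static final long CACHE_WINDOW_SIZE = 8L * 1024 * 1024; // 8 MB lag

    // Exclusive end offset of the range [0, end) that the windowed logic
    // would pass to fadvise(FADV_DONTNEED) while the writer is active.
    static long dropEndDuringWrite(long lastCacheManagementOffset) {
        return Math.max(0, lastCacheManagementOffset - CACHE_WINDOW_SIZE);
    }

    // Bytes of a finished block that the windowed logic alone never drops;
    // a final full-block fadvise at close would be needed to purge them.
    static long bytesLeftCached(long blockLen, long lastCacheManagementOffset) {
        return blockLen - dropEndDuringWrite(lastCacheManagementOffset);
    }

    public static void main(String[] args) {
        long blockLen = 128L * 1024 * 1024; // a 128 MB block
        // Even if cache management ran right up to the end of the block,
        // the final 8 MB window is still in the page cache at close.
        System.out.println(bytesLeftCached(blockLen, blockLen)); // 8388608
    }
}
```

Under this model, every block of a multi-block file retains its last CACHE_WINDOW_SIZE bytes in the page cache, which matches the test results linked above where parts of every block (not just the last) stayed cached.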
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]