[ https://issues.apache.org/jira/browse/HADOOP-7714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13121682#comment-13121682 ]
Cristina L. Abad commented on HADOOP-7714:
------------------------------------------
I got started with some testing today and can definitely see the effect of the
fadvise on the page cache; however, in my tests I was still seeing about 8MB in
the page cache belonging to a 64MB block for which fadvise calls were being
issued (I verified this with strace). I looked into the BlockSender code and
there seems to be an error in the fadvise call:
NativeIO.posixFadviseIfPossible(blockInFd, lastCacheDropOffset, offset - 1024,
NativeIO.POSIX_FADV_DONTNEED);
should be
NativeIO.posixFadviseIfPossible(blockInFd, lastCacheDropOffset, offset -
lastCacheDropOffset, NativeIO.POSIX_FADV_DONTNEED);
I am not sure what the "- 1024" is for, but in any case that parameter should
be the length rather than an offset. Having said that, this is not what is
causing the 8MB to stay in the page cache. I tried changing the fadvise call to:
NativeIO.posixFadviseIfPossible(blockInFd, 0, offset,
NativeIO.POSIX_FADV_DONTNEED); // Yes, I know, this is redundant since it is
being called frequently
and that reduced the amount of cached data for the block from 8MB to a varying
value between 0 and 2MB.
I looked into what pages were remaining in the cache and it seems that a few
random pages are staying every time, plus some pages at the end.
Nathan (Roberts) and I looked through this issue and thought the kernel's
read-ahead mechanism might be preventing some pages from being removed from the
page cache, so we changed the POSIX_FADV_SEQUENTIAL to POSIX_FADV_RANDOM and
that seemed to make things better, in the sense that now only very few pages
stay around (about 4-8 4KB pages in the latest tests I ran today).
Having said that, POSIX_FADV_SEQUENTIAL is of course what we should be using.
Any ideas on how to make all the fadvised (DONTNEED) pages go away? We are
puzzled as to why those last few pages hang around, especially since I
modified the fadvise calls to start from offset 0 every time; in other words,
we are repeatedly telling the kernel to remove those pages and still a few
manage to stay around. I'll keep looking into this issue and will try to get
some performance numbers, but would love to have the code working as expected
before doing the tests.
BTW, I did not look into the BlockReceiver code; once BlockSender is working as
expected I'll look into it.
I hope that what I wrote makes sense; if something is not clear I'll be happy
to explain the issue in more detail.
> Add support in native libs for OS buffer cache management
> ---------------------------------------------------------
>
> Key: HADOOP-7714
> URL: https://issues.apache.org/jira/browse/HADOOP-7714
> Project: Hadoop Common
> Issue Type: Bug
> Components: native
> Affects Versions: 0.24.0
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Attachments: hadoop-7714-20s-prelim.txt
>
>
> Especially in shared HBase/MR situations, management of the OS buffer cache
> is important. Currently, running a big MR job will evict all of HBase's hot
> data from cache, causing HBase performance to really suffer. However, caching
> of the MR input/output is rarely useful, since the datasets tend to be larger
> than cache and not re-read often enough that the cache is used. Having access
> to the native calls {{posix_fadvise}} and {{sync_data_range}} on platforms
> where they are supported would allow us to do a better job of managing this
> cache.