[
https://issues.apache.org/jira/browse/HADOOP-7714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122127#comment-13122127
]
Nathan Roberts commented on HADOOP-7714:
----------------------------------------
Todd, you may already be well aware of this, but just in case...
Patterns like the one below don't usually do what one would expect, especially
if the data has to go over a wire. I believe the reason is the way the socket
buffers inside the kernel keep track of the data that needs to be sent: each
buffer basically just holds a reference to the page cache page. Therefore, if
the data has not actually left the box when fadvise is called, the references
are still there and the pages cannot be invalidated. I tried this with a small
native app and a 128MB file, and sure enough everything except for the first
few pages stayed in the page cache.
I can't immediately think of a surefire way around this. Some options:
We could just call fadvise once at close and live with the fact that anything
still buffered at that time won't be affected. We could do what Cristina was
doing and always call fadvise with an offset of 0, so that we try to invalidate
the same pages multiple times. We could call fadvise asynchronously after a
second or so. Delaying a bit might help us deal with hot blocks better as well.
sendfile(4, 5, [131072000], 65536) = 65536
sendfile(4, 5, [131137536], 65536) = 65536
sendfile(4, 5, [131203072], 65536) = 65536
sendfile(4, 5, [131268608], 65536) = 65536
sendfile(4, 5, [131334144], 65536) = 65536
sendfile(4, 5, [131399680], 65536) = 65536
sendfile(4, 5, [131465216], 65536) = 65536
sendfile(4, 5, [131530752], 65536) = 65536
sendfile(4, 5, [131596288], 65536) = 65536
sendfile(4, 5, [131661824], 65536) = 65536
sendfile(4, 5, [131727360], 65536) = 65536
sendfile(4, 5, [131792896], 65536) = 65536
sendfile(4, 5, [131858432], 65536) = 65536
sendfile(4, 5, [131923968], 65536) = 65536
sendfile(4, 5, [131989504], 65536) = 65536
sendfile(4, 5, [132055040], 65536) = 65536
fadvise64(5, 131072000, 1048576, POSIX_FADV_DONTNEED) = 0
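To make the second and third options concrete, here is a rough sketch of a
send loop like the one in the trace above, changed so that every fadvise starts
at offset 0 and a final pass runs after a short delay. It is in Python only for
brevity, not a proposal for the native code. The names
send_file_dropping_cache and drop_cache, the one-second default delay, and the
use of a timer thread are all assumptions made for illustration. I have not
measured how much of the cache this actually frees.

```python
import os
import threading

CHUNK = 64 * 1024            # bytes per sendfile call, matching the trace
ADVISE_EVERY = 1024 * 1024   # bytes sent between fadvise calls, matching the trace


def drop_cache(fd, length=0):
    """Advise the kernel to drop cached pages for bytes [0, length) of fd.

    A length of 0 means "through end of file". This is advice only: pages
    that are still referenced (for example by unsent socket data) may stay.
    """
    os.posix_fadvise(fd, 0, length, os.POSIX_FADV_DONTNEED)


def _drop_and_close(fd):
    """Timer callback: final whole-file pass, then release the duplicate fd."""
    try:
        drop_cache(fd)
    finally:
        os.close(fd)


def send_file_dropping_cache(sock_fd, file_fd, length, delay=1.0,
                             chunk=CHUNK, advise_every=ADVISE_EVERY):
    """Send `length` bytes of file_fd to sock_fd (hypothetical helper).

    Returns (bytes_sent, timer). The timer runs the delayed fadvise pass.
    """
    offset = 0
    next_advise = advise_every
    while offset < length:
        sent = os.sendfile(sock_fd, file_fd, offset, min(chunk, length - offset))
        if sent == 0:        # reached end of file earlier than expected
            break
        offset += sent
        if offset >= next_advise:
            # Always start at 0 so pages skipped by an earlier call get retried.
            drop_cache(file_fd, offset)
            next_advise = offset + advise_every

    # Delayed pass on a duplicate descriptor, so the caller may close file_fd
    # right away. By then more of the data should have left the box.
    timer = threading.Timer(delay, _drop_and_close, args=(os.dup(file_fd),))
    timer.daemon = True
    timer.start()
    return offset, timer
```

The delayed pass uses os.dup so it does not depend on the caller keeping the
file open. Whether one second is long enough would depend on how fast the
socket drains, so the delay is left as a parameter.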
> Add support in native libs for OS buffer cache management
> ---------------------------------------------------------
>
> Key: HADOOP-7714
> URL: https://issues.apache.org/jira/browse/HADOOP-7714
> Project: Hadoop Common
> Issue Type: Bug
> Components: native
> Affects Versions: 0.24.0
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Attachments: hadoop-7714-20s-prelim.txt
>
>
> Especially in shared HBase/MR situations, management of the OS buffer cache
> is important. Currently, running a big MR job will evict all of HBase's hot
> data from cache, causing HBase performance to really suffer. However, caching
> of the MR input/output is rarely useful, since the datasets tend to be larger
> than cache and not re-read often enough that the cache is used. Having access
> to the native calls {{posix_fadvise}} and {{sync_file_range}} on platforms
> where they are supported would allow us to do a better job of managing this
> cache.