[ 
https://issues.apache.org/jira/browse/HADOOP-7714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13121682#comment-13121682
 ] 

Cristina L. Abad commented on HADOOP-7714:
------------------------------------------

I got started with some testing today and can definitely see the effect of the 
fadvise calls on the page cache. However, in my tests I was still seeing about 
8MB in the page cache belonging to a 64MB block for which fadvise calls were 
being issued (I verified the calls with strace). I looked into the BlockSender 
code and there seems to be an error in the fadvise call:

NativeIO.posixFadviseIfPossible(blockInFd, lastCacheDropOffset, offset - 1024, 
NativeIO.POSIX_FADV_DONTNEED);

should be

NativeIO.posixFadviseIfPossible(blockInFd, lastCacheDropOffset, offset - 
lastCacheDropOffset, NativeIO.POSIX_FADV_DONTNEED);

I am not sure what the "- 1024" is for, but in any case that parameter should 
be a length, not an offset. Having said that, this is not what is causing the 
8MB to stay in the page cache. I tried changing the fadvise call to:

NativeIO.posixFadviseIfPossible(blockInFd, 0, offset, 
NativeIO.POSIX_FADV_DONTNEED); // Yes, I know, this is redundant since it is 
being called frequently

and that reduced the amount of cached data from 8MB to a varying value between 
0 and 2MB.

I looked into which pages were remaining in the cache, and it seems that a few 
random pages stay behind every time, plus some pages at the end of the block.

Nathan (Roberts) and I looked through this issue and thought the kernel's 
readahead mechanism may be preventing some pages from being removed from the 
page cache, so we changed the POSIX_FADV_SEQUENTIAL to POSIX_FADV_RANDOM and 
that seemed to make things better, in the sense that now I am only seeing very 
few pages staying around (around 4-8 4KB pages in the latest tests I ran 
today). Having said that, POSIX_FADV_SEQUENTIAL is of course what we should be 
using. Any ideas on how to make all the fadvised (POSIX_FADV_DONTNEED) pages 
go away? We are puzzled as to why those last few pages hang around, especially 
since I modified the fadvise calls to go from offset 0 every time; in other 
words, we are repeatedly telling the kernel to remove those pages and still a 
few manage to stay around. I'll keep looking into this issue and will try to 
get some performance numbers, but would love to have the code working as 
expected before running those tests.

BTW, I did not look into the BlockReceiver code; once BlockSender is working as 
expected I'll look into it.

I hope that what I wrote makes sense; if something is not clear I'll be happy 
to explain the issue in more detail.
                
> Add support in native libs for OS buffer cache management
> ---------------------------------------------------------
>
>                 Key: HADOOP-7714
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7714
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: native
>    Affects Versions: 0.24.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: hadoop-7714-20s-prelim.txt
>
>
> Especially in shared HBase/MR situations, management of the OS buffer cache 
> is important. Currently, running a big MR job will evict all of HBase's hot 
> data from cache, causing HBase performance to really suffer. However, caching 
> of the MR input/output is rarely useful, since the datasets tend to be larger 
> than cache and not re-read often enough that the cache is used. Having access 
> to the native calls {{posix_fadvise}} and {{sync_data_range}} on platforms 
> where they are supported would allow us to do a better job of managing this 
> cache.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
