[ 
https://issues.apache.org/jira/browse/HBASE-10052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13840355#comment-13840355
 ] 

Colin Patrick McCabe commented on HBASE-10052:
----------------------------------------------

bq. One thing to be wary of: during the compaction, readers are still accessing 
the old files, so if you're compacting large files, this could really hurt read 
latency during compactions (assuming that people are relying on linux LRU in 
addition to hbase-internal LRU for performance).

That's a fair point.

bq. In most cases, as soon as the compaction is complete, we end up removing 
the input files anyway (thus removing from cache), right?

Unlinking a file doesn't remove that file from the buffer cache.  If the 
unlinked file is no longer referenced (certainly the case here), it will be 
removed over time, as other things evict it.  In the meantime, having those 
pages buffered means that something else isn't.

When doing the fadvise work, I remember us coming up with a crude hack that did 
fadvise from HBase during compactions and seeing some performance gain.  But it 
seems like it might be workload-dependent.

It's a shame that there isn't a way to tell Linux to do a read without caching.  
That's really what we want here.  Instead, we just have a way of nuking the 
cache for a range of the file if it exists, which is not at all the same thing.  
I took a look at the Linux source tree again today, and {{FADV_NOREUSE}} was 
still a no-op :(

bq. Hmm, ok, moving out until we have something with a quantified benefit.

Yeah, it would be interesting to see some test numbers.  I also wonder if we 
could somehow quantify how often the HBase LRU hits.
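
As a rough illustration of what "quantifying how often the LRU hits" could 
mean, here is a toy sketch of an LRU cache that counts hits and misses.  This 
is not HBase's block cache; the class name {{CountingLruCache}} and its methods 
are made up for the example, and it only assumes the standard 
{{java.util.LinkedHashMap}} access-order behavior.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Toy LRU cache that counts hits and misses.
 * Hypothetical illustration only; this is not HBase's block cache.
 */
class CountingLruCache<K, V> {
    private final Map<K, V> entries;
    private long hits;
    private long misses;

    CountingLruCache(final int capacity) {
        // accessOrder=true moves an entry to the tail on every get(),
        // so the head of the map is always the least recently used entry.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity;
            }
        };
    }

    /** Looks up a key and records whether the lookup hit or missed. */
    V get(K key) {
        V value = entries.get(key);
        if (value == null) {
            misses++;
        } else {
            hits++;
        }
        return value;
    }

    /** Inserts a value; the least recently used entry is evicted if full. */
    void put(K key, V value) {
        entries.put(key, value);
    }

    long hits() {
        return hits;
    }

    long misses() {
        return misses;
    }

    /** Fraction of lookups served from the cache, or 0.0 before any lookup. */
    double hitRatio() {
        long total = hits + misses;
        return total == 0 ? 0.0 : (double) hits / total;
    }
}
```

Comparing a ratio like this with and without dropbehind on the compaction 
reads would give one data point on how much the internal LRU is absorbing on 
its own, though it says nothing about hits in the OS page cache.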

> use HDFS advisory caching to avoid caching HFiles that are not going to be 
> read again (because they are being compacted)
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-10052
>                 URL: https://issues.apache.org/jira/browse/HBASE-10052
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Colin Patrick McCabe
>            Assignee: Andrew Purtell
>            Priority: Minor
>             Fix For: 0.98.1, 0.99.0
>
>
> HBase can benefit from doing dropbehind during compaction since compacted 
> files are not read again.  HDFS advisory caching, introduced in HDFS-4817, 
> can help here.  The right API here is {{FSDataInputStream#setDropBehind}}.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
