[ https://issues.apache.org/jira/browse/HBASE-10052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13993417#comment-13993417 ]

Liang Xie commented on HBASE-10052:
-----------------------------------

bq.  One thing to be wary of: during the compaction, readers are still 
accessing the old files, so if you're compacting large files, this could really 
hurt read latency during compactions (assuming that people are relying on linux 
LRU in addition to hbase-internal LRU for performance).
Since by default we have 3 replicas at the HDFS layer, the current InputStream 
only drops caching on the single replica it picked to read from. That seems 
not ideal, considering the possibly redundant caching on multiple nodes if a 
failover or something similar happened. How about providing an async function 
at the InputStream layer, say dropFileCaches, which gets all the LocatedBlocks, 
and exposing a similar interface at the DataNode layer as well, so we could 
clear the caching on all DataNodes for those blocks?
We could request this async dropFileCaches just before closing the original 
store files being compacted; see the sketch below.  Just a raw idea, crazy? :)
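
To make the idea concrete, a rough sketch follows. Note that dropFileCaches 
and the DataNode-side call are hypothetical (this is exactly the new interface 
being proposed, not an existing API); only getAllBlocks() and getLocations() 
exist in HDFS today.

{code:java}
import java.io.IOException;
import java.util.List;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.client.HdfsDataInputStream;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;
import org.apache.hadoop.hdfs.protocol.LocatedBlock;

public class DropFileCachesSketch {
  // Sketch only: fan out a cache-drop request to every replica of every
  // block of a store file, rather than only the replica we read from.
  static void dropFileCaches(FileSystem fs, Path storeFile) throws IOException {
    // HdfsDataInputStream exposes the located blocks of the open file.
    try (HdfsDataInputStream in = (HdfsDataInputStream) fs.open(storeFile)) {
      List<LocatedBlock> blocks = in.getAllBlocks();
      for (LocatedBlock block : blocks) {
        for (DatanodeInfo dn : block.getLocations()) {
          // Hypothetical fire-and-forget client->DataNode RPC; each DN
          // would posix_fadvise(POSIX_FADV_DONTNEED) its local block file:
          // dataNodeClient.dropBlockCacheAsync(dn, block.getBlock());
        }
      }
    }
  }
}
{code}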


> use HDFS advisory caching to avoid caching HFiles that are not going to be 
> read again (because they are being compacted)
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-10052
>                 URL: https://issues.apache.org/jira/browse/HBASE-10052
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Colin Patrick McCabe
>            Priority: Minor
>             Fix For: 0.99.0, 0.98.3
>
>
> HBase can benefit from doing dropbehind during compaction since compacted 
> files are not read again.  HDFS advisory caching, introduced in HDFS-4817, 
> can help here.  The right API here is {{FSDataInputStream#setDropBehind}}.
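
For illustration, a minimal sketch of the reader-side call the description 
refers to (not the actual HBASE-10052 patch; it assumes the stream comes from 
an HDFS FileSystem, whose {{FSDataInputStream}} implements 
{{CanSetDropBehind}}):

{code:java}
import java.io.IOException;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SetDropBehindSketch {
  // Open a compaction input and advise HDFS not to pollute the OS page
  // cache with data we are never going to read again.
  static FSDataInputStream openForCompaction(FileSystem fs, Path hfile)
      throws IOException {
    FSDataInputStream in = fs.open(hfile);
    try {
      // Advisory only; behind the read position this ends up as
      // posix_fadvise(POSIX_FADV_DONTNEED) on the serving DataNode.
      in.setDropBehind(true);
    } catch (UnsupportedOperationException e) {
      // Not every FileSystem supports the drop-behind hint; safe to ignore.
    }
    return in;
  }
}
{code}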


