[ 
https://issues.apache.org/jira/browse/HBASE-6383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13416597#comment-13416597
 ] 

Jean-Daniel Cryans commented on HBASE-6383:
-------------------------------------------

bq. I don't know how much I would expect that - in theory under 'regular' 
workload (depending on the number of scans) I would expect the times to be 
increasingly separate as the queue becomes more efficient (scan resistant) 
compared to the regular block cache, which would have to go to disk.

I meant the read time for a single access to the block cache is in the 
nanoseconds, the total runtime for the tests doesn't make much sense since it's 
mostly skewed by the time it takes to generate a block (or read from disk) in 
order to cache it.

bq. Sounds like its not worth it overall, but maybe for some workloads? I'd 
like to do some testing on our cluster too, but probably don't have time for a 
bit.

Sorry I mislead you, there is a difference and the best way to measure it is 
the hit ratio (unless it takes too long to read from the cache, to a point 
where it would affect latency). In some cases a saw a difference of a few 
percent which in the caching domain is a big deal.
                
> Investigate using 2Q for block cache
> ------------------------------------
>
>                 Key: HBASE-6383
>                 URL: https://issues.apache.org/jira/browse/HBASE-6383
>             Project: HBase
>          Issue Type: New Feature
>          Components: performance, regionserver
>    Affects Versions: 0.96.0
>            Reporter: Jesse Yates
>            Priority: Minor
>
> Currently we use a basic version of LRU to handle block caching. LRU is know 
> to be very susceptible to scan thrashing (not scan resistant), which is a 
> common operation in HBase. 2Q is an efficient caching algorithm that emulates 
> the effectivness of LRU/2 (eviction based not on the last access, but rather 
> the access before the last), but is O(1), rather than O(lg\(n)) in complexity.
> JD has long been talking about investigating 2Q as it may be far better for 
> HBase than LRU and has been shown to be incredibly useful for traditional 
> database caching on production systems.
> One would need to implement 2Q (though the pseudocode in the paper is quite 
> explicit) and then test against the existing cache implementation.
> The link to the original paper is here: www.vldb.org/conf/1994/P439.PDF
> A short overview of 2Q:
> 2Q uses two queues (hence the name) and a list of pointers to keep track of 
> cached blocks. The first queue is for new, hot items (Ain). If an item is 
> accessed that isn't in Ain, the coldest block is evicted from Ain and the new 
> item replaces it. Anything accessed in Ain is already stored in memory and 
> kept in Ain.
> When a block is evicted from Ain, it is moved to Aout _as a pointer_. If Aout 
> is full, the oldest element is evicted and replaced with the new pointer.
> The key to 2Q comes in that when you access something in Aout, it is reloaded 
> into memory and stored in queue B. If B becomes full, then the coldest block 
> is evicted. 
> This essentially makes Aout a filter for long-term hot items, based on the 
> size of Aout. The original authors found that while you can tune Aout, it 
> generally performs very well at at "50% of the number of pages as would fit 
> into the buffer", but can be tuned as low as 5% at only a slight cost to 
> responsiveness to changes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to