[ 
https://issues.apache.org/jira/browse/HBASE-10742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15093497#comment-15093497
 ] 

Orange commented on HBASE-10742:
--------------------------------

Can HBase determine data temperature now?

> Data temperature aware compaction policy
> ----------------------------------------
>
>                 Key: HBASE-10742
>                 URL: https://issues.apache.org/jira/browse/HBASE-10742
>             Project: HBase
>          Issue Type: Brainstorming
>            Reporter: Andrew Purtell
>
> Reading "Identifying Hot and Cold Data in Main-Memory Databases" (Levandoski, 
> Larson, and Stoica), it occurred to me that some of the motivation applies to 
> HBase and some of the results can inform a data temperature aware compaction 
> policy implementation.
> We also wish to optimize retention of cells in the working set in memory, in 
> blockcache. 
> We can also consider further and related performance optimizations in HBase 
> that awareness of hot and cold data can enable, even for cases where the 
> working set does not fit in memory. If we could partition HFiles into hot and 
> cold (cold+lukewarm) and move cells between them at compaction time, then we 
> could:
> - Migrate hot HFiles onto alternate storage tiers with improved read latency 
> and throughput characteristics. This has been discussed before on HBASE-6572. 
> Or, migrate cold HFiles to an archival tier.
> - Preload hot HFiles into blockcache to increase cache hit rates, especially 
> when regions are first brought online. And/or add another LRU priority to 
> increase the likelihood of retention of blocks in hot HFiles. This could be 
> sufficiently different from ARC to avoid issues there. 
> - Reduce the compaction priorities of cold HFiles, with proportional 
> reduction in priority IO and write amplification, since cold files would less 
> frequently participate in reads.
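> To make the compaction-time split described above concrete, here is a rough 
> sketch of how a compactor might route cells into separate hot and cold output 
> files. The CellSink and TemperatureIndex interfaces are placeholders for 
> illustration, not existing HBase APIs:
> {code:java}
> import java.io.IOException;
> import java.util.List;
> 
> public class HotColdCompactionSketch {
> 
>   /** Placeholder for an output HFile writer; illustrative only. */
>   interface CellSink {
>     void append(byte[] rowFamilyKey) throws IOException;
>   }
> 
>   /** Placeholder for whatever structure holds the temperature estimates. */
>   interface TemperatureIndex {
>     boolean isHot(byte[] rowFamilyKey);
>   }
> 
>   /** Route each cell of the merged scan into the hot or the cold output file. */
>   static void writeCompactedCells(List<byte[]> mergedCellKeys, TemperatureIndex index,
>       CellSink hotFile, CellSink coldFile) throws IOException {
>     for (byte[] key : mergedCellKeys) {
>       if (index.isHot(key)) {
>         hotFile.append(key);
>       } else {
>         coldFile.append(key);
>       }
>     }
>   }
> }
> {code}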
> Levandoski et al. describe determining data temperature with low overhead 
> using an out of band estimation process running in the background over an 
> access log. We could consider logging reads along with mutations and 
> similarly process the result in the background. The WAL could be overloaded 
> to carry access log records, or we could follow the approach described in the 
> paper and maintain an in memory access log only. 
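> As a rough sketch of the in-memory variant, each RegionServer could keep a 
> bounded buffer of access records that a background chore drains; the record 
> layout and class names below are assumptions for illustration, not existing 
> HBase types:
> {code:java}
> import java.util.concurrent.ArrayBlockingQueue;
> import java.util.concurrent.BlockingQueue;
> 
> public class InMemoryAccessLog {
> 
>   /** One logged access: the tracked coordinate plus when the access happened. */
>   public static class Record {
>     final byte[] coordinate; // e.g. row + family
>     final long timestamp;    // access time, later bucketed into time slices
>     Record(byte[] coordinate, long timestamp) {
>       this.coordinate = coordinate;
>       this.timestamp = timestamp;
>     }
>   }
> 
>   // Bounded so the log can never grow without limit on the read/write path.
>   private final BlockingQueue<Record> buffer = new ArrayBlockingQueue<>(65536);
> 
>   /** Called on the read or write path; drops the record if the buffer is full. */
>   public void log(byte[] coordinate, long timestamp) {
>     buffer.offer(new Record(coordinate, timestamp));
>   }
> 
>   /** Drained periodically by the background temperature estimator. */
>   public Record poll() {
>     return buffer.poll();
>   }
> }
> {code}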
> {quote}
> We chose the offline approach for several reasons. First, as mentioned 
> earlier, the overhead of even the simplest caching scheme is very high. 
> Second, the offline approach is generic and requires minimum changes to the 
> database engine. Third, logging imposes very little overhead during normal 
> operation. Finally, it allows flexibility in when, where, and how to analyze 
> the log and estimate access frequencies. For instance, the analysis can be 
> done on a separate machine, thus reducing overhead on the system running the 
> transactional workloads.
> {quote}
> Importantly, they only log a sample of all accesses.
> {quote}
> To implement sampling, we have each worker thread flip a biased coin before 
> starting a new query (where bias correlates with sample rate). The thread 
> records its accesses in log buffers (or not) based on the outcome of the coin 
> flip. In Section V, we report experimental results showing that sampling 10% 
> of the accesses reduces the accuracy by only 2.5%,
> {quote}
> Likewise we would only record a subset of all accesses to limit overheads.
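> A minimal sketch of that biased-coin sampling on the request path (class and 
> method names are illustrative, not existing HBase API):
> {code:java}
> import java.util.concurrent.ThreadLocalRandom;
> 
> public class AccessSampler {
> 
>   private final double sampleRate; // e.g. 0.10 to log roughly 10% of accesses
> 
>   public AccessSampler(double sampleRate) {
>     this.sampleRate = sampleRate;
>   }
> 
>   /** "Flip a biased coin": true when this request's accesses should be logged. */
>   public boolean shouldLog() {
>     return ThreadLocalRandom.current().nextDouble() < sampleRate;
>   }
> }
> {code}
> A handler would call shouldLog() once per request and, only when it returns 
> true, append that request's access records to the log.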
> The offline process estimates access frequencies over discrete time slices 
> using exponential smoothing. (Markers representing time slice boundaries are 
> interleaved with access records in the log.) Forward and backward 
> classification algorithms are presented. The forward algorithm requires a 
> full scan over the log and storage proportional to the number of unique cell 
> addresses, while the backward algorithm requires reading at least the tail of 
> the log in reverse order.
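> For example, a rough sketch of the forward pass, keeping one exponentially 
> smoothed counter per coordinate (the alpha value, key handling, and class 
> names are assumptions for illustration):
> {code:java}
> import java.util.HashMap;
> import java.util.Map;
> 
> public class ForwardTemperatureEstimator {
> 
>   private static final double ALPHA = 0.05; // smoothing factor, chosen arbitrarily here
> 
>   private static class Estimate {
>     double value;   // smoothed access frequency
>     long lastSlice; // time slice of the last update for this key
>   }
> 
>   // One entry per unique coordinate seen in the log (the storage cost noted above).
>   private final Map<String, Estimate> estimates = new HashMap<>();
> 
>   /** Called for each access record while scanning the log in time order. */
>   public void record(String key, long slice) {
>     Estimate e = estimates.computeIfAbsent(key, k -> new Estimate());
>     // Decay the old estimate across the slices with no recorded access,
>     // then fold in the current access.
>     e.value = ALPHA + Math.pow(1 - ALPHA, slice - e.lastSlice) * e.value;
>     e.lastSlice = slice;
>   }
> 
>   /** Estimate for a key as of the given slice; higher means hotter. */
>   public double estimateAt(String key, long currentSlice) {
>     Estimate e = estimates.get(key);
>     return e == null ? 0.0 : Math.pow(1 - ALPHA, currentSlice - e.lastSlice) * e.value;
>   }
> }
> {code}
> A hot/cold cutoff would then be a threshold on the estimate, or the top K 
> keys by estimate.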
> If we overload the WAL to carry the access log, offline data temperature 
> estimation can piggyback as a WAL listener. The forward algorithm would then 
> be a natural choice. The HBase master is fairly idle most of the time and 
> less memory hungry than a regionserver, at least in today's architecture. We 
> could probably get away with considering only row+family as a unique 
> coordinate to minimize space overhead. Or if instead we maintain the access 
> logs in memory at the RegionServer, then there is a parallel formulation, we 
> could benefit from the reverse algorithm's ability to terminate early once 
> confidence bounds are reached, and backwards scanning IO wouldn't be a 
> concern. This handwaves over a lot of details.
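> As a small illustration of the row+family coordinate (assuming plain byte 
> concatenation is good enough as a key; Cell, CellUtil, and Bytes are the only 
> real HBase APIs used here):
> {code:java}
> import org.apache.hadoop.hbase.Cell;
> import org.apache.hadoop.hbase.CellUtil;
> import org.apache.hadoop.hbase.util.Bytes;
> 
> public class AccessCoordinate {
> 
>   /**
>    * Track temperature per row+family rather than per cell to bound the number
>    * of unique coordinates. Plain concatenation is a simplification; a real
>    * implementation would want an unambiguous encoding.
>    */
>   public static byte[] of(Cell cell) {
>     return Bytes.add(CellUtil.cloneRow(cell), CellUtil.cloneFamily(cell));
>   }
> }
> {code}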



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
