This case is very common. According to the Virident co-founder who worked at Google, Bigtable sees a 90% read workload. Fiddling with the commit log prune threshold is not a good option, since it can significantly affect how long it takes to bring the system down and back up again.
- Doug

On Thu, Nov 26, 2009 at 6:08 PM, Luke <[email protected]> wrote:
> I'm not sure this is a common enough case to be worth tackling now. Range
> server can fully take advantage of available memory in most production
> cases. For certain benchmarks (just enough RAM to hold the test data), we
> can fiddle with the commit log prune thresholds to minimize the impact of
> the difference. We should try that before implementing the feature, which
> I do think is a good idea in general.
>
> On Nov 25, 2009, at 2:15 PM, Doug Judd <[email protected]> wrote:
>
> > I have a proposal that should improve Hypertable performance in certain
> > situations. When running the HBase benchmark, the one test where we
> > didn't significantly beat HBase was the random read test. During the
> > test, the RangeServers were using just a little more than 800MB, which
> > was the configured size of the block cache. However, HBase was using all
> > of the RAM that it was configured with. I suspect the problem is that
> > when we loaded the data into Hypertable, the RangeServers aggressively
> > compacted the data to keep the commit log pruned back to a minimum,
> > whereas HBase had left a significant amount of data in its cell cache
> > equivalent. This would give HBase an unfair advantage in the random read
> > test, since more of the dataset would have been resident in memory.
> >
> > In general, if the RangeServers have memory available to them, they
> > should use it if possible. I propose that after a minor compaction, we
> > keep the immutable cell cache in memory and have it overshadow the
> > corresponding CellStore on disk. When the system determines that it
> > needs more memory in its regular maintenance task, it can purge these
> > cell caches.
> >
> > At some point we should probably have a learning algorithm, or at the
> > very least a heuristic, that determines the best use of memory among
> > these shadow cell caches, the block cache, and the query cache.
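A minimal sketch of the shadow cell cache idea described above: after a minor compaction, the immutable cell cache is retained and consulted before the on-disk CellStore it was flushed to, and a maintenance task can purge it to reclaim memory. All class and method names here are illustrative, not actual Hypertable APIs, and the on-disk store is simulated with a dict.

```python
class CellStore:
    """Stands in for an on-disk CellStore; here just a dict lookup."""
    def __init__(self, cells):
        self._cells = dict(cells)

    def get(self, key):
        return self._cells.get(key)


class ShadowedCellStore:
    """A CellStore overshadowed by the immutable cell cache it was
    compacted from. Reads hit the in-memory shadow first; the shadow
    can be purged when the maintenance task needs memory back."""
    def __init__(self, store, shadow_cache):
        self.store = store
        # Immutable cell cache kept in memory after the minor compaction.
        self.shadow = shadow_cache

    def get(self, key):
        if self.shadow is not None and key in self.shadow:
            return self.shadow[key]   # served from memory
        return self.store.get(key)    # fall back to the on-disk store

    def purge_shadow(self):
        # Called by the regular maintenance task under memory pressure.
        self.shadow = None


cells = {b"row1": b"a", b"row2": b"b"}
scs = ShadowedCellStore(CellStore(cells), dict(cells))
assert scs.get(b"row1") == b"a"   # hit from the shadow cell cache
scs.purge_shadow()
assert scs.get(b"row1") == b"a"   # same answer, now from the CellStore
```

Because the shadow cache holds exactly the data that was compacted into the CellStore, purging it never changes query results, only where the bytes are read from, which is what makes the purge decision a pure memory/performance trade-off.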
> > - Doug

--
You received this message because you are subscribed to the Google Groups
"Hypertable Development" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to
[email protected].
For more options, visit this group at
http://groups.google.com/group/hypertable-dev?hl=en.
