I agree that if we can achieve most of the performance gain by loosening the commit log (CL) prune thresholds, that might work as a temporary option (although one must consider the effect on recovery time). I don't think this is a rare case, though. Random reads over a data set that doesn't fit in memory should be a reasonably common use case.
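
For reference, the knobs in question live in hypertable.cfg. The property names below are quoted from memory and the values are only illustrative, so check them against the shipped config before relying on them:

  # Raise the commit log prune thresholds so the RangeServer compacts
  # less aggressively and keeps more of the working set in its cell
  # caches (the flip side being more commit log to replay on recovery).
  Hypertable.RangeServer.CommitLog.PruneThreshold.Min=1G
  Hypertable.RangeServer.CommitLog.PruneThreshold.Max=4G
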
-Sanjit

On Thu, Nov 26, 2009 at 6:08 PM, Luke <[email protected]> wrote:
> I'm not sure this is a common enough case to be worth tackling now. The
> range server can fully take advantage of available memory in most
> production cases. For certain benchmarks (just enough RAM to hold the
> test data), we can tune the commit log prune thresholds to minimize the
> impact of the difference. We should try that before implementing the
> feature, which I do think is a good idea in general.
>
> On Nov 25, 2009, at 2:15 PM, Doug Judd <[email protected]> wrote:
>
>> I have a proposal that should improve Hypertable performance in
>> certain situations. When running the HBase benchmark, the one test
>> where we didn't significantly beat HBase was the random read test.
>> During the test, the RangeServers were using just a little more than
>> 800MB, which was the configured size of the block cache. HBase,
>> however, was using all of the RAM it was configured with. I suspect
>> the problem is that when we loaded the data into Hypertable, the
>> RangeServers aggressively compacted the data to keep the commit log
>> pruned back to a minimum, whereas HBase had left a significant amount
>> of data in its cell cache equivalent. This would give HBase an unfair
>> advantage in the random read test, since more of the dataset would
>> have been resident in memory.
>>
>> In general, if the RangeServers have memory available to them, they
>> should use it if possible. I propose that after a minor compaction,
>> we keep the immutable cell cache in memory and have it overshadow the
>> corresponding CellStore on disk. When the system determines, in its
>> regular maintenance task, that it needs more memory, it can purge
>> these cell caches.
>>
>> At some point we should probably have a learning algorithm, or at the
>> very least a heuristic, that determines the best use of memory among
>> these shadow cell caches, the block cache, and the query cache.
>>
>> - Doug
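
To make the proposal concrete, here is a minimal sketch of how a shadow cell cache could sit in the read path. Every type and method name below (AccessGroup, CellStore, purge_shadow_cache, and so on) is hypothetical and chosen for illustration only; this is not actual Hypertable code.

  // Writes land in a mutable cell cache. A minor compaction spills the
  // cache to an on-disk CellStore but retains the frozen copy in memory
  // as a "shadow" that overshadows the store, so reads avoid disk until
  // the maintenance task purges it under memory pressure.
  #include <cassert>
  #include <map>
  #include <memory>
  #include <optional>
  #include <string>
  #include <vector>

  using CellMap = std::map<std::string, std::string>;

  // Stand-in for an immutable on-disk CellStore.
  struct CellStore {
    explicit CellStore(CellMap cells) : cells_(std::move(cells)) {}
    std::optional<std::string> lookup(const std::string &key) const {
      auto it = cells_.find(key);  // real code would read the DFS here
      return it == cells_.end() ? std::nullopt
                                : std::optional<std::string>(it->second);
    }
    CellMap cells_;
  };

  class AccessGroup {
  public:
    void insert(std::string key, std::string value) {
      cell_cache_[std::move(key)] = std::move(value);
    }

    // Minor compaction: spill the cell cache to a CellStore, but keep
    // the now-immutable cache in memory as a shadow of that store.
    void minor_compact() {
      stores_.push_back(std::make_shared<CellStore>(cell_cache_));
      shadow_cache_ = std::make_shared<CellMap>(std::move(cell_cache_));
      cell_cache_.clear();
    }

    std::optional<std::string> lookup(const std::string &key) const {
      if (auto it = cell_cache_.find(key); it != cell_cache_.end())
        return it->second;
      if (shadow_cache_) {  // shadow hit: same data as disk, no I/O
        if (auto it = shadow_cache_->find(key); it != shadow_cache_->end())
          return it->second;
      }
      for (auto s = stores_.rbegin(); s != stores_.rend(); ++s)
        if (auto v = (*s)->lookup(key))
          return v;
      return std::nullopt;
    }

    // Called by the regular maintenance task when memory is needed
    // elsewhere; the data is already durable in a CellStore, so
    // dropping the shadow loses nothing but a future disk read.
    void purge_shadow_cache() { shadow_cache_.reset(); }

  private:
    CellMap cell_cache_;                     // mutable, absorbs writes
    std::shared_ptr<CellMap> shadow_cache_;  // frozen copy of last spill
    std::vector<std::shared_ptr<CellStore>> stores_;
  };

  int main() {
    AccessGroup ag;
    ag.insert("row1", "value1");
    ag.minor_compact();          // row1 is now on disk and in the shadow
    assert(ag.lookup("row1"));   // served from the shadow cache
    ag.purge_shadow_cache();     // maintenance task reclaims the memory
    assert(ag.lookup("row1"));   // still found, now via the CellStore
  }

Because a shadow cache duplicates data that is already durable on disk, purging it costs nothing beyond later disk reads, which makes it a natural first victim whenever the block cache or query cache needs the memory.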
