I created HBASE-6351 with Otis's comments. Let's continue discussion from there.
On Sat, Jul 7, 2012 at 12:01 AM, Lars George <[email protected]> wrote:

> Hi Otis,
>
> Throttling I think is a less needed feature as we typically struggle to
> keep up with the compaction queue under load. Reducing background noise
> caused by compactions is more an exercise of tuning the compaction
> algorithm itself. That is still somewhat of a black art it seems.
>
> As for the OS buffer bypassing, Todd did some work along these lines in
> HDFS, which helped speed up HBase (for CDH this went into CDH3u4). Not
> sure if it is really the same or not, so I leave this for someone else to
> comment on.
>
> But indeed interesting ideas and should be discussed thoroughly.
>
> Lars
>
> On Jul 7, 2012, at 7:49, Otis Gospodnetic <[email protected]> wrote:
>
> > Hi,
> >
> > Here is something that may be of interest to HBase:
> >
> > Lucene 4.0.0-Alpha was recently released. Mike McCandless, one of the
> > Lucene developers, wrote a really nice post about new things in this
> > version of Lucene. The part that I think is interesting for HBase, and
> > that HBase devs may want to look at (and borrow to use with
> > compactions) is this:
> >
> > Reducing merge IO impact
> >
> > Merging (consolidating many small segments into a single big one) is a
> > very IO and CPU intensive operation which can easily interfere with
> > ongoing searches. In 4.0.0 we now have two ways to reduce this impact:
> >
> >   * Rate-limit the IO caused by ongoing merging, by calling
> >     FSDirectory.setMaxMergeWriteMBPerSec.
> >
> >   * Use the new NativeUnixDirectory which bypasses the OS's IO cache
> >     for all merge IO, by using direct IO. This ensures that a merge
> >     won't evict hot pages used by searches. (Note that there is also a
> >     native WindowsDirectory, but it does not yet use direct IO during
> >     merging... patches welcome!).
> >
> > Remember to also set swappiness to 0 on Linux if you want to maximize
> > search responsiveness.
> >
> > More generally, the APIs that open an input or output file
> > (Directory.openInput and Directory.createOutput) now take an IOContext
> > describing what's being done (e.g., flush vs merge), so you can create
> > a custom Directory that changes its behavior depending on the context.
> >
> > These changes were part of a 2011 Google Summer of Code project (thank
> > you Varun!).
> >
> > Thoughts?
> >
> > Otis
> > ----
> > Performance Monitoring for Solr / ElasticSearch / HBase -
> > http://sematext.com/spm
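
For anyone picking this up on the JIRA, here is a rough sketch of what the rate-limiting idea quoted above could look like if applied to a compaction writer. It is only an illustration: ThrottledOutputStream and its maxMBPerSec parameter are hypothetical names, not existing HBase or Lucene classes, and the pacing logic (sleep whenever the bytes written so far get ahead of the target rate) is an assumption about one simple way to do it, not a description of how Lucene implements it.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.InterruptedIOException;
import java.io.OutputStream;

/** Hypothetical helper: caps the average write rate of the wrapped stream. */
class ThrottledOutputStream extends FilterOutputStream {
    private final double bytesPerSec;
    private final long startNanos = System.nanoTime();
    private long bytesWritten = 0;

    ThrottledOutputStream(OutputStream out, double maxMBPerSec) {
        super(out);
        if (maxMBPerSec <= 0) {
            throw new IllegalArgumentException("maxMBPerSec must be > 0");
        }
        this.bytesPerSec = maxMBPerSec * 1024 * 1024;
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        pause(1);
    }

    @Override
    public void write(byte[] buf, int off, int len) throws IOException {
        // Write straight to the wrapped stream, then sleep if we are ahead of schedule.
        out.write(buf, off, len);
        pause(len);
    }

    /** Sleeps until the elapsed time matches the time the bytes written so far should take. */
    private void pause(int len) throws IOException {
        bytesWritten += len;
        long targetNanos = (long) (bytesWritten / bytesPerSec * 1e9);
        long sleepNanos = targetNanos - (System.nanoTime() - startNanos);
        if (sleepNanos > 0) {
            try {
                Thread.sleep(sleepNanos / 1_000_000L, (int) (sleepNanos % 1_000_000L));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new InterruptedIOException("interrupted while throttling");
            }
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (ThrottledOutputStream throttled = new ThrottledOutputStream(sink, 1.0)) {
            long start = System.nanoTime();
            throttled.write(new byte[512 * 1024]); // 0.5 MB at 1 MB/s
            long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
            System.out.println("wrote " + sink.size() + " bytes in " + elapsedMs + " ms");
        }
    }
}
```

Note that this paces a single stream from the moment it is created, so a writer that sits idle for a while can then burst. Given Lars's point that we usually struggle to keep up with the compaction queue, any real limit would probably need to be adjustable at runtime rather than fixed like this.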
