Re: For HBase compactions - Lucene's IO impact reduction code

Ted Yu Sat, 07 Jul 2012 03:29:14 -0700

I created HBASE-6351 with Otis's comments.

Let's continue discussion from there.


On Sat, Jul 7, 2012 at 12:01 AM, Lars George <[email protected]> wrote:

> Hi Otis,
>
> Throttling I think is a less needed feature as we typically struggle to
> keep up with the compaction queue under load. Reducing background noise
> caused by compactions is more an exercise of tuning the compaction
> algorithm itself. That is still somewhat of a black art it seems.
>
> As for the OS buffer bypassing, Todd did some work along these lines in
> HDFS, which helped speed up HBase (for CDH this went into CDH3u4). Not
> sure if it is really the same or not, so I leave this for someone else to
> comment on.
>
> But these are indeed interesting ideas and should be discussed thoroughly.
>
> Lars
>
> On Jul 7, 2012, at 7:49, Otis Gospodnetic <[email protected]>
> wrote:
>
> > Hi,
> >
> > Here is something that may be of interest to HBase:
> >
> > Lucene 4.0.0-Alpha was recently released.  Mike McCandless, one of the
> Lucene developers, wrote a really nice post about new things in this
> version of Lucene.  The part that I think is interesting for HBase, and
> that HBase devs may want to look at (and borrow to use with compactions) is
> this:
> >
> > Reducing merge IO impact
> >
> > Merging (consolidating many small segments into a single big one) is a
> very IO and CPU intensive operation which can easily interfere with ongoing
> searches. In 4.0.0 we now have two ways to reduce this impact:
> >    * Rate-limit the IO caused by ongoing merging, by
> calling FSDirectory.setMaxMergeWriteMBPerSec.
> >
> >
> >    * Use the new NativeUnixDirectory which bypasses the OS's IO cache
> for all merge IO, by using direct IO. This ensures that a merge won't evict
> hot pages used by searches. (Note that there is also a native
> WindowsDirectory, but it does not yet use direct IO during merging...
> patches welcome!).
> >
> > Remember to also set swappiness to 0 on Linux if you want to maximize
> search responsiveness.
> >
> > More generally, the APIs that open an input or output file
> (Directory.openInput and Directory.createOutput) now take an IOContext
> describing what's being done (e.g., flush vs merge), so you can create a
> custom Directory that changes its behavior depending on the context.
> >
> > These changes were part of a 2011 Google Summer of Code project (thank
> you Varun!).
> >
> >
> >
> > Thoughts?
> >
> > Otis
> > ----
> > Performance Monitoring for Solr / ElasticSearch / HBase -
> http://sematext.com/spm
>
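
To make the rate-limiting idea from the quoted post concrete, here is a minimal sketch of capping write throughput in MB/sec, which is roughly what a compaction writer could do before each block write. The class name SimpleWriteRateLimiter and its methods are hypothetical and written only for illustration; this is not the Lucene implementation behind FSDirectory.setMaxMergeWriteMBPerSec and not an existing HBase API.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical helper (not Lucene or HBase API): caps write throughput
 * by making each writer wait until earlier writes have been "paid for".
 */
final class SimpleWriteRateLimiter {
    private final double bytesPerSec;
    // Earliest System.nanoTime() value at which the next write may start.
    private long nextAllowedNanos;

    SimpleWriteRateLimiter(double mbPerSec) {
        if (mbPerSec <= 0) {
            throw new IllegalArgumentException("mbPerSec must be > 0");
        }
        this.bytesPerSec = mbPerSec * 1024 * 1024;
        this.nextAllowedNanos = System.nanoTime();
    }

    /** Nanoseconds that writing the given number of bytes should take at the cap. */
    long costNanos(long bytes) {
        return (long) (bytes * 1e9 / bytesPerSec);
    }

    /**
     * Call before writing `bytes`. Blocks until previously reserved writes
     * fit under the cap, then reserves time for this write.
     * Returns the number of nanoseconds the caller was asked to wait.
     */
    synchronized long acquire(long bytes) {
        long now = System.nanoTime();
        // Idle time is not banked as credit: the wait is never negative.
        long waitNanos = Math.max(0L, nextAllowedNanos - now);
        nextAllowedNanos = now + waitNanos + costNanos(bytes);
        if (waitNanos > 0) {
            try {
                TimeUnit.NANOSECONDS.sleep(waitNanos);
            } catch (InterruptedException e) {
                // Preserve the interrupt for the caller and stop waiting.
                Thread.currentThread().interrupt();
            }
        }
        return waitNanos;
    }
}
```

A compaction thread would call acquire(buffer.length) before each write and leave flush writers unthrottled. The sketch sleeps while holding the lock, so all writers sharing one limiter are serialized, which is the intended effect for a single global cap but would need rethinking for per-store limits.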
