> - MemStore CSLM ops: Especially if upserting

A quick thought on that one: perhaps it'd be helped by limiting the
aggregate size of the CSLM, since skip lists start to degrade in
performance once they grow too large. Something like multiple CSLMs could
work? Grow a CSLM to a given size, then start a new one.
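To make that concrete, here's a rough sketch of the segmented idea. The
names (SegmentedMemStore, MAX_ENTRIES) are made up for illustration, and it
glosses over MVCC/snapshots, KeyValue comparators, flushes, and doing the
rollover without a coarse lock:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;

// Sketch only: cap each skip list at a fixed entry count and roll over to
// a fresh one, so no single CSLM grows deep enough to degrade.
public class SegmentedMemStore<K extends Comparable<K>, V> {

  private static final int MAX_ENTRIES = 100_000; // arbitrary rollover point

  private final List<ConcurrentSkipListMap<K, V>> segments = new ArrayList<>();
  private ConcurrentSkipListMap<K, V> current = new ConcurrentSkipListMap<>();
  private int currentCount = 0; // CSLM.size() is O(n), so track our own count

  public SegmentedMemStore() {
    segments.add(current);
  }

  public synchronized void put(K key, V value) {
    if (currentCount >= MAX_ENTRIES) {
      current = new ConcurrentSkipListMap<>(); // grow to a limit, then roll
      segments.add(current);
      currentCount = 0;
    }
    current.put(key, value);
    currentCount++;
  }

  // Reads consult every segment, newest first, so the latest write for a
  // key wins; a merged scan would need a heap over the segment iterators.
  public synchronized V get(K key) {
    for (int i = segments.size() - 1; i >= 0; i--) {
      V v = segments.get(i).get(key);
      if (v != null) {
        return v;
      }
    }
    return null;
  }
}

The upside is that each individual skip list stays shallow; the cost is
that point reads and scans have to merge across segments, so it's only a
win if CSLM degradation at large sizes really dominates the upsert path.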
On Mon, Jul 11, 2011 at 1:30 PM, Andrew Purtell <apurt...@apache.org> wrote:
>> Further, (I asked this previously), where is the general CPU usage in
>> HBase? Binary search on keys for seeking, skip list reads and writes,
>> and [maybe] MapReduce jobs?
>
> If you are running colocated MapReduce jobs, then it could be the user code
> of course.
>
> Otherwise it depends on workload.
>
> For our apps I observe the following top line items when profiling:
>
> - KV comparators: By far the most common operation, searching keys,
> writing HFiles, etc.
>
> - MemStore CSLM ops: Especially if upserting
>
> - Servicing RPCs: Writable marshall/unmarshall, monitors
>
> - Concurrent GC
>
> It generally looks good but MemStore can be improved, especially for the
> upsert case.
>
> Reminds me I need to profile the latest. It's been a few weeks.
>
> Best regards,
>
>   - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein (via
> Tom White)
>
>
>>________________________________
>>From: Jason Rutherglen <jason.rutherg...@gmail.com>
>>To: dev@hbase.apache.org
>>Sent: Sunday, July 10, 2011 3:05 PM
>>Subject: Re: Converting byte[] to ByteBuffer
>>
>>Ted,
>>
>>Interesting. I think we need to take a deeper look at why essentially
>>turning off the caching of uncompressed blocks doesn't [seem to]
>>matter. My guess is it's cheaper to decompress on the fly than hog
>>from the system IO cache with JVM heap usage.
>>
>>Ie, CPU is cheaper than disk IO.
>>
>>Further, (I asked this previously), where is the general CPU usage in
>>HBase? Binary search on keys for seeking, skip list reads and writes,
>>and [maybe] MapReduce jobs? The rest should more or less be in the
>>noise (or is general Java overhead).
>>
>>I'd be curious to know the avg CPU consumption of an active HBase system.
>>
>>On Sat, Jul 9, 2011 at 11:14 PM, Ted Dunning <tdunn...@maprtech.com> wrote:
>>> No. The JNI is below the HDFS compatible API. Thus the changed code is in
>>> the hadoop.jar and associated jars and .so's that MapR supplies.
>>>
>>> The JNI still runs in the HBase memory image, though, so it can make data
>>> available faster.
>>>
>>> The cache involved includes the cache of disk blocks (not HBase memcache
>>> blocks) in the JNI and in the filer sub-system.
>>>
>>> The detailed reasons why more caching in the file system and less in HBase
>>> makes the overall system faster are not completely worked out, but the
>>> general outlines are pretty clear. There are likely several factors at work
>>> in any case including less GC cost due to smaller memory foot print, caching
>>> compressed blocks instead of Java structures and simplification due to a
>>> clean memory hand-off with associated strong demarcation of where different
>>> memory allocators have jurisdiction.
>>>
>>> On Sat, Jul 9, 2011 at 3:48 PM, Jason Rutherglen <jason.rutherg...@gmail.com
>>>> wrote:
>>>
>>>> I'm a little confused, I was told none of the HBase code changed with MapR,
>>>> if the HBase (not the OS) block cache has a JNI implementation then that
>>>> part of the HBase code changed.
>>>> On Jul 9, 2011 11:19 AM, "Ted Dunning" <tdunn...@maprtech.com> wrote:
>>>> > MapR does help with the GC because it *does* have a JNI interface into an
>>>> > external block cache.
>>>> >
>>>> > Typical configurations with MapR trim HBase down to the minimal viable size
>>>> > and increase the file system cache correspondingly.
>>>> >
>>>> > On Fri, Jul 8, 2011 at 7:52 PM, Jason Rutherglen <
>>>> jason.rutherg...@gmail.com
>>>> >> wrote:
>>>> >
>>>> >> MapR doesn't help with the GC issues. If MapR had a JNI
>>>> >> interface into an external block cache then that'd be a different
>>>> >> story. :) And I'm sure it's quite doable.