> Further (I asked this previously), where is the general CPU usage in
> HBase? Binary search on keys for seeking, skip list reads and writes,
> and [maybe] MapReduce jobs?
If you are running colocated MapReduce jobs, then it could be the user
code of course. Otherwise it depends on workload. For our apps I
observe the following top line items when profiling:

- KV comparators: By far the most common operation; searching keys,
  writing HFiles, etc.
- MemStore CSLM ops: Especially if upserting
- Servicing RPCs: Writable marshall/unmarshall, monitors
- Concurrent GC

It generally looks good, but MemStore can be improved, especially for
the upsert case. Reminds me I need to profile the latest. It's been a
few weeks. (A toy sketch of the comparator/CSLM point is at the bottom
of this mail.)

Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back.
  - Piet Hein (via Tom White)


>________________________________
>From: Jason Rutherglen <jason.rutherg...@gmail.com>
>To: dev@hbase.apache.org
>Sent: Sunday, July 10, 2011 3:05 PM
>Subject: Re: Converting byte[] to ByteBuffer
>
>Ted,
>
>Interesting. I think we need to take a deeper look at why essentially
>turning off the caching of uncompressed blocks doesn't [seem to]
>matter. My guess is it's cheaper to decompress on the fly than to hog
>memory from the system IO cache with JVM heap usage.
>
>I.e., CPU is cheaper than disk IO.
>
>Further (I asked this previously), where is the general CPU usage in
>HBase? Binary search on keys for seeking, skip list reads and writes,
>and [maybe] MapReduce jobs? The rest should more or less be in the
>noise (or is general Java overhead).
>
>I'd be curious to know the avg CPU consumption of an active HBase
>system.
>
>On Sat, Jul 9, 2011 at 11:14 PM, Ted Dunning <tdunn...@maprtech.com> wrote:
>> No. The JNI is below the HDFS-compatible API. Thus the changed code
>> is in the hadoop.jar and associated jars and .so's that MapR
>> supplies.
>>
>> The JNI still runs in the HBase memory image, though, so it can make
>> data available faster.
>>
>> The cache involved includes the cache of disk blocks (not HBase
>> memcache blocks) in the JNI and in the filer subsystem.
>>
>> The detailed reasons why more caching in the file system and less in
>> HBase makes the overall system faster are not completely worked out,
>> but the general outlines are pretty clear. There are likely several
>> factors at work in any case, including lower GC cost due to a smaller
>> memory footprint, caching compressed blocks instead of Java
>> structures, and simplification due to a clean memory hand-off with an
>> associated strong demarcation of where different memory allocators
>> have jurisdiction.
>>
>> On Sat, Jul 9, 2011 at 3:48 PM, Jason Rutherglen
>> <jason.rutherg...@gmail.com> wrote:
>>
>>> I'm a little confused. I was told none of the HBase code changed
>>> with MapR; if the HBase (not the OS) block cache has a JNI
>>> implementation, then that part of the HBase code changed.
>>>
>>> On Jul 9, 2011 11:19 AM, "Ted Dunning" <tdunn...@maprtech.com> wrote:
>>> > MapR does help with the GC because it *does* have a JNI interface
>>> > into an external block cache.
>>> >
>>> > Typical configurations with MapR trim HBase down to the minimal
>>> > viable size and increase the file system cache correspondingly.
>>> >
>>> > On Fri, Jul 8, 2011 at 7:52 PM, Jason Rutherglen
>>> > <jason.rutherg...@gmail.com> wrote:
>>> >
>>> >> MapR doesn't help with the GC issues. If MapR had a JNI
>>> >> interface into an external block cache then that'd be a different
>>> >> story. :) And I'm sure it's quite doable.
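
To make the comparator point concrete, here is a toy sketch in plain
Java of a CSLM-backed write/read path. To be clear: this is not
HBase's actual MemStore code, the names are mine, and the real
KeyValue comparator also decodes row/family/qualifier/timestamp
rather than doing a flat byte compare.

import java.util.Comparator;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

// Toy model of the MemStore hot path; names are made up for this mail.
public class MemStoreSketch {

  // Stand-in for the KV comparator: raw lexicographic byte[] compare.
  // Every skip list operation below invokes this O(log n) times, which
  // is why comparators sit at the top of the profile.
  static final Comparator<byte[]> RAW = (a, b) -> {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int d = (a[i] & 0xff) - (b[i] & 0xff);
      if (d != 0) {
        return d;
      }
    }
    return a.length - b.length;
  };

  private final ConcurrentSkipListMap<byte[], byte[]> kvs =
      new ConcurrentSkipListMap<>(RAW);

  // Write path: one skip list traversal per put. A real upsert also
  // has to find and prune older versions, i.e. more traversals and
  // more comparator calls -- the case I noted as needing improvement.
  public void upsert(byte[] key, byte[] value) {
    kvs.put(key, value);
  }

  // Read path: seeking a scanner to the first key >= 'key'. The skip
  // list search is effectively the binary search Jason mentions, again
  // paid for entirely in comparator calls.
  public byte[] seek(byte[] key) {
    Map.Entry<byte[], byte[]> e = kvs.ceilingEntry(key);
    return e == null ? null : e.getValue();
  }
}

Since every put and seek costs O(log n) comparator invocations, and
each invocation walks the key bytes, comparator time grows with both
MemStore size and key length. That lines up with KV comparators being
the top line item.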
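
And on the block cache experiment quoted above (turning off caching of
uncompressed blocks apparently not mattering), this is roughly how one
would run it per column family. I'm writing the API from memory
against the 0.90-era client, so treat it as a sketch and double-check
the method names; the table and family names are placeholders.

import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;

public class NoBlockCacheSketch {
  public static void main(String[] args) {
    HTableDescriptor table = new HTableDescriptor("t1");
    HColumnDescriptor family = new HColumnDescriptor("f1");
    // Data blocks for this family skip the LRU block cache, so every
    // read decompresses on the fly and relies on the OS page cache
    // (or the filer's cache, in the MapR setup) for the raw bytes.
    family.setBlockCacheEnabled(false);
    table.addFamily(family);
  }
}

(The global knob is hfile.block.cache.size, the fraction of heap given
to the block cache.) If reads don't slow down with caching off, that
supports the "CPU is cheaper than disk IO" reading: decompression is
fast, and heap not spent on cached blocks is heap the OS cache gets to
keep.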