Hi Alex,
There was, indeed, a performance regression. Essentially, readahead was
being enabled even for random-read workloads such as the random read
benchmark. I've checked in a fix and pushed it out to the main repository.
Feel free to build from the latest source:
<http://code.google.com/p/hypertable/wiki/SourceCode?tm=4>
I'd be curious to see the numbers you get with this new code.
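
In case it helps to picture the change: the idea is simply to tell the I/O
layer that the access pattern is random, so the kernel stops prefetching
ahead of each read. Here is a minimal POSIX-level sketch of that idea
(illustrative only; the file name is made up, and the actual fix in the
broker/reader code may be implemented quite differently):

    #include <fcntl.h>      // open(), posix_fadvise()
    #include <unistd.h>     // pread(), close()
    #include <cstdio>       // perror()

    int main() {
      // Hypothetical file; in Hypertable this would be a CellStore file.
      int fd = open("cellstore.dat", O_RDONLY);
      if (fd < 0) { perror("open"); return 1; }

      // Advise the kernel that reads will be random, disabling readahead.
      // (POSIX_FADV_SEQUENTIAL would request the opposite behavior.)
      posix_fadvise(fd, 0, 0, POSIX_FADV_RANDOM);

      char block[65536];
      // A random-offset read now pulls in only the pages it touches.
      pread(fd, block, sizeof(block), 4 * 65536L);

      close(fd);
      return 0;
    }
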
It's a little difficult to do an apples-to-apples comparison with the
Bigtable benchmark, considering that they use a 1786-node GFS cell. But here
are the numbers that I now get when running the software on a single node
(dual-core 1.8 GHz Opteron) using the localBroker with the local filesystem:
*not* IN_MEMORY:
./bin/random_write_test 1000000000
Elapsed time: 72.28 s
Total inserts: 1000000
Throughput: 14002073.08 bytes/s
Throughput: 13836.04 inserts/s
./bin/random_read_test 1000000000
Elapsed time: 795.86 s
Total scanned: 1000000
Throughput: 1271582.90 bytes/s
Throughput: 1256.50 scanned cells/s
IN_MEMORY:
./bin/random_write_test 1000000000
Elapsed time: 65.71 s
Total inserts: 1000000
Throughput: 15401040.04 bytes/s
Throughput: 15218.42 inserts/s
./bin/random_read_test 1000000000
Elapsed time: 192.84 s
Total scanned: 1000000
Throughput: 5247769.33 bytes/s
Throughput: 5185.54 scanned cells/s
I'm not sure why the random read IN_MEMORY test comes in at about half the
Bigtable number. I didn't see any low-hanging fruit with oprofile. As soon
as we get all of the functionality in place for beta, we'll do a performance
push and figure out what's going on.
- Doug
On Tue, Oct 21, 2008 at 11:27 AM, Alex <[EMAIL PROTECTED]> wrote:
>
> Joshua, thanks for your note! Indeed, I was using the debug build because
> I am interested in the memory profile. However, when I rebuilt for release,
> it didn't have a huge effect. I then realized that the slowdown was due to
> the fact that I was running HT with tcmalloc heap profiling enabled.
>
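
A rough sketch of how the gperftools heap profiler can be scoped to just a
section of interest, instead of enabling it process-wide via the HEAPPROFILE
environment variable (the dump-file prefix below is made up):

    #include <gperftools/heap-profiler.h>  // gperftools; link with -ltcmalloc

    void run_profiled_section() {
      // Start writing heap dumps using the given file-name prefix.
      HeapProfilerStart("/tmp/ht_random_read");

      // ... run the code being profiled here ...

      HeapProfilerDump("end of section");  // force a final dump
      HeapProfilerStop();                  // stop intercepting allocations
    }

The profiler hooks every allocation and periodically writes heap dumps,
which is why leaving it on can easily dominate a read-heavy benchmark.
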
> The results are significantly better for IN MEMORY random tests:
>
> random_write_test 1000000000
>
> Elapsed time: 62.39 s
> Total inserts: 1000000
> Throughput: 16221685.94 bytes/s
> Throughput: 16029.33 inserts/s
>
> random_read_test 1000000000
>
> Elapsed time: 197.01 s
> Total scanned: 1000000
> Throughput: 5136732.38 bytes/s
> Throughput: 5075.82 scanned cells/s
>
> So, the in-memory random read test is only ~2x slower than the number
> reported in the Bigtable paper (although I guess their machines were
> somewhat slower two years ago).
>
> Without the in-memory setting, the random read test is also ~2x slower:
>
> random_write_test 1000000000
>
> Elapsed time: 65.15 s
> Total inserts: 1000000
> Throughput: 15533951.97 bytes/s
> Throughput: 15349.75 inserts/s
>
> random_read_test 1000000000
>
> Elapsed time: 1256.39 s
> Total scanned: 1000000
> Throughput: 805479.86 bytes/s
> Throughput: 795.93 scanned cells/s
>
> Alex
>
> On Oct 20, 6:59 pm, "Joshua Taylor" <[EMAIL PROTECTED]> wrote:
> > Alex, just making sure... you're running the optimized build, right? The
> > debug build (which I believe is the default setup) is way slower than the
> > optimized build.
> >
> > Josh
> >
> > On Mon, Oct 20, 2008 at 6:48 PM, Alex <[EMAIL PROTECTED]> wrote:
> >
> > > Luke,
> >
> > > I tried setting the IN_MEMORY access group option for the RandomTest
> > > table.
> >
> > > The result for the random read test improved by ~5x:
> >
> > > random_read_test 1000000000
> >
> > > 0% 10 20 30 40 50 60 70 80 90 100%
> > > |----|----|----|----|----|----|----|----|----|----|
> > > ***************************************************
> > > Elapsed time: 1971.06 s
> > > Total scanned: 1000000
> > > Throughput: 513430.49 bytes/s
> > > Throughput: 507.34 scanned cells/s
> >
> > > However, it is still ~2x worse than the Bigtable paper's number for
> > > regular (non-memory) tables and ~20x worse than their memtable number.
> >
> > > Thanks,
> > > Alex
> >
> > > On Oct 19, 2:18 pm, Luke <[EMAIL PROTECTED]> wrote:
> > > > The random_read_test used to score ~4k qps on a comparable benchmark
> > > > (vs. 1.2k qps in the Bigtable paper; note that their 10k qps number is
> > > > for the memtable case, which is very different from a regular table.
> > > > In Hypertable you can use the IN_MEMORY access group option to get a
> > > > memtable). The regular table scanner needs to merge-scan the cell
> > > > cache and the cell stores, so it's much more expensive than the
> > > > memtable scanner, which just scans the cell cache, regardless of the
> > > > size of the table.
> >
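
To make the cost difference concrete: a regular-table scan has to merge
several sorted sources (the in-memory cell cache plus a sorted run per cell
store), while an IN_MEMORY scan reads from a single sorted structure. Below
is a generic sketch of that k-way merge; it is not Hypertable's actual
scanner code, just the idea:

    #include <iostream>
    #include <queue>
    #include <string>
    #include <utility>
    #include <vector>

    // One sorted source; stands in for a cell cache or a cell store.
    struct Source {
      std::vector<std::string> keys;  // keys in sorted order
      size_t pos = 0;
    };

    // Emit keys in global order by repeatedly popping the smallest head key.
    void merge_scan(std::vector<Source> &sources) {
      using Entry = std::pair<std::string, size_t>;  // (key, source index)
      std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;

      for (size_t i = 0; i < sources.size(); ++i)
        if (!sources[i].keys.empty())
          heap.emplace(sources[i].keys[0], i);

      while (!heap.empty()) {
        auto [key, i] = heap.top();
        heap.pop();
        std::cout << key << "\n";  // next cell in sorted order
        Source &s = sources[i];
        if (++s.pos < s.keys.size())
          heap.emplace(s.keys[s.pos], i);
      }
    }

    int main() {
      // A cell cache plus two cell stores: three sources per scan.
      std::vector<Source> sources = {
        {{"apple", "durian", "grape"}},
        {{"banana", "elderberry"}},
        {{"cherry", "fig"}}
      };
      merge_scan(sources);
      return 0;
    }

The more cell stores a range accumulates between compactions, the more
sources each scan has to merge, which is part of why the memtable-style
scan stays cheap regardless of table size.
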
> > > > Before he went on vacation, Doug pushed out .11, which contains major
> > > > changes to the way the cell cache and compaction work. There might be
> > > > a performance regression in the recent releases. Thanks for the note;
> > > > we'll look into it.
> >
> > > > __Luke
> >
> > > > On Oct 16, 9:59 pm, Alex <[EMAIL PROTECTED]> wrote:
> >
> > > > > Hi All,
> >
> > > > > Could somebody explain/describe the CellCache allocation/eviction
> > > > > policy?
> >
> > > > > After running the random read/write tests, I came to the conclusion
> > > > > that CellCache operation is very different from what is described in
> > > > > the Google Bigtable paper, i.e. the CellCache works as a write buffer
> > > > > rather than a cache. The CellCache seems to help a lot with writes,
> > > > > but it doesn't help reads.
> >
> > > > > Here are the results:
> >
> > > > > random_write_test 100000000
> >
> > > > > Elapsed time: 8.16 s
> > > > > Total inserts: 100000
> > > > > Throughput: 12403559.87 bytes/s
> > > > > Throughput: 12256.48 inserts/s
> >
> > > > > random_read_test 100000000
> >
> > > > > Elapsed time: 1038.47 s
> > > > > Total scanned: 100000
> > > > > Throughput: 97451.43 bytes/s
> > > > > Throughput: 96.30 scanned cells/s
> >
> > > > > Random read speed is ~100x slower than the Google Bigtable result
> > > > > for the random read test that fits in memory. In this case the data
> > > > > set size should be ~100MB and should comfortably fit in DRAM (8 GB).
> >
> > > > > Also, tcmalloc heap profiling shows that memory usage actually
> > > > > decreases to ~50MB during the random read test, while it is >700MB
> > > > > during the random write test (although top shows an increase
> > > > > instead).
> >
> > > > > I apologize if I am missing something very basic; I have very
> > > > > little experience in this area.
> >
> > > > > Thanks,
> > > > > Alex
> >
>