Cheers!

I know these are (mostly) small micro-optimizations, but I am
primarily trying to demonstrate that I *can* code, and that I am
interested in working more closely on H2.  I wanted to check things out
before working on a more complex feature only to have it rejected.  If
there is a specific roadmap item that would make a better self-contained
test, I'd be glad to work on something else.

In general, I would prefer to get a good, free profiler working with
H2's benchmark and performance tests, rather than submitting a lot of
limited micro-benchmarks.  Writing micro-benchmarks that reflect real
use cases is tricky, and the results are little more accurate than
profiling.  I think profiling would provide richer information about
performance on different systems.  Unfortunately, I'm still wrestling
with H2's build system in both Eclipse and NetBeans, so I can't offer
this information until I figure out the problems there.

These optimizations are based on using the NetBeans profiler to
measure hotspots when adding & removing indices from one of my
databases.

On Aug 3, 3:12 pm, Thomas Mueller <[email protected]>
wrote:
> I know - that's one of the reasons I am working on the 'page store' mechanism.
I see the commits in the changes listing, and the switch to pages over
block records.  I'd like to help with more fundamental optimization of
the allocation and I/O systems, but I didn't want to embark on
something larger until seeing how you feel about the smaller work.

> > getAddress
> I think this method gets inlined by the JVM, so manually inlining will
> not help. But I may be wrong.
On HotSpot it should get inlined quickly, so any performance gain will
be small, but I think manual inlining is still slightly faster.
I chose to inline manually because other JVMs may not optimize as
intelligently, and the BitField methods get called a lot, so even
small improvements are worth it.

I know it hurts readability of the code a bit, but there's already
plenty of bit-twiddling in the class, so I think readability may not
be at a premium here.
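
To make the idea concrete, here is a rough sketch of the kind of manual
inlining I mean.  It is not H2's actual BitField code -- the field
layout and the helper are just assumptions for illustration:

public class BitFieldSketch {

    private long[] data = new long[16];

    // Helper-based version: getAddress() is a one-line helper that HotSpot
    // will usually inline on its own.
    private int getAddress(int i) {
        return i >> 6; // 64 bits per long word
    }

    public boolean getWithHelper(int i) {
        int addr = getAddress(i);
        return addr < data.length && (data[addr] & (1L << (i & 63))) != 0;
    }

    // Manually inlined version: same logic with the call removed, which may
    // help on JVMs whose JITs inline less aggressively than HotSpot.
    public boolean getInlined(int i) {
        int addr = i >> 6;
        return addr < data.length && (data[addr] & (1L << (i & 63))) != 0;
    }
}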

> > Improved performance on DiskFile.initFromSummary / getSummary -- in one 
> > test, this is 4x faster
> Could you submit this test? I'm not sure how your changes could help,
> unless getAddress is not inlined.
The test was a CREATE INDEX performance test run on my own database
and profiled with the NetBeans profiler.  With a little coding, I can
provide a more generic test case, but I think if you examine the
unpatched r1674 version more closely it will be obvious why the change
is faster.

Unpatched, getSummary reads the BitField one bit at a time (with all
the safety checks and bit operations needed to extract each bit).  The
patch uses your getByte method in place of 8 of those per-bit calls, so
there is one set of checks and one array access per 8 bits instead of
eight.  I could write a version that returns an aligned long at a time,
which would be even faster, but that seemed a little excessive.
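
Roughly, the difference looks like this.  This is only an illustrative
sketch, not the actual DiskFile/BitField code; the array layout and the
multiple-of-8 bit count are assumptions to keep it short:

public class SummarySketch {

    // Bit-at-a-time: one bounds check, one array access and one shift/mask
    // per bit read.
    static int countSetBitsSlow(long[] data, int bitCount) {
        int count = 0;
        for (int i = 0; i < bitCount; i++) {
            if ((data[i >> 6] & (1L << (i & 63))) != 0) {
                count++;
            }
        }
        return count;
    }

    // Byte-at-a-time: one array access yields 8 bits worth of information.
    // Assumes bitCount is a multiple of 8, purely for brevity.
    static int countSetBitsFast(long[] data, int bitCount) {
        int count = 0;
        for (int i = 0; i < bitCount; i += 8) {
            int b = (int) (data[i >> 6] >>> (i & 63)) & 0xff;
            count += Integer.bitCount(b);
        }
        return count;
    }
}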

> > setRange(final int start, final int len, final boolean value) {
> I don't want to use 'final' for parameters and local variables because
> I think it clutters the code. Unless it is faster of course.
Declaring final used to be a bit faster, and should allow for
additional optimizations because the compiler knows it is immutable.
The JIT compiler may or may not be smart enough to figure this one out
without declaring final -- I don't have strong feelings about this, go
with whatever you prefer.

> -    private long getBitMask(int i) {
> +    private static long getBitMask(int i) {
>
Oops.  I didn't realize that was still in there -- it's kind of
pointless since I only kept the method for reference.
Making it static used to encourage the compiler to inline the method
and optimize further, but supposedly it doesn't make much of a
difference now.
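
For reference, this is the sort of shape I mean -- the body is only a
guess, the point is just the static modifier:

public class BitMaskSketch {

    // Hypothetical body -- only the 'static' modifier matters for the point
    // being made.  A static call has no receiver to devirtualize, so older
    // JITs could inline it more readily; modern HotSpot handles either form.
    private static long getBitMask(int i) {
        return 1L << (i & 63);
    }
}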

> > setRange
> Do you have some statistics that show how many time it is called with
> which len? I like to avoid optimizing this method if it doesn't
> improve performance. Do you have a micro-benchmark for your changes? I
> like to compare how much the changes improved performance.
As I said above, I prefer profiling to micro-benchmarks where possible.
In profiling, BitField.setRange went from 908 ms to 240 ms for the same
operations, with the same settings, on the same data.  I used profiler
settings that should minimize the overhead for methods that are called
many times but run quickly -- only the call count is tracked, not the
runtime, unless a method takes >3 ms.

I can't give you precise numbers for the length of every call to
setRange, but I can give you the totals.  BitField.setRange was called
about 1.4 million times.  Without the patch, it called BitField.set
(int,boolean) approximately 10 million times, which works out to an
average range length of roughly 7 bits.  So we can assume that even
with small ranges, this method is pretty fast.  I could probably
improve that further.

If small-range performance is weak, you still have the option of using
BitField.set(int) and BitField.clear(int) when setting only a couple of
bits -- performance there should be identical to the old setRange.
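
For what it's worth, the general shape of the optimization is to handle
the unaligned head and tail of the range bit by bit and to fill whole
64-bit words in the middle with a single store each.  The sketch below
only illustrates that idea -- it is not the actual patch, and the field
layout is an assumption:

public class SetRangeSketch {

    private long[] data = new long[1024];

    // Set or clear bits [start, start + len): handle the unaligned head and
    // tail one bit at a time, and fill whole 64-bit words in the middle with
    // a single store each.
    public void setRange(int start, int len, boolean value) {
        int end = start + len;
        int i = start;
        while (i < end && (i & 63) != 0) {   // head bits up to a word boundary
            setBit(i++, value);
        }
        long word = value ? -1L : 0L;
        while (i + 64 <= end) {              // whole words
            data[i >> 6] = word;
            i += 64;
        }
        while (i < end) {                    // tail bits
            setBit(i++, value);
        }
    }

    private void setBit(int i, boolean value) {
        if (value) {
            data[i >> 6] |= 1L << (i & 63);
        } else {
            data[i >> 6] &= ~(1L << (i & 63));
        }
    }
}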
>
> > Before patch :  23275  and 31407 statement/sec
> > After patch: 23564 and 32854 statement/sec
>
> That could be a random difference (noise).
Oh, a fair bit probably is noise.  It's hard to tell without either
profiling or micro-benchmarks.  These sorts of numbers were pretty
consistent though.

Regards,
Sam Van Oort