Cheers! I know these are (mostly) small micro-optimizations, but I am primarily trying to demonstrate that I *can* code, and I'm interested in working more closely on H2. I wanted to check things out before working on a more complex feature only to have it rejected. If there is a specific roadmap item that would make a better-contained test, I'd be glad to work on something else.
In general, I would prefer to get a good, free profiler working with H2's benchmark and performance tests rather than submit a lot of limited micro-benchmarks. Writing micro-benchmarks that reflect real use cases is trickier than profiling and hardly more accurate, and profiling gives richer information about performance on different systems. Unfortunately, I'm still wrestling with H2's build system in both Eclipse and Netbeans, so I can't offer those numbers until I sort that out. These optimizations are based on using the Netbeans profiler to measure hotspots while adding and removing indices on one of my databases.

On Aug 3, 3:12 pm, Thomas Mueller <[email protected]> wrote:
> I know - that's one of the reasons I am working on the 'page store' mechanism.

I see the commits in the changes listing, and the switch to pages over block records. I'd like to help with more fundamental optimization of the allocation and I/O systems, but I didn't want to embark on something larger until I saw how you felt about the smaller work.

> getAddress
> I think this method gets inlined by the JVM, so manually inlining will
> not help. But I may be wrong.

On Hotspot it should get inlined quickly, so any gain there will be small, but I think manual inlining is still slightly faster. I chose to inline manually because other JVMs may not optimize as intelligently, and the BitField methods are called so often that any improvement is worth having. I know it hurts readability a bit, but there is already plenty of bit-twiddling in that class, so readability may not be at a premium there.

> > Improved performance on DiskFile.initFromSummary / getSummary -- in one
> > test, this is 4x faster
> Could you submit this test? I'm not sure how your changes could help,
> unless getAddress is not inlined.

The test was a CREATE INDEX performance test run on my own database and profiled with the Netbeans profiler. With a little coding I can provide a more generic test case, but if you look at the unpatched r1674 more closely I think it will be obvious why the change is faster. Unpatched, getSummary reads the BitField one bit at a time, with all the safety checks and bit operations repeated for every bit. The patch uses your getByte method in place of eight of those per-bit calls, so there is one set of checks and one array access per 8 bits instead of eight of each (there's a rough sketch of the two approaches further down). I could write a version that returns an aligned long at a time, which would be faster still, but that seemed excessive.

> > setRange(final int start, final int len, final boolean value) {
> I don't want to use 'final' for parameters and local variables because
> I think it clutters the code. Unless it is faster of course.

Declaring final used to be slightly faster, and in principle it lets the compiler optimize more because it knows the value is immutable. The JIT may or may not work that out on its own without the final keyword. I don't have strong feelings about this -- go with whatever you prefer.

> - private long getBitMask(int i) {
> + private static long getBitMask(int i) {

Oops, I didn't realize that was still in there -- it's pointless now, since I only kept the method for reference. Making it static used to encourage the compiler to inline and optimize it, but supposedly it makes little difference these days.
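Going back to getSummary for a moment, here is a simplified sketch of the two approaches. The names and layout are only illustrative (this is not the actual H2 BitField code, and bounds/growth handling is left out), but it shows where the savings come from:

// Simplified sketch, not the real BitField; assumes 64 bits per long
// and, for summaryFast, a bit count that is a multiple of 8.
public class BitFieldSketch {

    private final long[] data = new long[1024];

    // One array access, one shift and one mask per *bit*.
    public boolean get(int i) {
        return (data[i >> 6] & (1L << (i & 63))) != 0;
    }

    // One array access and one shift per *byte* (i must be byte-aligned).
    public int getByte(int i) {
        return (int) (data[i >> 6] >>> (i & 63)) & 0xff;
    }

    // Old style: eight get() calls per output byte.
    public byte[] summarySlow(int bitCount) {
        byte[] summary = new byte[bitCount / 8];
        for (int i = 0; i < bitCount; i++) {
            if (get(i)) {
                summary[i >> 3] |= 1 << (i & 7);
            }
        }
        return summary;
    }

    // Patched style: one getByte() call per output byte.
    public byte[] summaryFast(int bitCount) {
        byte[] summary = new byte[bitCount / 8];
        for (int i = 0; i < summary.length; i++) {
            summary[i] = (byte) getByte(i * 8);
        }
        return summary;
    }
}

The fast version pays the bookkeeping once per byte instead of once per bit; an aligned-long version would just extend the same idea to 64 bits at a time.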
> > setRange
> Do you have some statistics that show how many time it is called with
> which len? I like to avoid optimizing this method if it doesn't
> improve performance. Do you have a micro-benchmark for your changes? I
> like to compare how much the changes improved performance.

As I said above, I prefer profiling to micro-benchmarks where possible. In the profiler, BitField.setRange went from 908 ms to 240 ms for the same operations on the same data, with the same settings otherwise (there is a rough sketch of the word-at-a-time approach in the P.S. below). I used profiler settings that should minimize the overhead for methods that are called very often and finish quickly -- only the call count is tracked, not the runtime, unless a method takes more than 3 ms.

I can't give you precise numbers for the length of every setRange call, but I can give you totals: BitField.setRange was called about 1.4 million times, and without the patch it called BitField.set(int, boolean) roughly 10 million times. That works out to around 7 bits per range on average, so even with mostly small ranges the patched method holds up well, and I could probably improve it further. If small-range performance turns out to be weak, there is still the option of calling BitField.set(int) and BitField.clear(int) directly when only setting a couple of bits -- performance there should be identical to the old setRange.

> > Before patch : 23275 and 31407 statement/sec
> > After patch: 23564 and 32854 statement/sec
> That could be a random difference (noise).

Oh, a fair bit of that probably is noise; it's hard to tell without either profiling or micro-benchmarks. These numbers were pretty consistent, though.

Regards,
Sam Van Oort
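P.S. In case a concrete picture helps, the word-at-a-time idea behind the setRange change looks roughly like this. It is only a simplified sketch with illustrative names, not the exact patch, and it leaves out the capacity/growth handling a real BitField needs:

// Simplified sketch of a word-at-a-time setRange, not the actual patch.
public class SetRangeSketch {

    private final long[] data = new long[1024];

    // Sets or clears bits [start, start + len) by filling whole 64-bit
    // words where possible instead of touching one bit at a time.
    public void setRange(int start, int len, boolean value) {
        int end = start + len;
        int i = start;
        // leading partial word: one bit at a time
        while (i < end && (i & 63) != 0) {
            setBit(i++, value);
        }
        // aligned middle: 64 bits per array store
        long fill = value ? -1L : 0L;
        while (end - i >= 64) {
            data[i >> 6] = fill;
            i += 64;
        }
        // trailing partial word: one bit at a time
        while (i < end) {
            setBit(i++, value);
        }
    }

    private void setBit(int i, boolean value) {
        if (value) {
            data[i >> 6] |= 1L << (i & 63);
        } else {
            data[i >> 6] &= ~(1L << (i & 63));
        }
    }
}

For long ranges this replaces dozens of per-bit calls with single array stores; ranges that fit inside one word still go bit by bit, so they behave much like the old code.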
