Re: [jira] Created: (LUCENE-1172) Small speedups to DocumentsWriter

robert engels Mon, 11 Feb 2008 00:55:37 -0800

I am not disputing that there is a speed improvement. I am arguing that the performance gain of many of these patches is not worth the additional complexity in the code. Clear code will allow for more radical improvements as more eyes will be able to easily understand the inner workings and offer better algorithms, not just micro improvements that the JVM (eventually) can probably figure out on its own.

It is a value judgement, and regretfully I don't have another 30 years to pass down the full knowledge behind my reasoning.

Luckily, however, there are some very good books available on the subject...

It's not the fault of the submitter, but many of these timings are suspect due to difficulty in measuring the improvements accurately.


Here is a simple example:

You can configure the JVM to not perform aggressive garbage collection, and write a program that generates a lot of garbage - but it runs very fast (not GCing), until the GC eventually occurs (if the program runs long enough). It may be overall much slower than an alternative that runs slower as it executes, but has code to manage the objects as they are created, and rarely if ever hits a GC cycle. But then, the JVM (e.g. generational GC) can implement improvements that make choice A faster (and the better choice)... and the cycle continues...

Without detailed timings and other metrics (GC pauses, IO, memory utilization, native compilation, etc.) most benchmarks are not very accurate or useful. There are a lot of variables to consider - maybe more so than can reasonably be considered. That is why a 4% gain is highly suspect. If the gain was 25%, or 50% or 100%, you have a better chance of it being an innate improvement, and not just the interaction of some other factors.
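As a rough illustration of recording GC activity next to wall-clock time, here is a minimal sketch that uses the standard `java.lang.management` beans. The class name `GcAwareTimer`, the `Result` holder and the workload in `main` are hypothetical and are not part of any patch discussed in this thread; the sketch only covers GC counts and accumulated GC time, not IO or JIT compilation.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: reports elapsed time together with GC activity
// observed while the task ran, so a "fast" run that simply deferred
// its collections can be told apart from one that did not.
final class GcAwareTimer {

    static final class Result {
        final long elapsedNanos;
        final long gcCount;   // collections that happened during the task
        final long gcMillis;  // approximate accumulated GC time during the task

        Result(long elapsedNanos, long gcCount, long gcMillis) {
            this.elapsedNanos = elapsedNanos;
            this.gcCount = gcCount;
            this.gcMillis = gcMillis;
        }
    }

    // Sums collection count and time over all collectors.
    // Both getters return -1 when the value is undefined, so negatives are skipped.
    static long[] gcTotals() {
        long count = 0;
        long millis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long c = gc.getCollectionCount();
            long t = gc.getCollectionTime();
            if (c > 0) count += c;
            if (t > 0) millis += t;
        }
        return new long[] { count, millis };
    }

    static Result measure(Runnable task) {
        long[] before = gcTotals();
        long start = System.nanoTime();
        task.run();
        long elapsed = System.nanoTime() - start;
        long[] after = gcTotals();
        return new Result(elapsed, after[0] - before[0], after[1] - before[1]);
    }

    public static void main(String[] args) {
        // Assumed workload: allocates short-lived arrays, which may trigger collections.
        Result r = measure(() -> {
            long sink = 0;
            for (int i = 0; i < 2_000_000; i++) {
                sink += new byte[64].length;
            }
            if (sink == 0) throw new IllegalStateException("unreachable");
        });
        System.out.println("elapsed ms: " + r.elapsedNanos / 1_000_000
                + ", GC count: " + r.gcCount + ", GC ms: " + r.gcMillis);
    }
}
```

Reporting the GC columns for both variants of a patch makes it easier to see whether a small difference in elapsed time comes from the code itself or from when the collector happened to run.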


On Feb 11, 2008, at 2:32 AM, eks dev wrote:

Robert,

you may or may not be right, I do not know. The only way to prove it would be to show you can do it better, no? If you are so convinced this is wrong, you could, much better than quoting textbooks:

a) write a better patch, get attention with something you think is a "better bottleneck"
b) provide realistic "performance tests", as you dispute the measurement provided here

It has to be that concrete. Academic discussions are cool, but at the end of the day, it is the code that executes that counts.


cheers,
eks

----- Original Message ----
From: robert engels <[EMAIL PROTECTED]>
To: [email protected]
Sent: Sunday, 10 February, 2008 9:15:30 PM

Subject: Re: [jira] Created: (LUCENE-1172) Small speedups to DocumentsWriter


I am not sure these numbers matter. I think they are skewed because
you are probably running too short a test, and the index is in memory
(or OS cache).

Once you use a real index that needs to read/write from the disk, the
percentage change will be negligible.

This is the problem with many of these "performance changes" - they
just aren't real world enough.  Even if they were, I would argue that
code simplicity/maintainability is worth more than 6 seconds on an
operation that takes 4 minutes to run...

There are many people that believe micro benchmarks are next to
worthless. A good rule of thumb is that if the optimization doesn't
result in 2x speedup, it probably shouldn't be done. In most cases
any efficiency gains are later lost in maintainability issues.

See http://en.wikipedia.org/wiki/Optimization_(computer_science)

Almost always there is a better bottleneck somewhere.

On Feb 10, 2008, at 1:37 PM, Michael McCandless wrote:


Yonik Seeley wrote:

I wonder how well a single generic quickSort(Object[] arr, int low,
int high) would perform vs the type-specific ones?  I guess the main
overhead would be a cast from Object to the specific class to do the
compare?  Too bad Java doesn't have true generics/templates.



OK I tested this.

Starting from the patch on LUCENE-1172, which has 3 quickSort methods
(one per type), I created a single quickSort method on Object[] that
takes a Comparator, and made 3 Comparators instead.
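For readers who want to see the two shapes side by side, here is a minimal sketch of a type-specific quicksort next to a single `Object[]` version driven by a `Comparator`. It is not the code from the LUCENE-1172 patch: the `Term` class, the `BY_TEXT` comparator and both method names are hypothetical stand-ins chosen for illustration.

```java
import java.util.Comparator;

// Hypothetical illustration of the two approaches compared in this thread.
final class QuickSortSketch {

    // Stand-in element type; the real patch sorts DocumentsWriter internals.
    static final class Term {
        final String text;
        Term(String text) { this.text = text; }
    }

    // Type-specific variant: the comparison is written inline, with no cast
    // and no call through a Comparator.
    static void quickSortTerms(Term[] arr, int low, int high) {
        if (low >= high) return;
        String pivot = arr[low + (high - low) / 2].text;
        int i = low, j = high;
        while (i <= j) {
            while (arr[i].text.compareTo(pivot) < 0) i++;
            while (arr[j].text.compareTo(pivot) > 0) j--;
            if (i <= j) {
                Term tmp = arr[i]; arr[i] = arr[j]; arr[j] = tmp;
                i++; j--;
            }
        }
        quickSortTerms(arr, low, j);
        quickSortTerms(arr, i, high);
    }

    // Generic variant: one method for every element type; each comparison
    // goes through the Comparator, which casts from Object.
    static void quickSortGeneric(Object[] arr, int low, int high, Comparator<Object> cmp) {
        if (low >= high) return;
        Object pivot = arr[low + (high - low) / 2];
        int i = low, j = high;
        while (i <= j) {
            while (cmp.compare(arr[i], pivot) < 0) i++;
            while (cmp.compare(arr[j], pivot) > 0) j--;
            if (i <= j) {
                Object tmp = arr[i]; arr[i] = arr[j]; arr[j] = tmp;
                i++; j--;
            }
        }
        quickSortGeneric(arr, low, j, cmp);
        quickSortGeneric(arr, i, high, cmp);
    }

    // One such Comparator would be needed per element type.
    static final Comparator<Object> BY_TEXT = new Comparator<Object>() {
        public int compare(Object a, Object b) {
            return ((Term) a).text.compareTo(((Term) b).text);
        }
    };

    public static void main(String[] args) {
        Term[] terms = { new Term("pear"), new Term("apple"), new Term("fig") };
        quickSortGeneric(terms, 0, terms.length - 1, BY_TEXT);
        for (Term t : terms) System.out.println(t.text);
    }
}
```

The generic version removes the duplicated sort bodies at the cost of a cast and an extra call per comparison, which is the overhead the timings below try to quantify.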

Mac OS X 10.4 (JVM 1.5):

    original patch --> 247.1
  simplified patch --> 254.9 (3.2% slower)

Windows Server 2003 R64 (JVM 1.6):

    original patch --> 440.6
  simplified patch --> 452.7 (2.7% slower)

The times are best in 10 runs.  I'm running all tests with these JVM
args:

  -Xms1024M -Xmx1024M -Xbatch -server
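A minimal sketch of the "best in 10 runs" bookkeeping and of the percentage arithmetic used in the tables above. The class and method names are hypothetical, and this is not the harness that produced the numbers in this message.

```java
// Hypothetical helper for best-of-N timing and relative slowdown.
final class BestOfRuns {

    // Runs the task `runs` times and returns the fastest elapsed time in nanoseconds.
    static long bestOfNanos(int runs, Runnable task) {
        if (runs <= 0) throw new IllegalArgumentException("runs must be positive");
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            long elapsed = System.nanoTime() - start;
            best = Math.min(best, elapsed);
        }
        return best;
    }

    // How much slower `candidate` is than `baseline`, as a percentage of baseline.
    static double percentSlower(double baseline, double candidate) {
        return (candidate - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        // Figures taken from the tables above.
        System.out.printf("Mac OS X: %.1f%% slower%n", percentSlower(247.1, 254.9));
        System.out.printf("Windows:  %.1f%% slower%n", percentSlower(440.6, 452.7));
    }
}
```

Taking the best run filters out some one-off noise (a GC pause, another process waking up), but it says nothing about variance, so reporting the spread across the 10 runs alongside the minimum would make a 3% difference easier to judge.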

I think this is a big enough difference in performance that it's
worth keeping 3 separate quickSorts in DocumentsWriter.

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


