Certainly others do agree with you to some degree that this case is on the cost/benefit borderline. Again, this case wasn't really the point.

My point was it feels to me that you have, on occasion, been over-quick to criticize without paying sufficiently respectful attention to the details of what is being discussed. For instance, the criticism of "these tests should be done on a *nix platform" to someone who has repeated the tests on OS X (yes, a *nix) and Windows. Or that the test is too short and the index in memory (it was 10MM docs with term vecs on FSDirectory. It is possible that some of the index wasn't fsync'd at the end of each test, I suppose, but I would expect this to be a small amount and equivalent in the pre- and post-patch scenarios). Or calling a full index run of 10MM docs a "micro-benchmark".

I do think that I was unchill in sending the original post to the list instead of to you via personal mail. I shouldn't have.

regards,
-Mike

On 10-Feb-08, at 7:33 PM, robert engels wrote:

Please chill. You are inferring something that was not implied. You may think it lacks perspective and respect (I disagree on both), but it certainly doesn't lack correctness.

First, depending on how you measure it, a 2x speedup equates to a 50% reduction in time. In my review of the changes that brought about the biggest performance gains from 1.9 on, almost all were related to avoiding disk accesses by buffering more documents and doing more processing in memory. I don't think many of the micro-benchmarks mattered much, and in a JVM environment it is very difficult to prove, as it is going to be heavily JVM- and configuration-dependent.
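
As a hedged sketch of what that buffering looks like in practice, using the 2.3-era IndexWriter API (the path and sizes below are invented for illustration, not taken from any benchmark in this thread):

  import java.io.IOException;

  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.index.IndexWriter;

  public class BufferMoreDocs {
    public static void main(String[] args) throws IOException {
      // Hypothetical index path; 'true' creates a new index.
      IndexWriter writer = new IndexWriter("/path/to/index",
                                           new StandardAnalyzer(), true);
      // Buffer more docs in RAM so fewer, larger segments hit the disk.
      writer.setRAMBufferSizeMB(64.0);      // flush by RAM usage (2.3+)
      // writer.setMaxBufferedDocs(10000);  // older knob: flush every N docs
      writer.close();
    }
  }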

The main point was that ANY disk access is going to be ORDERS OF MAGNITUDE slower than any of these sorts of optimizations.

So either you are loading the index completely in memory (only small indexes, so the difference in speed is not going to matter much), or you might be using a federated system of memory indices (to form a large index), but USUALLY at some point the index must first be created in a persistent store (which is what is covered here), in order to provide realistic restart times, etc.
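
To make the contrast concrete, here is a hypothetical two-liner using the stock Lucene Directory classes (the path is invented) showing the two storage choices being discussed:

  import java.io.IOException;

  import org.apache.lucene.store.Directory;
  import org.apache.lucene.store.FSDirectory;
  import org.apache.lucene.store.RAMDirectory;

  public class StorageChoice {
    public static void main(String[] args) throws IOException {
      // Persistent store: flushes and merges pay real disk I/O.
      Directory onDisk = FSDirectory.getDirectory("/path/to/index");
      // Whole index in memory: fast, but only feasible for small indexes,
      // and it still has to be built from (or copied off) a slower source.
      Directory inRam = new RAMDirectory(onDisk);
    }
  }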

The author of the patch and timings gives no information as to disk speed, IO speed, controllers, RAID configuration, etc. When creating an index in a persistent store, these factors matter more than a 2-4% speedup. Creating an index completely in memory is then bound by the reading of the data from the disk and/or the network - all much slower than the actual indexing.

Usually optimizations like this only matter in areas of development where the data set is small but the processing large (a lot of numerical analysis). In some cases the data set may also be "large", but then usually the processing is exponentially larger. The building of the index in Lucene is not very computationally expensive.

If you are going to spend hundreds of hours "optimizing", you had best be optimizing the right things. That was the point of the link I sent (the quotes are from people far more capable than I).

I was trying to make the point that a 2-4% speedup probably doesn't amount to much in a real environment given all of the other factors, so it is probably better for the project/community to err on the side of code clarity and ease of maintenance.

The project can continue to do what it wants (obviously) - but what I was pointing out should be nothing new to experienced designers/developers - I was only offering a reminder. It is my observation (others will disagree!), but I think a lot of Lucene has some unneeded esoteric code, where the benefit doesn't match the cost.

On Feb 10, 2008, at 5:48 PM, Mike Klaas wrote:

While I agree in general that excessive optimization at the expense of code clarity is undesirable, you are overstating the point. 2X is a ridiculous threshold to apply to something as performance-critical as a full-text search engine. If search were twice as slow, Lucene would be utterly unusable for me. Indexing is less important than search, of course, but a 2X slowdown would be quite painful there.

I don't have an opinion in this case: I believe that there is a tradeoff, but that it is the responsibility of the committer(s) to achieve the correct balance--they are the ones who will be maintaining the code, after all. I find your persistence surprising and your tone dangerously near condescending. Telling the guy who has spent hundreds of hours carefully optimizing this code that "Almost always there is a better bottleneck somewhere" shows an astonishing lack of perspective and respect.

-Mike

On 10-Feb-08, at 12:15 PM, robert engels wrote:

I am not sure these numbers matter. I think they are skewed because you are probably running too short a test, and the index is in memory (or OS cache).

Once you use a real index that needs to read/write from the disk, the percentage change will be negligible.

This is the problem with many of these "performance changes" - they just aren't real-world enough. Even if they were, I would argue that code simplicity/maintainability is worth more than 6 seconds on an operation that takes 4 minutes to run...

There are many people who believe micro-benchmarks are next to worthless. A good rule of thumb is that if the optimization doesn't result in a 2x speedup, it probably shouldn't be done. In most cases any efficiency gains are later lost to maintainability issues.

See http://en.wikipedia.org/wiki/Optimization_(computer_science)

Almost always there is a better bottleneck somewhere.

On Feb 10, 2008, at 1:37 PM, Michael McCandless wrote:


Yonik Seeley wrote:

I wonder how well a single generic quickSort(Object[] arr, int low, int high) would perform vs the type-specific ones? I guess the main overhead would be a cast from Object to the specific class to do the compare? Too bad Java doesn't have true generics/templates.


OK I tested this.

Starting from the patch on LUCENE-1172, which has 3 quickSort methods (one per type), I created a single quickSort method on Object[] that takes a Comparator, and made 3 Comparators instead.
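
For anyone following along, here is a minimal sketch of the two shapes being compared (my own illustration, not the actual LUCENE-1172 code): the generic version pays a virtual compare() call plus a downcast inside the Comparator on every comparison, while the type-specific version makes a direct, easily inlined compare.

  import java.util.Comparator;

  public class SortShapes {

    // Generic: one quickSort over Object[]; every comparison is a
    // virtual call into the Comparator, which must also downcast.
    static void quickSort(Object[] a, int lo, int hi, Comparator<Object> cmp) {
      if (lo >= hi) return;
      Object pivot = a[(lo + hi) >>> 1];
      int i = lo, j = hi;
      while (i <= j) {
        while (cmp.compare(a[i], pivot) < 0) i++;
        while (cmp.compare(a[j], pivot) > 0) j--;
        if (i <= j) { Object t = a[i]; a[i] = a[j]; a[j] = t; i++; j--; }
      }
      quickSort(a, lo, j, cmp);
      quickSort(a, i, hi, cmp);
    }

    // Type-specific: same algorithm, but the element type is statically
    // known, so the compare is a direct call with no cast.
    static void quickSort(String[] a, int lo, int hi) {
      if (lo >= hi) return;
      String pivot = a[(lo + hi) >>> 1];
      int i = lo, j = hi;
      while (i <= j) {
        while (a[i].compareTo(pivot) < 0) i++;
        while (a[j].compareTo(pivot) > 0) j--;
        if (i <= j) { String t = a[i]; a[i] = a[j]; a[j] = t; i++; j--; }
      }
      quickSort(a, lo, j);
      quickSort(a, i, hi);
    }
  }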

Mac OS X 10.4 (JVM 1.5):

    original patch --> 247.1
  simplified patch --> 254.9 (3.2% slower)

Windows Server 2003 R64 (JVM 1.6):

    original patch --> 440.6
  simplified patch --> 452.7 (2.7% slower)

The times are the best of 10 runs. I'm running all tests with these JVM args:

  -Xms1024M -Xmx1024M -Xbatch -server

I think this is a big enough difference in performance that it's worth keeping 3 separate quickSorts in DocumentsWriter.

Mike
