Re: [jira] Created: (LUCENE-1172) Small speedups to DocumentsWriter

Michael McCandless Sat, 09 Feb 2008 07:14:41 -0800

I agree, there comes a point where the cost of added complexity is not
worth the gains, on balance.  Making that tradeoff is not easy.


I don't think the patch in LUCENE-1172 crosses that line: a 1.6% (4.1%
on small docs) top line gain is still a sizable gain.

The profiler points to many other smaller things which I think are
below the line, that I didn't pursue.

I also agree that DocumentsWriter is complex now, and I'd definitely
like to simplify it with time, hopefully without losing too much
performance.

Believe it or not, earlier versions (on LUCENE-843) were more complex,
and I pared it down before committing it.  At one point I had a
specialized segment merger that would much more efficiently merge
"partial" segments flushed from RAM.  This was actually a fairly
sizable gain (maybe ~15% overall) when building large indices.  But it
also added sizable complexity, so I took it out.  I still think this
is eventually worthwhile (especially when autoCommit=false), but it
belongs with segment merging instead (this is why I opened
LUCENE-856).

Mike

Grant Ingersoll wrote:

I also agree w/ Robert and Michael, here. While DocsWriter isreally effective, it is very complicated to follow and it makesdebugging and maintenance much harder.


-Grant

On Feb 9, 2008, at 5:03 AM, Michael Busch wrote:

robert engels wrote:

Curious... on things like this, is it really worth adding (and
maintaining) Lucene's own sort, just to achieve a 1.5 % performance

increase. It is almost doubtful that you can even measure animprovement

at that level, given all of the variables you can't control.


I somewhat agree with Robert here. The DocumentsWriter is a quite
complicated class which has already two quicksort implementations and

this patch adds even a third one. Is it really so much moreexpensive to

e. g. sort on an Object[] array and pass in a Comparator?

Don't get me wrong, I think this is very sophisticated code and it's
super fast as the performance test and also the user experiences with

2.3 proof. However, I think especially in the Open Source worldone of

our goals should be to write code that is easy to understand, so that

it's easier for new people to get on board. To find a good balanceand

trade-off between simplicity, functionality and performance is not
always easy. Of course, if a patch improves performance by say 15%, I

wouldn't hesitate to commit it. But if it's just 1% but makes thecode

more complicated I'm not so sure if it's worth it.

That being said, I wouldn't vote -1 against a patch like this one to
prevent someone from committing it, but I don't think I would
write/commit it myself. I'd just like to encourage everyone to also
think about code simplicity and readability before writing and
committing new code.

-Michael

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: [jira] Created: (LUCENE-1172) Small speedups to DocumentsWriter

Reply via email to