I agree, there comes a point where the cost of added complexity is not
worth the gains, on balance. Making that tradeoff is not easy.
I don't think the patch in LUCENE-1172 crosses that line: a 1.6% (4.1%
on small docs) top line gain is still a sizable gain.
The profiler points to many other smaller things which I think are
below the line, that I didn't pursue.
I also agree that DocumentsWriter is complex now, and I'd definitely
like to simplify it with time, hopefully without losing too much
performance.
Believe it or not, earlier versions (on LUCENE-843) were more complex,
and I pared it down before committing it. At one point I had a
specialized segment merger that would much more efficiently merge
"partial" segments flushed from RAM. This was actually a fairly
sizable gain (maybe ~15% overall) when building large indices. But it
also added sizable complexity, so I took it out. I still think this
is eventually worthwhile (especially when autoCommit=false), but it
belongs with segment merging instead (this is why I opened
LUCENE-856).
Mike
Grant Ingersoll wrote:
I also agree w/ Robert and Michael, here. While DocsWriter is
really effective, it is very complicated to follow and it makes
debugging and maintenance much harder.
-Grant
On Feb 9, 2008, at 5:03 AM, Michael Busch wrote:
robert engels wrote:
Curious... on things like this, is it really worth adding (and
maintaining) Lucene's own sort, just to achieve a 1.5 % performance
increase. It is almost doubtful that you can even measure an
improvement
at that level, given all of the variables you can't control.
I somewhat agree with Robert here. The DocumentsWriter is a quite
complicated class which has already two quicksort implementations and
this patch adds even a third one. Is it really so much more
expensive to
e. g. sort on an Object[] array and pass in a Comparator?
Don't get me wrong, I think this is very sophisticated code and it's
super fast as the performance test and also the user experiences with
2.3 proof. However, I think especially in the Open Source world
one of
our goals should be to write code that is easy to understand, so that
it's easier for new people to get on board. To find a good balance
and
trade-off between simplicity, functionality and performance is not
always easy. Of course, if a patch improves performance by say 15%, I
wouldn't hesitate to commit it. But if it's just 1% but makes the
code
more complicated I'm not so sure if it's worth it.
That being said, I wouldn't vote -1 against a patch like this one to
prevent someone from committing it, but I don't think I would
write/commit it myself. I'd just like to encourage everyone to also
think about code simplicity and readability before writing and
committing new code.
-Michael
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]