On Nov 1, 2005, at 9:51 AM, Doug Cutting wrote:
> Another approach might be, instead of converting UTF-8 to
> strings right away, to change things to convert lazily, if at all.
> During index merging such conversion should never be needed.
!!
There ought to be some gains possible there, then. No predictions as
to how much, though.
> You needn't do this systematically throughout Lucene, but only
> where it makes a big difference. For example, if you could avoid
> strings in SegmentMerger.mergeTermInfos() it might make a huge
> difference. This might be as simple as changing SegmentMergeInfo
> to use a TermBuffer instead of a Term. Does that make sense?
Abundant sense. I'm not as familiar with SegmentMerger as I am with
other parts of the org.apache.lucene.index package, because I haven't
ported it yet. But conceptually I understand exactly why this should
require fewer resources.
I'll take a swing at SegmentMerger and submit a comprehensive diff.
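To make the lazy-conversion idea concrete, here is a rough sketch. It is
not Lucene's TermBuffer or SegmentMergeInfo: the class name LazyTerm, its
fields, and the decodeCount counter are all hypothetical, made up for
illustration. It also assumes that plain byte order is an acceptable term
order, which matches Unicode code point order but not Java's UTF-16
String.compareTo for supplementary characters.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration of lazy decoding; not Lucene's TermBuffer. */
final class LazyTerm implements Comparable<LazyTerm> {
    private final byte[] utf8;   // raw UTF-8 bytes, kept exactly as read
    private String decoded;      // stays null until text() is first called
    static int decodeCount = 0;  // demo-only counter of actual decodes

    LazyTerm(byte[] utf8) {
        this.utf8 = utf8;
    }

    /** Orders terms by unsigned byte value without building a String. */
    @Override
    public int compareTo(LazyTerm other) {
        int n = Math.min(utf8.length, other.utf8.length);
        for (int i = 0; i < n; i++) {
            int a = utf8[i] & 0xff;
            int b = other.utf8[i] & 0xff;
            if (a != b) {
                return a - b;
            }
        }
        // Equal up to the shorter length: the shorter term sorts first.
        return utf8.length - other.utf8.length;
    }

    /** Decodes on first use and caches the result. */
    String text() {
        if (decoded == null) {
            decoded = new String(utf8, StandardCharsets.UTF_8);
            decodeCount++;
        }
        return decoded;
    }
}
```

A merge loop built on something like this would call only compareTo(),
so no String is ever allocated; text() is there for callers that
genuinely need one.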
Thanks for the suggestions,
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/