Yonik,
Thanks for your carefully thought out and detailed reply.
On 20 Nov 2005, at 12:00, Yonik Seeley wrote:
Does it make sense to add an IndexWriter setting to
specify a default position increment gap to use when multiple fields
are added in this way?
Per-field might be nice...
The good news is that Analyzer is an abstract class, and not an
Interface, so we could add something to it without breaking existing
analyzers (a benefit of classes over interfaces that rarely gets
mentioned).
int Analyzer.getPositionIncrementGap(String field)
or getMultiValuedFieldGap(String field)
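A minimal sketch of why adding such a method to an abstract class need not break existing analyzers. The classes below are hypothetical stand-ins invented for illustration, not Lucene's own; only the method name getPositionIncrementGap comes from the suggestion above, and the default of 0 and the gap of 100 are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an abstract analyzer base class (not Lucene's Analyzer).
abstract class GapAnalyzer {
    // Splits text into terms; every concrete analyzer must implement this.
    abstract List<String> terms(String field, String text);

    // Newly added method with a default body, so older subclasses still compile.
    int getPositionIncrementGap(String field) {
        return 0; // assumed default: no gap between repeated field values
    }
}

// An "existing" analyzer, written before the new method was added.
class WhitespaceGapAnalyzer extends GapAnalyzer {
    @Override
    List<String> terms(String field, String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }
}

// A newer analyzer that opts in to a per-field gap.
class AuthorGapAnalyzer extends WhitespaceGapAnalyzer {
    @Override
    int getPositionIncrementGap(String field) {
        return "author".equals(field) ? 100 : 0;
    }
}
```

The older analyzer inherits the default, while the newer one chooses a gap per field name, which is what makes the per-field variant cheap to offer.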
What about adding an offset to Field, setPositionOffset(int offset)?
Looking at DocumentWriter, it looks like this would be the simplest
thing that could work, without precluding the interesting option of
modifying Analyzer to allow flags on tokenStream.
Modifying Analyzer as you have suggested would require DocumentWriter
to additionally keep track of the field names and note when one is used
again, but having Field specify an offset would eliminate the need
for such tracking.
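Roughly what I have in mind, as a sketch only. OffsetField and OffsetIndexer are hypothetical stand-ins rather than the real Field and DocumentWriter, and I am assuming positions for a repeated field name simply continue from where the previous value left off:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical field type; setPositionOffset is the proposed method, not an existing API.
class OffsetField {
    final String name;
    final String value;
    private int positionOffset = 0;

    OffsetField(String name, String value) {
        this.name = name;
        this.value = value;
    }

    void setPositionOffset(int offset) { this.positionOffset = offset; }
    int getPositionOffset() { return positionOffset; }
}

// Hypothetical stand-in for the position bookkeeping done while indexing.
class OffsetIndexer {
    // Returns "name:term@position" entries for the given fields, in order.
    static List<String> positions(List<OffsetField> fields) {
        Map<String, Integer> next = new HashMap<>(); // next free position per field name
        List<String> out = new ArrayList<>();
        for (OffsetField f : fields) {
            // The offset is applied unconditionally; nothing checks whether
            // this field name has been seen before.
            int pos = next.getOrDefault(f.name, 0) + f.getPositionOffset();
            for (String term : f.value.trim().split("\\s+")) {
                if (term.isEmpty()) continue;
                out.add(f.name + ":" + term + "@" + pos);
                pos++;
            }
            next.put(f.name, pos);
        }
        return out;
    }
}
```

The caller decides the gap when building the document, for example by calling setPositionOffset(100) on the second "author" field, so the indexing loop itself needs no extra logic.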
But what might be even more powerful is to leave everything up to the
analyzer, where you could choose to do a big position increment,
generate a special token, or anything else one might think of.
You can't do this right now in the analyzer because of a lack of info
(you don't know if you are on the first field or a subsequent one).
One could always add a big position increment at the start of every
field, but I suspect that would blow up the index size. Another way
is to give more context info to the Analyzer:
Analyzer.tokenStream(fieldName, reader, flags)
where one of the flags could be REPEATED_FIELD or something.
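A sketch of how an analyzer might use such a flag, under stated assumptions: FlagAnalyzer and FlagToken are hypothetical classes, the flag value and the gap of 100 are invented, and the method takes a String rather than a Reader purely to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical token: a term plus its position increment relative to the previous token.
class FlagToken {
    final String term;
    final int positionIncrement;

    FlagToken(String term, int positionIncrement) {
        this.term = term;
        this.positionIncrement = positionIncrement;
    }
}

// Hypothetical analyzer with a flags-aware tokenStream variant.
class FlagAnalyzer {
    static final int REPEATED_FIELD = 1; // assumed flag bit
    static final int BIG_GAP = 100;      // assumed gap size

    List<FlagToken> tokenStream(String fieldName, String text, int flags) {
        boolean repeated = (flags & REPEATED_FIELD) != 0;
        List<FlagToken> out = new ArrayList<>();
        for (String term : text.trim().split("\\s+")) {
            if (term.isEmpty()) continue;
            // Only the first token of a repeated field value gets the large increment.
            int increment = (out.isEmpty() && repeated) ? BIG_GAP : 1;
            out.add(new FlagToken(term, increment));
        }
        return out;
    }
}
```

Because the decision lives in the analyzer, a different implementation could instead emit a special boundary token when the flag is set.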
I like this idea. But perhaps Field.setPositionOffset(int
offset) is a lighter-weight first step.
Thoughts?
Erik