Do you use the XML only during indexing? If so, you could skip the round trip to XML and back through Digester entirely, and do all of this within an analyzer.
Or am I missing something that prevents you from doing it this way?
Erik
On Feb 28, 2004, at 10:05 PM, Roy Klein wrote:
Erik,

Here's a brief example of the type of thing I'm trying to do:
I have a file that contains the words:
The quick brown fox jumped over the lazy dog.
I run that file through a utility that produces the following XML document:

<document>
  <field name="wordposition1">
    <word>The</word>
  </field>
  <field name="wordposition2">
    <word>quick</word>
    <word>fast</word>
    <word>speedy</word>
  </field>
  <field name="wordposition3">
    <word>brown</word>
    <word>tan</word>
    <word>dark</word>
  </field>
  . . .
I parse that document (via the Digester), and add all the words from each of the fields to one Lucene field: "contents". The tricky part is that I want each word position in the Lucene index to contain all the words listed for that position. That is, word location 1 in the index contains "The", word location 2 contains "quick", "fast", and "speedy", word location 3 contains "brown", "tan", and "dark", etc.
That way, all of the following phrase queries will match this document: "fast tan", "quick brown", "fast brown".
I wrote a "TermAnalyzer" that adds all the words from a field into the
index at the same position (via setPositionIncrement(0)). That way I
can simply add each set of words to the "contents" field, and they'll
just keep accumulating in that same field. However, since it's reversing
them, I can't match phrases.
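
To make the intended layout concrete, here is a minimal sketch in plain Java. It does not use the Lucene API: Tok, tokenize, index, and matchesPhrase are hypothetical names invented for illustration, and the code only models how position increments could place words. It assumes the first word of each group advances the position by 1 and every alternative in that group uses an increment of 0, so the whole group shares one position.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class PositionSketch {

    /** A term plus its gap from the previous term (hypothetical stand-in for a token). */
    static final class Tok {
        final String term;
        final int positionIncrement;

        Tok(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }
    }

    /** First word of each group gets increment 1; the alternatives get increment 0. */
    static List<Tok> tokenize(List<List<String>> groups) {
        List<Tok> out = new ArrayList<>();
        for (List<String> group : groups) {
            for (int i = 0; i < group.size(); i++) {
                out.add(new Tok(group.get(i).toLowerCase(), i == 0 ? 1 : 0));
            }
        }
        return out;
    }

    /** Build term -> positions by accumulating the increments in stream order. */
    static Map<String, Set<Integer>> index(List<Tok> toks) {
        Map<String, Set<Integer>> postings = new HashMap<>();
        int position = -1; // the first token (increment 1) lands at position 0
        for (Tok t : toks) {
            position += t.positionIncrement;
            postings.computeIfAbsent(t.term, k -> new TreeSet<>()).add(position);
        }
        return postings;
    }

    /** Exact phrase match: term i must occur at position start + i for some start. */
    static boolean matchesPhrase(Map<String, Set<Integer>> postings, String... terms) {
        if (terms.length == 0) {
            return false;
        }
        Set<Integer> starts = postings.get(terms[0]);
        if (starts == null) {
            return false;
        }
        for (int start : starts) {
            boolean ok = true;
            for (int i = 1; i < terms.length && ok; i++) {
                Set<Integer> positions = postings.get(terms[i]);
                ok = positions != null && positions.contains(start + i);
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<List<String>> groups = Arrays.asList(
                Arrays.asList("The"),
                Arrays.asList("quick", "fast", "speedy"),
                Arrays.asList("brown", "tan", "dark"));
        Map<String, Set<Integer>> postings = index(tokenize(groups));

        System.out.println(matchesPhrase(postings, "fast", "tan"));    // true
        System.out.println(matchesPhrase(postings, "quick", "brown")); // true
        System.out.println(matchesPhrase(postings, "fast", "brown"));  // true
        System.out.println(matchesPhrase(postings, "brown", "fast"));  // false
    }
}
```

In this model, phrase matching depends only on the position each group ends up at, not on the order of the words within a group. If every word, including the first of each group, were given an increment of 0, all groups would collapse onto a single position and no multi-word phrase could match.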
Roy
