Re: Indexing multiple instances of the same field for each document

Markus Spath Sun, 29 Feb 2004 02:45:29 -0800

Roy Klein wrote:

Erik,

Indexing a single field in chunks solves a design problem I'm working
on. It's not the only way to do it, but it would certainly be the most
straightforward. However, if using this method makes phrase searching
unusable, then I'll have to go another route.

Hmm, wouldn't it be easier to index a single canonical term for each group of synonyms instead of indexing every synonym at the same position?

quick, fast, speedy -> quick (both when building the index and when building the query)

This would also solve your problem with Lucene's (somewhat counterintuitive, but probably well-reasoned) behaviour of adding Fields with the same name at the beginning instead of appending them.
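A minimal sketch of that mapping in plain Java, assuming a hand-written synonym table. The class and method names are hypothetical, and this is not a Lucene Analyzer; it only shows that running the same canonicalization on the document text and on the query makes "fast tan" and "quick brown" identical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper illustrating "one canonical term per synonym group"
// in plain Java; it is not part of Lucene.
final class SynonymCanonicalizer {
    // Maps every known synonym (lowercased) to its group's canonical term.
    private final Map<String, String> canonical = new HashMap<>();

    // Registers a synonym group; the first word becomes the canonical term.
    void addGroup(String... words) {
        String head = words[0].toLowerCase(Locale.ROOT);
        for (String w : words) {
            canonical.put(w.toLowerCase(Locale.ROOT), head);
        }
    }

    // Splits text on non-word characters, lowercases each token and replaces
    // it with its canonical term (unknown words pass through unchanged).
    List<String> canonicalize(String text) {
        List<String> out = new ArrayList<>();
        for (String raw : text.split("\\W+")) {
            if (raw.isEmpty()) {
                continue; // a leading delimiter yields one empty token
            }
            String w = raw.toLowerCase(Locale.ROOT);
            out.add(canonical.getOrDefault(w, w));
        }
        return out;
    }

    public static void main(String[] args) {
        SynonymCanonicalizer c = new SynonymCanonicalizer();
        c.addGroup("quick", "fast", "speedy");
        c.addGroup("brown", "tan", "dark");
        // Index time: [the, quick, brown, fox, jumped, over, the, lazy, dog]
        System.out.println(c.canonicalize("The quick brown fox jumped over the lazy dog."));
        // Query time: the phrase "fast tan" becomes [quick, brown]
        System.out.println(c.canonicalize("fast tan"));
    }
}
```

With a real Lucene index the same mapping would have to be applied both by the analyzer used at indexing time and by the one used for query parsing; otherwise a query for "fast" would never meet the indexed "quick".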

Markus

Here's a brief example of the type of thing I'm trying to do:

I have a file that contains the words:

The quick brown fox jumped over the lazy dog.

I run that file through a utility that produces the following XML
document:
<document>
  <field name="wordposition1">
    <word>The</word>
  </field>
  <field name="wordposition2">
    <word>quick</word>
    <word>fast</word>
    <word>speedy</word>
  </field>
  <field name="wordposition3">
    <word>brown</word>
    <word>tan</word>
    <word>dark</word>
  </field>
  .
  .
  .
</document>

I parse that document (via the Digester) and add all the words from
each of the fields to one Lucene field: "contents". The tricky part is
that I want each word position in the Lucene index to contain all the
words at that position. I.e. word location 1 in the index contains
"The", word location 2: "quick, fast, and speedy", word location 3:
"brown, tan, and dark", etc.
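For illustration, the parsing step might look like the following. It is a sketch that swaps in the JDK's built-in DOM parser for Digester, assumes well-formed XML with quoted attribute values, and assumes positions follow the document order of the <field> elements rather than the number in each name attribute. WordPositionParser is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper: turns the utility's XML into one list of alternative
// words per position, using the JDK's DOM parser instead of Digester.
final class WordPositionParser {
    static List<List<String>> parse(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
        List<List<String>> positions = new ArrayList<>();
        // getElementsByTagName returns elements in document order, so the
        // position index follows the order of the <field> elements.
        NodeList fields = doc.getElementsByTagName("field");
        for (int i = 0; i < fields.getLength(); i++) {
            Element field = (Element) fields.item(i);
            NodeList words = field.getElementsByTagName("word");
            List<String> alternatives = new ArrayList<>();
            for (int j = 0; j < words.getLength(); j++) {
                alternatives.add(words.item(j).getTextContent().trim());
            }
            positions.add(alternatives);
        }
        return positions;
    }

    public static void main(String[] args) {
        String xml = "<document>"
                + "<field name=\"wordposition1\"><word>The</word></field>"
                + "<field name=\"wordposition2\"><word>quick</word><word>fast</word><word>speedy</word></field>"
                + "<field name=\"wordposition3\"><word>brown</word><word>tan</word><word>dark</word></field>"
                + "</document>";
        // Prints [[The], [quick, fast, speedy], [brown, tan, dark]]
        System.out.println(parse(xml));
    }
}
```

An XML parser rejects unquoted attribute values such as name=wordposition1, which is why the sketch wraps parse errors in an IllegalArgumentException instead of silently skipping fields.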

That way, all of the following phrase queries will match this document:
        "fast tan"
        "quick brown"
        "fast brown"
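The matching rule described above can be sketched as a toy positional index in plain Java. StackedPhraseMatcher is a hypothetical name, not Lucene code. In Lucene the equivalent stacking would presumably be done by an analyzer that emits "fast" and "speedy" with a position increment of 0, so they share the position of "quick", assuming the Lucene version in use supports position increments.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy, single-document positional index (hypothetical, not Lucene code):
// each position holds every alternative word emitted there, i.e. the
// synonyms are "stacked" on the same position.
final class StackedPhraseMatcher {
    private final List<Set<String>> positions = new ArrayList<>();

    // Appends the next word position with all of its alternatives.
    void addPosition(String... alternatives) {
        Set<String> terms = new HashSet<>();
        for (String a : alternatives) {
            terms.add(a.toLowerCase(Locale.ROOT));
        }
        positions.add(terms);
    }

    // An exact phrase matches if, for some start, the i-th phrase word is
    // one of the alternatives stored at position start + i.
    boolean matchesPhrase(String phrase) {
        String[] words = phrase.trim().toLowerCase(Locale.ROOT).split("\\s+");
        for (int start = 0; start + words.length <= positions.size(); start++) {
            boolean ok = true;
            for (int i = 0; i < words.length && ok; i++) {
                ok = positions.get(start + i).contains(words[i]);
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        StackedPhraseMatcher doc = new StackedPhraseMatcher();
        doc.addPosition("The");
        doc.addPosition("quick", "fast", "speedy");
        doc.addPosition("brown", "tan", "dark");
        doc.addPosition("fox");
        for (String q : Arrays.asList("fast tan", "quick brown", "fast brown", "tan fast")) {
            // true for the first three, false for "tan fast" (wrong order)
            System.out.println(q + " -> " + doc.matchesPhrase(q));
        }
    }
}
```

The reversed phrase "tan fast" does not match, because phrase matching still requires the words at consecutive positions in query order; stacking only widens what each single position accepts.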

