Re: question about indexing/searching using standardanalyzer for KEYWORD field that contains alphanumeric data

Ian Lea Mon, 03 Aug 2009 02:22:00 -0700

Hi

Storing documentkey as TEXT will be causing it to be passed through
StandardAnalyzer which will be downcasing it, and the index will be
holding "l2222fahbhmf" rather than "L2222FAHBHMF".  When you changed
it to KEYWORD it will have been stored as is so the
updateDocument(term, doc) call will have worked but searching will
have failed because StandardAnalyzer will have downcased it. Numeric
keys will have worked everywhere because they don't get downcased.

See "Why is it important to use the same analyzer type during indexing
and search?" in the FAQ.

The best solution is probably to store it as KEYWORD and use
PerFieldAnalyzerWrapper to specify KeywordAnalyzer for documentkey.
The javadocs have an example showing what you need.

TEXT and KEYWORD haven't been around in lucene for a while.  You might
like to consider upgrading.  Good practice anyway to mention what
version you are using when asking questions.

--
Ian.

On Mon, Aug 3, 2009 at 3:49 AM, Leonard
Gestrin<[email protected]> wrote:
>
> Hello,
> I have question about KEYWORD type and searching/updating.  I am getting 
> strange behavior that I can't quite comprehend.
> My index is created using standard analyzer, which used for writing and 
> searching. It has three fields
>
> userpin - alphanumeric field which is stored as TEXT
> documentkey  - alphanumeric field which is stored as TEXT
> contents - text of document which is stored as TEXT
>
> When I try to update document I am creating Term to find document by 
> documentKey and I am using
>
>  org.apache.lucene.index.IndexWriter.updateDocument(term, pDocument);
>
> to do the update.  Lucene fails to find the document by the term and I am 
> getting duplicate documents in the index.
> When I changed index to define documentKey as KEYWORD the updates started to 
> work fine.
> However, search for documentKey using StandardAnalyzer stopped working.
>
> It appears that lucene is using keywordAnalyzer for searching for the term 
> during update, even though the indexer is open with StandardAnalyzer.
>
> The sample values that are stored in documentKeys are: "L2222FAHBHMF", 
> "L2222FAHBHAS".
> I noticed if documentKey is numeric value, both KeywordAnalyzer and 
> StandardAnalyzer can find the documents by it without any problem thus reader 
> can find and indexer can update without any problems. With alphanumeric I 
> cant get both to work.
> Any help is appreciated.
> Thanks
> Leonard
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: question about indexing/searching using standardanalyzer for KEYWORD field that contains alphanumeric data

Reply via email to