Specialized Analyzer for names

Carsten Schnober Fri, 23 Nov 2012 06:37:33 -0800

Hi,
I'm indexing names in a dedicated Lucene field and I wonder which
analyzer to use for that purpose. Typically, the names are in the format
"John Smith", so the WhitespaceAnalyzer is likely the best choice in most
cases. The appropriate field type seems to be TextField.
Or would you rather recommend using the KeywordAnalyzer? I'm a bit
cautious about that because, with the whole name indexed as a single
token, searching by last name would need wildcard or regex queries such
as "*Smith" or ".*Smith", respectively.
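
To make the trade-off concrete, here is a rough sketch in plain Java. It
does not use Lucene's classes; the class and method names are made up for
illustration, and the two "analyzers" only imitate splitting on whitespace
versus keeping the whole value as one term.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only, not Lucene API.
public class TokenizationSketch {

    /** Roughly what a whitespace-based analyzer emits: one term per word. */
    public static List<String> whitespaceTerms(String value) {
        return Arrays.asList(value.trim().split("\\s+"));
    }

    /** Roughly what a keyword-style analyzer emits: the whole value as one term. */
    public static List<String> keywordTerms(String value) {
        return Collections.singletonList(value);
    }

    /** Exact term lookup, the cheap case for an inverted index. */
    public static boolean matchesTerm(List<String> terms, String query) {
        return terms.contains(query);
    }

    /** What a query like "*Smith" amounts to: a suffix test against the terms. */
    public static boolean matchesSuffix(List<String> terms, String suffix) {
        return terms.stream().anyMatch(t -> t.endsWith(suffix));
    }

    public static void main(String[] args) {
        String name = "John Smith";
        // Word-level terms: a plain term query for the last name is enough.
        System.out.println(matchesTerm(whitespaceTerms(name), "Smith"));  // true
        // Single term: the same term query misses ...
        System.out.println(matchesTerm(keywordTerms(name), "Smith"));     // false
        // ... and only a suffix (leading-wildcard) match finds the name.
        System.out.println(matchesSuffix(keywordTerms(name), "Smith"));   // true
    }
}
```

With word-level terms an ordinary term query finds the last name, whereas
the single-term variant forces the leading-wildcard style of query I would
like to avoid.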


However, there might also be special cases and spelling exceptions of
all kinds, e.g. "Smith, John", "John 'Hammer' Smith", "Abd al-Aziz",
"Stan van Hoop", and whatever else one could imagine. Is there a special
Analyzer that is optimized for dealing with such cases, or do I have to do
normalization beforehand?
I see that such special characters and spellings can easily be covered
by the right queries, but that requires the user to know the exact
spelling, which is what I'm trying to spare her.
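
In case normalization beforehand turns out to be the answer, this is
roughly what I have in mind: a minimal sketch in plain Java, independent of
Lucene. The class name NameNormalizer and its rules (swap "Last, First",
fold accents, lowercase, strip quotes at token edges) are my own
assumptions, not an existing API, and they certainly do not cover every
case.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Lucene.
public class NameNormalizer {

    /** Turns a raw name into lowercase, accent-free tokens in "first last" order. */
    public static List<String> normalize(String raw) {
        String name = raw.trim();

        // "Smith, John" -> "John Smith" (only when there is exactly one comma).
        int comma = name.indexOf(',');
        if (comma >= 0 && comma == name.lastIndexOf(',')) {
            name = name.substring(comma + 1).trim() + " " + name.substring(0, comma).trim();
        }

        // Decompose accented characters and drop the combining marks.
        name = Normalizer.normalize(name, Normalizer.Form.NFD).replaceAll("\\p{M}+", "");

        List<String> tokens = new ArrayList<>();
        for (String part : name.toLowerCase(Locale.ROOT).split("\\s+")) {
            // Strip quotes and other punctuation at the token edges; keep inner hyphens.
            String token = part.replaceAll("^[^\\p{L}\\p{N}]+|[^\\p{L}\\p{N}]+$", "");
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(normalize("Smith, John"));          // [john, smith]
        System.out.println(normalize("John 'Hammer' Smith"));  // [john, hammer, smith]
        System.out.println(normalize("Abd al-Aziz"));          // [abd, al-aziz]
    }
}
```

The same normalization would have to be applied to the query string as
well, so that the user's spelling and the indexed spelling end up as the
same tokens.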

Best regards,
Carsten

-- 
Institut für Deutsche Sprache | http://www.ids-mannheim.de
Projekt KorAP                 | http://korap.ids-mannheim.de
Tel. +49-(0)621-43740789      | [email protected]
Korpusanalyseplattform der nächsten Generation
Next Generation Corpus Analysis Platform
