Hi,

I'm indexing names in a dedicated Lucene field and wonder which analyzer to use for that purpose. Typically, the names are in the format "John Smith", so the WhitespaceAnalyzer is likely the best choice in most cases, and TextField seems to be the appropriate field type. Or would you rather recommend the KeywordAnalyzer? I'm a bit cautious about that because, with the whole name indexed as a single token, searching for a surname would require wildcard or regex queries such as "*Smith" or ".*Smith", respectively.
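To illustrate the concern, here is a simplified plain-Java model. It is not Lucene code and does not use the Lucene API. It only mimics what the two analyzers emit: WhitespaceAnalyzer splits on whitespace without lowercasing, while KeywordAnalyzer turns the whole value into one token. The helper names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class TokenizationModel {

    // Simplified model of WhitespaceAnalyzer: split on whitespace, case is left untouched.
    static List<String> whitespaceTokens(String value) {
        String trimmed = value.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    // Simplified model of KeywordAnalyzer: the whole field value is one token.
    static List<String> keywordTokens(String value) {
        return Collections.singletonList(value);
    }

    // Exact token lookup, comparable to a plain term query.
    static boolean exactMatch(List<String> tokens, String term) {
        return tokens.contains(term);
    }

    // Suffix lookup, comparable to a leading-wildcard query such as *Smith.
    // It cannot jump to one exact token, so this model checks every token.
    static boolean suffixMatch(List<String> tokens, String suffix) {
        for (String token : tokens) {
            if (token.endsWith(suffix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String name = "John Smith";
        System.out.println(whitespaceTokens(name)); // [John, Smith]
        System.out.println(keywordTokens(name));    // [John Smith]
        System.out.println(exactMatch(whitespaceTokens(name), "Smith")); // true
        System.out.println(exactMatch(keywordTokens(name), "Smith"));    // false
        System.out.println(suffixMatch(keywordTokens(name), "Smith"));   // true
    }
}
```

With keyword-style indexing, a surname search only succeeds via the suffix (wildcard) path. In Lucene, the classic QueryParser also rejects leading wildcards unless setAllowLeadingWildcard(true) is set. Note as well that, because WhitespaceAnalyzer does not lowercase, "smith" would not match "Smith" in this model.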
However, there might also be special cases and spelling variants of all kinds, e.g. "Smith, John", "John 'Hammer' Smith", "Abd al-Aziz", "Stan van Hoop", and whatever else one could imagine. Is there a special Analyzer that is optimized for dealing with such cases, or do I have to normalize the names beforehand? I see that such special characters and spellings can easily be covered by the right queries, but that requires the user to know the exact spelling, which is what I'm trying to spare her.

Best regards,
Carsten

--
Institut für Deutsche Sprache | http://www.ids-mannheim.de
Projekt KorAP | http://korap.ids-mannheim.de
Tel. +49-(0)621-43740789 | schno...@ids-mannheim.de
Korpusanalyseplattform der nächsten Generation
Next Generation Corpus Analysis Platform
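As an example of the "normalize beforehand" option, here is a minimal stdlib-only sketch. It assumes a hypothetical normalizeName helper, which is not part of Lucene. The helper reorders a single "Last, First" comma form, strips diacritics via Unicode NFD decomposition, lowercases, and splits on anything that is not a letter or digit. Quotes around nicknames and hyphens therefore act as separators.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class NameNormalizer {

    // Hypothetical pre-indexing normalization for person names.
    // Returns lowercase, diacritic-free tokens in "first ... last" order.
    static List<String> normalizeName(String raw) {
        String name = raw.trim();

        // "Smith, John" -> "John Smith", but only when there is exactly one comma.
        int comma = name.indexOf(',');
        if (comma >= 0 && comma == name.lastIndexOf(',')) {
            name = name.substring(comma + 1).trim() + " " + name.substring(0, comma).trim();
        }

        // Decompose accented letters (e.g. "é" -> "e" + combining accent), then drop the marks.
        name = Normalizer.normalize(name, Normalizer.Form.NFD).replaceAll("\\p{M}+", "");

        // Lowercase independently of the JVM's default locale.
        name = name.toLowerCase(Locale.ROOT);

        // Split on runs of non-letters/non-digits: spaces, hyphens, quotes, commas.
        List<String> tokens = new ArrayList<>();
        for (String token : name.split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) { // a leading separator yields an empty first element
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String[] samples = {
            "John Smith", "Smith, John", "John 'Hammer' Smith", "Abd al-Aziz", "Stan van Hoop"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + normalizeName(s));
        }
    }
}
```

One design trade-off: splitting on apostrophes keeps the nickname "Hammer" searchable, but it also breaks a name like "O'Brien" into two tokens. Within Lucene itself, the lowercasing and accent stripping could be moved into a custom Analyzer chain, for instance with LowerCaseFilter and ASCIIFoldingFilter, so that queries are normalized the same way. The comma reordering in this sketch would still remain custom logic.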