I thought I could use the KeywordTokenizer to prevent tokenizing on spaces, so that I can treat some fields as a single term. But it's still tokenizing on spaces.
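A simplified, stdlib-only model can help separate two possible causes: the analyzer itself splitting on spaces, or the text being split before it ever reaches the analyzer. This is not Lucene code, and the class and method names are hypothetical. `keywordAnalyze` mimics what a KeywordTokenizer followed by a LowerCaseFilter is meant to produce (the whole input as one lowercased token). `splitThenAnalyze` assumes the text is first split on whitespace and each piece is analyzed separately.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical, simplified model (not Lucene's API) comparing
 * keyword-style analysis with "split on whitespace, then analyze".
 */
public class KeywordVsWhitespace {

    // Keyword-style analysis: the entire input becomes a single token,
    // then it is lowercased (approximating LowerCaseFilter with Locale.ROOT).
    static List<String> keywordAnalyze(String text) {
        List<String> tokens = new ArrayList<>();
        tokens.add(text.toLowerCase(Locale.ROOT));
        return tokens;
    }

    // Assumed pre-split: the text is broken on whitespace first, so each
    // chunk reaches the keyword-style analyzer on its own and becomes
    // its own token.
    static List<String> splitThenAnalyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String chunk : text.trim().split("\\s+")) {
            tokens.addAll(keywordAnalyze(chunk));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String sn = "1023 4567 8765";
        System.out.println(keywordAnalyze(sn));   // [1023 4567 8765]
        System.out.println(splitThenAnalyze(sn)); // [1023, 4567, 8765]
    }
}
```

The two results mirror the two queries in the output at the end of this post: the quoted string produced a single TermQuery on "1023 4567 8765", while the unquoted one produced sn:1023 sn:4567 sn:8765. That pattern is consistent with the unquoted query text being split on spaces before analysis, rather than by the KeywordTokenizer itself.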
In the code below, I'm storing a document with a serial number that contains spaces. I want to treat it as a single term without requiring end users to turn it into a phrase query by surrounding it with double quotes. But it doesn't work as I expected. Is there something I need to do differently? Shouldn't the keyword tokenizer treat the entire text as one token?

------------

This is the custom analyzer class I use:

    private static class LowerCaseKeywordAnalyzer extends Analyzer {
        @Override
        protected TokenStreamComponents createComponents(String theFieldName, Reader theReader) {
            Tokenizer theTokenizer = new KeywordTokenizer(theReader);
            TokenStream theTokenStream = new LowerCaseFilter(Version.LUCENE_46, theTokenizer);
            TokenStreamComponents theTokenStreamComponents =
                    new TokenStreamComponents(theTokenizer, theTokenStream);
            return theTokenStreamComponents;
        }
    }

The code using the analyzer:

    Version theVersion = Version.LUCENE_46;
    Directory theIndex = new RAMDirectory();
    Analyzer theAnalyzer = new LowerCaseKeywordAnalyzer();
    IndexWriterConfig theConfig = new IndexWriterConfig(theVersion, theAnalyzer);
    IndexWriter theWriter = new IndexWriter(theIndex, theConfig);

    Document theDocument = new Document();
    FieldType theFieldType = new FieldType();
    theFieldType.setStored(true);
    theFieldType.setIndexed(true);
    theFieldType.setTokenized(false);
    theDocument.add(new Field("sn", "1023 4567 8765", theFieldType));
    theWriter.addDocument(theDocument);
    theWriter.close();

    String[] theQueryStrings = new String[] { "\"1023 4567 8765\"", "1023 4567 8765" };
    QueryParser theParser = new QueryParser(theVersion, "sn", theAnalyzer);
    IndexReader theIndexReader = DirectoryReader.open(theIndex);
    IndexSearcher theSearcher = new IndexSearcher(theIndexReader);
    for (int i = 0; i < theQueryStrings.length; i++) {
        String currQueryStr = theQueryStrings[i];
        Query currQuery = theParser.parse("sn:" + currQueryStr);
        System.out.println(currQuery.getClass() + ", " + currQuery);
        TopScoreDocCollector currCollector = TopScoreDocCollector.create(10, true);
        theSearcher.search(currQuery, currCollector);
        ScoreDoc[] currHits = currCollector.topDocs().scoreDocs;
        String msg = "Number of results found for '" + currQueryStr + "': " + currHits.length;
        System.out.println(msg);
    }

The output:

    class org.apache.lucene.search.TermQuery, sn:1023 4567 8765
    Number of results found for '"1023 4567 8765"': 1
    class org.apache.lucene.search.BooleanQuery, sn:1023 sn:4567 sn:8765
    Number of results found for '1023 4567 8765': 0

--
Regards,
Milind