Thanks Christoph. So it seems that "tokenized" has been conflated with "analyzed". I just looked at the Javadocs for setTokenized and that's what they say. I had read it earlier, but it hadn't registered. I wonder why it's not called setAnalyzed. Thanks again.
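To make the behaviour concrete, here is a toy model in plain Java. It does not use Lucene at all: the class TokenizedFlagSketch, its methods, and the lower-casing function standing in for the analyzer are all hypothetical names made up for illustration. It only assumes what Christoph described, namely that the analyzer is skipped at index time when the field is not tokenized, while the query text still goes through the analyzer.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Toy model (NOT Lucene code) of how the "tokenized" flag gates the analyzer. */
class TokenizedFlagSketch {
    // Inverted index: term -> ids of the documents that contain it.
    private final Map<String, List<Integer>> postings = new HashMap<>();

    // Stand-in for LowerCaseKeywordAnalyzer: one token, lower-cased.
    private final UnaryOperator<String> analyzer = s -> s.toLowerCase(Locale.ROOT);

    private int nextDocId = 0;

    /** Index one value; the analyzer is applied only when tokenized is true. */
    public void addDocument(String value, boolean tokenized) {
        String term = tokenized ? analyzer.apply(value) : value;
        postings.computeIfAbsent(term, k -> new ArrayList<>()).add(nextDocId++);
    }

    /** Count hits; the query text always goes through the analyzer. */
    public int countHits(String queryText) {
        String term = analyzer.apply(queryText);
        return postings.getOrDefault(term, Collections.emptyList()).size();
    }

    public static void main(String[] args) {
        TokenizedFlagSketch untokenized = new TokenizedFlagSketch();
        untokenized.addDocument("SN345-B21", false);
        // Indexed term is "SN345-B21", query term is "sn345-b21": no match.
        System.out.println("untokenized hits: " + untokenized.countHits("sn345-b21"));

        TokenizedFlagSketch tokenized = new TokenizedFlagSketch();
        tokenized.addDocument("SN345-B21", true);
        // Indexed term and query term are both "sn345-b21": one match.
        System.out.println("tokenized hits: " + tokenized.countHits("sn345-b21"));
    }
}
```

In this model the untokenized index reports 0 hits and the tokenized one reports 1, which mirrors what I saw with the real index: the flag decides whether the analyzer runs at all, not whether the value is split into several tokens.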
On Tue, Aug 12, 2014 at 3:07 AM, Christoph Kaser <christoph.ka...@iconparc.de> wrote:

> Hello Milind,
>
> if you don't set the field to be tokenized, no analyzer will be used and
> the field's contents will be stored "as-is", i.e. case sensitive.
> It's the analyzer's job to tokenize the input, so if you use an analyzer
> that does not separate the input into several tokens (like the
> KeywordAnalyzer), your input will remain "untokenized".
>
> Regards
> Christoph
>
> On 12.08.2014 at 03:38, Milind wrote:
>
>> I found the problem. But it makes no sense to me.
>>
>> If I set the field type to be tokenized, it works. But if I set it to not
>> be tokenized, the search fails, i.e. I have to pass in true to the method:
>>
>>     theFieldType.setTokenized(storeTokenized);
>>
>> I want the field to be stored as un-tokenized. But it seems that I don't
>> need to do that. The LowerCaseKeywordAnalyzer works if the field is
>> tokenized, but not if it's un-tokenized!
>>
>> How can that be?
>>
>> On Mon, Aug 11, 2014 at 1:49 PM, Milind <mili...@gmail.com> wrote:
>>
>>> It does look like the lowercase is working.
>>>
>>> The following code
>>>
>>>     Document theDoc = theIndexReader.document(0);
>>>     System.out.println(theDoc.get("sn"));
>>>     IndexableField theField = theDoc.getField("sn");
>>>     TokenStream theTokenStream = theField.tokenStream(theAnalyzer);
>>>     System.out.println(theTokenStream);
>>>
>>> produces the following output
>>>
>>>     SN345-B21
>>>     LowerCaseFilter@5f70bea5 term=sn345-b21,bytes=[73 6e 33 34 35 2d 62 32 31],startOffset=0,endOffset=9
>>>
>>> But the search does not work. Anything obvious popping out for anyone?
>>>
>>> On Sat, Aug 9, 2014 at 4:39 PM, Milind <mili...@gmail.com> wrote:
>>>
>>>> I looked at a couple of examples on how to get keyword analyzer to be
>>>> case insensitive, but I think I missed something since it's not working
>>>> for me.
>>>>
>>>> In the code below, I'm indexing text in upper case and searching in
>>>> lower case. But I get back no hits. Do I need to do something more
>>>> while indexing?
>>>>
>>>>     private static class LowerCaseKeywordAnalyzer extends Analyzer
>>>>     {
>>>>         @Override
>>>>         protected TokenStreamComponents createComponents(String theFieldName, Reader theReader)
>>>>         {
>>>>             KeywordTokenizer theTokenizer = new KeywordTokenizer(theReader);
>>>>             TokenStreamComponents theTokenStreamComponents =
>>>>                 new TokenStreamComponents(
>>>>                     theTokenizer,
>>>>                     new LowerCaseFilter(Version.LUCENE_46, theTokenizer));
>>>>             return theTokenStreamComponents;
>>>>         }
>>>>     }
>>>>
>>>>     private static void addDocment(IndexWriter theWriter,
>>>>                                    String theFieldName,
>>>>                                    String theValue,
>>>>                                    boolean storeTokenized)
>>>>         throws Exception
>>>>     {
>>>>         Document theDocument = new Document();
>>>>         FieldType theFieldType = new FieldType();
>>>>         theFieldType.setStored(true);
>>>>         theFieldType.setIndexed(true);
>>>>         theFieldType.setTokenized(storeTokenized);
>>>>         theDocument.add(new Field(theFieldName, theValue, theFieldType));
>>>>         theWriter.addDocument(theDocument);
>>>>     }
>>>>
>>>>     static void testLowerCaseKeywordAnalyzer()
>>>>         throws Exception
>>>>     {
>>>>         Version theVersion = Version.LUCENE_46;
>>>>         Directory theIndex = new RAMDirectory();
>>>>
>>>>         Analyzer theAnalyzer = new LowerCaseKeywordAnalyzer();
>>>>
>>>>         IndexWriterConfig theConfig = new IndexWriterConfig(theVersion, theAnalyzer);
>>>>         IndexWriter theWriter = new IndexWriter(theIndex, theConfig);
>>>>         addDocment(theWriter, "sn", "SN345-B21", false);
>>>>         addDocment(theWriter, "sn", "SN445-B21", false);
>>>>         theWriter.close();
>>>>
>>>>         QueryParser theParser = new QueryParser(theVersion, "sn", theAnalyzer);
>>>>         Query theQuery = theParser.parse("sn:sn345-b21");
>>>>         IndexReader theIndexReader = DirectoryReader.open(theIndex);
>>>>         IndexSearcher theSearcher = new IndexSearcher(theIndexReader);
>>>>         TopScoreDocCollector theCollector = TopScoreDocCollector.create(10, true);
>>>>         theSearcher.search(theQuery, theCollector);
>>>>         ScoreDoc[] theHits = theCollector.topDocs().scoreDocs;
>>>>         System.out.println("Number of results found: " + theHits.length);
>>>>     }
>>>>
>>>> --
>>>> Regards
>>>> Milind
>>>
>>> --
>>> Regards
>>> Milind
>
> --
> ------------------------------------------------------------------------
>
> Because individuality is the best standard
>
> Dipl.-Inf. Christoph Kaser
>
> IconParc GmbH
> Sophienstraße 1
> 80333 München
>
> iconparc.de
>
> Tel: +49 - 89- 15 90 06 - 21
> Fax: +49 - 89- 15 90 06 - 19
>
> Management: Dipl.-Ing. Roland Brückner, Dipl.-Inf. Sven Angerer. HRB
> 121830, Amtsgericht München

--
Regards
Milind