I found the problem, but it makes no sense to me.
If I set the field type to tokenized, the search works; if I set it to
untokenized, the search fails. In other words, I have to pass true to:
theFieldType.setTokenized(storeTokenized);
I want the field indexed untokenized, but it seems I don't need that after
all: the LowerCaseKeywordAnalyzer works if the field is tokenized, but not
if it's untokenized!
How can that be?
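As far as I understand Lucene 4.x, this is expected. When a FieldType has tokenized set to false, the field's string value is indexed as one term exactly as given, and the analyzer from IndexWriterConfig is never applied to it. The QueryParser, however, always runs theAnalyzer over the query text. So the index holds `SN345-B21` while the query looks for `sn345-b21`. With tokenized set to true, LowerCaseKeywordAnalyzer does run, and because KeywordTokenizer emits the whole value as a single token you still get one term per value, only lowercased. Below is a plain-Java toy model of that mismatch. It is not Lucene code: the "index" is just a set of terms for one field, and `analyze`, `index`, `search` and `hit` are hypothetical stand-ins.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Toy model (not Lucene code) of index-time vs query-time analysis for one field. */
final class AnalysisMismatchDemo {

    // Hypothetical stand-in for LowerCaseKeywordAnalyzer: whole value -> one lowercased term.
    static String analyze(String text) {
        return text.toLowerCase(Locale.ROOT);
    }

    // Hypothetical stand-in for indexing one value.
    // tokenized == false: the raw value is indexed verbatim, the analyzer is skipped.
    // tokenized == true:  the analyzer's output is indexed.
    static void index(Set<String> terms, String value, boolean tokenized) {
        terms.add(tokenized ? analyze(value) : value);
    }

    // Hypothetical stand-in for QueryParser + TermQuery: the query text is always
    // analyzed, then compared against indexed terms exactly.
    static boolean search(Set<String> terms, String queryText) {
        return terms.contains(analyze(queryText));
    }

    static boolean hit(boolean tokenized, String value, String queryText) {
        Set<String> terms = new HashSet<>();
        index(terms, value, tokenized);
        return search(terms, queryText);
    }

    public static void main(String[] args) {
        System.out.println(hit(false, "SN345-B21", "sn345-b21")); // false: term is "SN345-B21"
        System.out.println(hit(true, "SN345-B21", "sn345-b21"));  // true: term is "sn345-b21"
    }
}
```

In this model even a query typed in the original case (`SN345-B21`) misses the untokenized field, because the query side lowercases it before the lookup.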
On Mon, Aug 11, 2014 at 1:49 PM, Milind wrote:
> The lowercasing does seem to be working.
>
> The following code
>
> Document theDoc = theIndexReader.document(0);
> System.out.println(theDoc.get("sn"));
> IndexableField theField = theDoc.getField("sn");
> TokenStream theTokenStream = theField.tokenStream(theAnalyzer);
> System.out.println(theTokenStream);
>
> produces the following output:
> SN345-B21
> LowerCaseFilter@5f70bea5 term=sn345-b21,bytes=[73 6e 33 34 35 2d 62 32 31],startOffset=0,endOffset=9
>
> But the search still returns no hits. Does anything obvious jump out at anyone?
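One likely reason the diagnostic above looks fine: it re-analyzes the stored value. The stored value is always the original string, and the Field loaded through IndexReader.document() is not guaranteed to carry the tokenized=false setting it was indexed with, so tokenStream(theAnalyzer) shows what the analyzer would produce now, not which term is actually in the index. Listing the terms of "sn" (for example by iterating a TermsEnum over the field) is a more direct check. The toy model below, again plain Java and not Lucene code, keeps the stored value and the indexed term separately to make the difference visible; IndexedField is a hypothetical class.

```java
import java.util.Locale;

/** Toy model (not Lucene code): a field's stored value and indexed term are separate things. */
final class StoredVsIndexedDemo {

    // Hypothetical record of what one field contributes to the index.
    static final class IndexedField {
        final String storedValue; // what document(0).get("sn") returns
        final String indexedTerm; // what a term lookup is compared against

        IndexedField(String value, boolean tokenized) {
            this.storedValue = value; // always stored verbatim
            this.indexedTerm = tokenized ? analyze(value) : value;
        }
    }

    // Hypothetical stand-in for LowerCaseKeywordAnalyzer.
    static String analyze(String text) {
        return text.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        IndexedField f = new IndexedField("SN345-B21", false);
        System.out.println(f.storedValue);          // SN345-B21
        // Re-analyzing the stored value looks right...
        System.out.println(analyze(f.storedValue)); // sn345-b21
        // ...but the term that was actually indexed was never lowercased.
        System.out.println(f.indexedTerm);          // SN345-B21
    }
}
```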
>
>
> On Sat, Aug 9, 2014 at 4:39 PM, Milind wrote:
>
>> I looked at a couple of examples of how to make a keyword analyzer
>> case-insensitive, but I think I missed something, since it's not working
>> for me.
>>
>> In the code below, I index text in upper case and search in lower case,
>> but I get back no hits. Do I need to do something more while indexing?
>>
>> private static class LowerCaseKeywordAnalyzer extends Analyzer
>> {
>>     @Override
>>     protected TokenStreamComponents createComponents(String theFieldName,
>>                                                      Reader theReader)
>>     {
>>         // The whole input becomes a single token, which is then lowercased.
>>         KeywordTokenizer theTokenizer = new KeywordTokenizer(theReader);
>>         TokenStreamComponents theTokenStreamComponents =
>>             new TokenStreamComponents(
>>                 theTokenizer,
>>                 new LowerCaseFilter(Version.LUCENE_46, theTokenizer));
>>         return theTokenStreamComponents;
>>     }
>> }
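The analyzer above boils down to two steps: KeywordTokenizer emits the entire input as one token, and LowerCaseFilter lowercases that token. Here is a plain-Java toy model of the chain, not Lucene's implementation. Lowercasing code point by code point with Character.toLowerCase is my assumption about what LowerCaseFilter does, so treat the exact casing rules as approximate.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy model (not Lucene code) of KeywordTokenizer followed by LowerCaseFilter. */
final class KeywordLowerCaseChain {

    // KeywordTokenizer: the entire input becomes exactly one token.
    static List<String> keywordTokenize(String input) {
        return Collections.singletonList(input);
    }

    // LowerCaseFilter (approximation): lowercase each token code point by code point.
    static List<String> lowerCaseFilter(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            StringBuilder sb = new StringBuilder(token.length());
            token.codePoints().forEach(cp -> sb.appendCodePoint(Character.toLowerCase(cp)));
            out.add(sb.toString());
        }
        return out;
    }

    static List<String> analyze(String input) {
        return lowerCaseFilter(keywordTokenize(input));
    }

    public static void main(String[] args) {
        System.out.println(analyze("SN345-B21"));  // [sn345-b21]
        System.out.println(analyze("SN 345 B21")); // [sn 345 b21]: spaces do not split
    }
}
```

The second call shows why this chain behaves like an untokenized field even when the field is marked tokenized: the value is never split, only lowercased.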
>>
>> private static void addDocment(IndexWriter theWriter,
>>                                String theFieldName,
>>                                String theValue,
>>                                boolean storeTokenized)
>>     throws Exception
>> {
>>     Document theDocument = new Document();
>>     FieldType theFieldType = new FieldType();
>>     theFieldType.setStored(true);
>>     theFieldType.setIndexed(true);
>>     // When false, the value is indexed verbatim and the analyzer is not applied.
>>     theFieldType.setTokenized(storeTokenized);
>>     theDocument.add(new Field(theFieldName, theValue, theFieldType));
>>     theWriter.addDocument(theDocument);
>> }
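If the field really has to stay untokenized (for example a StringField), a commonly used alternative is to normalize the value yourself before indexing and to normalize the query text the same way, then search with a TermQuery instead of going through the analyzer. In Lucene terms that would mean passing `theValue.toLowerCase(Locale.ROOT)` to the field and building `new TermQuery(new Term("sn", text.toLowerCase(Locale.ROOT)))`. The sketch below models only the idea in plain Java; `normalize`, `indexUntokenized` and `lookup` are hypothetical helpers, not Lucene API.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Toy model (not Lucene code): untokenized field, normalized by the application on both sides. */
final class ManualNormalizationDemo {

    // Hypothetical helper applied to the value before indexing AND to the query text.
    static String normalize(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    // Untokenized indexing: the string passed in becomes the term verbatim,
    // so the application passes the already-normalized value.
    static Set<String> indexUntokenized(String... values) {
        Set<String> terms = new HashSet<>();
        for (String v : values) {
            terms.add(normalize(v));
        }
        return terms;
    }

    // Exact-term lookup (what a TermQuery does), after the same normalization.
    static boolean lookup(Set<String> terms, String queryText) {
        return terms.contains(normalize(queryText));
    }

    public static void main(String[] args) {
        Set<String> sn = indexUntokenized("SN345-B21", "SN445-B21");
        System.out.println(lookup(sn, "sn345-b21"));  // true
        System.out.println(lookup(sn, "Sn345-B21"));  // true
        System.out.println(sn.contains("SN345-B21")); // false: query skipped normalization
    }
}
```

The design point is that both sides must go through the identical normalization; forgetting it on either side brings back the original no-hits symptom.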
>>
>>
>> static void testLowerCaseKeywordAnalyzer()
>>     throws Exception
>> {
>>     Version theVersion = Version.LUCENE_46;
>>     Directory theIndex = new RAMDirectory();
>>
>>     Analyzer theAnalyzer = new LowerCaseKeywordAnalyzer();
>>
>>     IndexWriterConfig theConfig = new IndexWriterConfig(theVersion, theAnalyzer);
>>     IndexWriter theWriter = new IndexWriter(theIndex, theConfig);
>>     addDocment(theWriter, "sn", "SN345-B21", false);
>>     addDocment(theWriter, "sn", "SN445-B21", false);
>>     theWriter.close();
>>
>>     QueryParser theParser = new QueryParser(theVersion, "sn", theAnalyzer);
>>     Query theQuery = theParser.parse("sn:sn345-b21");
>>     IndexReader theIndexReader = DirectoryReader.open(theIndex);
>>     IndexSearcher theSearcher = new IndexSearcher(theIndexReader);
>>     TopScoreDocCollector theCollector = TopScoreDocCollector.create(10, true);
>>     theSearcher.search(theQuery, theCollector);
>>     ScoreDoc[] theHits = theCollector.topDocs().scoreDocs;
>>     System.out.println("Number of results found: " + theHits.length);
>>     theIndexReader.close();
>> }
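On the query side, the QueryParser built with theAnalyzer passes the query text through that analyzer, so it looks for the lowercased term no matter how the field was indexed. Printing theQuery before searching is a cheap way to see which term it will look up. The toy below mimics only the simplest `field:text` case in plain Java; the real parser handles operators, phrases, wildcards and escaping, none of which is modeled here, and `FieldQueryToy`, `TermLookup` and `parse` are hypothetical names.

```java
import java.util.Locale;

/** Toy model (not Lucene code) of the query side, single-term case only. */
final class FieldQueryToy {

    // Hypothetical stand-in for the field/term pair a TermQuery would hold.
    static final class TermLookup {
        final String field;
        final String text;

        TermLookup(String field, String text) {
            this.field = field;
            this.text = text;
        }

        @Override
        public String toString() {
            return field + ":" + text;
        }
    }

    // Hypothetical stand-in for LowerCaseKeywordAnalyzer.
    static String analyze(String text) {
        return text.toLowerCase(Locale.ROOT);
    }

    // Splits "field:text" at the first colon; uses defaultField when there is no colon.
    // The text part always goes through the analyzer.
    static TermLookup parse(String query, String defaultField) {
        int colon = query.indexOf(':');
        if (colon < 0) {
            return new TermLookup(defaultField, analyze(query));
        }
        return new TermLookup(query.substring(0, colon), analyze(query.substring(colon + 1)));
    }

    public static void main(String[] args) {
        System.out.println(parse("sn:SN345-B21", "sn")); // sn:sn345-b21
        System.out.println(parse("SN345-B21", "sn"));    // sn:sn345-b21
    }
}
```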
>>
>> --
>> Regards
>> Milind
>>