In Java Lucene, StandardAnalyzer lowercases (I can only speak to Java Lucene though).
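StandardAnalyzer lowercases because of its analysis chain. In Java Lucene 2.9 that chain is StandardTokenizer, then StandardFilter, LowerCaseFilter and StopFilter. The sketch below is a toy model of such a chain in plain Java, not the Lucene API. The class ToyAnalyzer and its regex tokenizer are hypothetical stand-ins, and the real StandardTokenizer follows much more elaborate rules. The point it illustrates is that lowercasing is applied to the tokens produced during analysis. If Lucene.Net mirrors the Java chain, as a port should, then extending StandardAnalyzer only to add a ToLower() step should be unnecessary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model of an analysis chain: tokenize -> lowercase -> drop stop words.
// Hypothetical class, NOT the Lucene/Lucene.Net API.
class ToyAnalyzer {
    private final Set<String> stopWords;

    ToyAnalyzer(Set<String> stopWords) {
        this.stopWords = stopWords;
    }

    // Tokenizer step (simplified): split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Whole chain: each token is lowercased before the stop-word check.
    List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : tokenize(text)) {
            String lower = token.toLowerCase(Locale.ROOT);
            if (!stopWords.contains(lower)) terms.add(lower);
        }
        return terms;
    }

    public static void main(String[] args) {
        // Empty stop-word set, analogous to the empty Hashtable in getAnalyzer().
        ToyAnalyzer noStops = new ToyAnalyzer(Set.of());
        System.out.println(noStops.analyze("SelectAll"));      // [selectall]

        ToyAnalyzer withStops = new ToyAnalyzer(Set.of("the"));
        System.out.println(withStops.analyze("The Quick FOX")); // [quick, fox]
    }
}
```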

Stored values are _never_ affected by analysis, though. What goes in is what gets stored. Analysis is a completely different step and lives in a different location in the index.

Erik

On Mar 7, 2011, at 12:10, Trevor Watson wrote:

> Thanks for the response!
>
> We changed our code so we always use a StandardAnalyzer, which as far as I
> know should always use a LowerCaseFilter when an IndexWriter writes to an
> index. However, using Luke shows that this isn't the case.
>
> Does the StandardAnalyzer use a LowerCaseFilter?
> Should it be stored in the index without capitalization?
> Is there a way to force it to make all data lower case without using C#'s
> ToLower()? Would it be best to write an analyzer that extends
> StandardAnalyzer to use ToLower()?
>
> Thanks in advance.
>
> Trevor
>
> On 03/04/2011 7:12 PM, Digy wrote:
>> Hi Trevor,
>>
>> Lucene.Net is intended to be deterministic code :) So "NOT ALWAYS" or
>> "USUALLY" should mean a bug either in Lucene.Net or in your code. I would
>> recommend revising your code and using Luke (http://www.getopt.org/luke/)
>> to inspect your index and see what you have in it.
>>
>> DIGY
>>
>> PS: Don't search an index with a different analyzer from the one it was
>> created with.
>>
>>
>> -----Original Message-----
>> From: Trevor Watson [mailto:trevor.wat...@gmail.com]
>> Sent: Friday, March 04, 2011 9:04 PM
>> To: lucene-net-user@lucene.apache.org
>> Subject: [Lucene.Net] StandardAnalyzer and lowercase
>>
>> I currently have a project that indexes multiple file formats. There is a
>> 2nd index that I use to keep track of files (because the queries in the
>> database are too slow, we query an index and use an ID field to get the
>> stuff out of the database).
>>
>> However, I've started to run into some issues with the StandardAnalyzer.
>> We were using different analyzers at one point, so we moved all creation
>> of analyzers into this function:
>>
>>   public static Analyzer getAnalyzer()
>>   {
>>       // Empty stop-word set, so no stop words are removed
>>       Hashtable htStopWords = new Hashtable();
>>       Analyzer analyzer = new StandardAnalyzer(
>>           Lucene.Net.Util.Version.LUCENE_29, htStopWords);
>>
>>       return analyzer;
>>   }
>>
>> So now all functions should be using a StandardAnalyzer.
>>
>> To my knowledge, a StandardAnalyzer uses a LowerCaseFilter to change all
>> strings to lower case, and in some cases that is true. To get all
>> documents in an index, we use a field called SelectAll, store the word
>> "SelectAll" in it, and then search for that.
>>
>> The document to write is created in this function:
>>
>>   public Document getFileInfoDoc()
>>   {
>>       Document doc = new Document();
>>       // Key field: name and value must match the Term given to UpdateDocument
>>       doc.Add(new Field("FileId", this.FileId.ToString(), Field.Store.YES,
>>           Field.Index.NOT_ANALYZED));
>>       // Constant marker term used to match every document
>>       doc.Add(new Field("SelectAll", "SelectAll", Field.Store.NO,
>>           Field.Index.ANALYZED));
>>       doc.Add(new Field("FilePath", this.FilePath, Field.Store.YES,
>>           Field.Index.ANALYZED));
>>
>>       return doc;
>>   }
>>
>> In one case we call this code:
>>
>>   Document doc = getFileInfoDoc();
>>   Analyzer analyzer = getAnalyzer();
>>   // Deletes any document containing the term, then adds doc
>>   indexWriter.UpdateDocument(new Term("FileId", this.FileId.ToString()), doc,
>>       analyzer);
>>
>> This code writes to the IndexWriter, but DOES NOT ALWAYS apply the
>> LowerCaseFilter to the string stored in SelectAll.
>>
>> To rebuild the index, we DeleteAllDocs from the index and loop through each
>> file to be stored; for each one we call getFileInfoDoc from above and then
>> the following 2 lines of code:
>>
>>   Analyzer analyzer = getAnalyzer();
>>   iwCurrent.UpdateDocument(new Term("FileId", iFileID.ToString()), doc,
>>       analyzer);
>>
>> This USUALLY stores the SelectAll field as lower case, but sometimes it
>> still fails and writes it as upper case.
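Both call sites rely on UpdateDocument(term, doc, analyzer), which first deletes every document containing term and then adds doc. It therefore replaces an earlier version only when the field name and value in the Term exactly match what was indexed for that document. If getFileInfoDoc() writes the key under a different field name, even one letter off, nothing is deleted and every update leaves another copy behind. The toy model below is plain Java with hypothetical class and method names, and it uses exact string matching in place of real term lookup; it only illustrates the matching versus mismatched case.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of delete-then-add "update by term" semantics.
// Hypothetical names, NOT the Lucene API. A document is a map of field -> value.
class ToyUpdateIndex {
    final List<Map<String, String>> docs = new ArrayList<>();

    // Delete every document whose `field` equals `value`, then add the new one.
    void updateDocument(String field, String value, Map<String, String> doc) {
        docs.removeIf(d -> value.equals(d.get(field)));
        docs.add(doc);
    }

    // Builds a document whose key is written under `idField`.
    static Map<String, String> fileDoc(String idField, String id, String path) {
        Map<String, String> doc = new HashMap<>();
        doc.put(idField, id);
        doc.put("FilePath", path);
        return doc;
    }

    public static void main(String[] args) {
        ToyUpdateIndex matching = new ToyUpdateIndex();
        matching.updateDocument("FileId", "42", fileDoc("FileId", "42", "a.txt"));
        matching.updateDocument("FileId", "42", fileDoc("FileId", "42", "a.txt"));
        System.out.println(matching.docs.size());   // 1: second call replaced the first

        ToyUpdateIndex mismatched = new ToyUpdateIndex();
        mismatched.updateDocument("FileId", "42", fileDoc("FieldId", "42", "a.txt"));
        mismatched.updateDocument("FileId", "42", fileDoc("FieldId", "42", "a.txt"));
        System.out.println(mismatched.docs.size()); // 2: nothing matched, nothing deleted
    }
}
```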
>>
>> Is there anything that I am missing in terms of making the LowerCaseFilter
>> be applied? I don't particularly want to change the text to lower case in
>> my code as a 2nd index we use may be having the same issues, but contains
>> the contents of the file and changing that to lower case may have a major
>> impact on performance.
>>
>> Thanks in advance,
>>
>> Trevor Watson
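Erik's point that stored values bypass analysis, and Digy's warning about searching with a different analyzer, can both be seen in a toy model of a single field. The index keeps the stored value exactly as given and, separately, the analyzed (here lowercased) terms that searches are matched against. ToyIndex and its whitespace-plus-lowercase analysis are hypothetical and only approximate Lucene. Note that a field indexed with Field.Store.NO, like SelectAll, has no stored value at all, only terms.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model of one field: stored values vs. analyzed terms.
// Hypothetical names, NOT the Lucene API.
class ToyIndex {
    private final List<String> stored = new ArrayList<>();              // docId -> original value
    private final Map<String, Set<Integer>> postings = new HashMap<>(); // term -> docIds

    // Index-time analysis (simplified): split on whitespace and lowercase.
    static List<String> lowercaseAnalyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) terms.add(t.toLowerCase(Locale.ROOT));
        }
        return terms;
    }

    int add(String value) {
        int docId = stored.size();
        stored.add(value); // kept verbatim: analysis never touches it
        for (String term : lowercaseAnalyze(value)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
        return docId;
    }

    String storedValue(int docId) {
        return stored.get(docId);
    }

    // Exact lookup of a query term; the caller must analyze it the same way.
    Set<Integer> search(String term) {
        return postings.getOrDefault(term, Collections.emptySet());
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        int id = index.add("SelectAll");
        System.out.println(index.storedValue(id));     // SelectAll
        System.out.println(index.search("selectall")); // [0]
        System.out.println(index.search("SelectAll")); // []
    }
}
```

A query term that skipped the lowercasing step finds nothing, which is the kind of mismatch Digy's PS warns about, while the stored value still shows the original capitalization.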