Hi,

This option is a safety measure for the case where you cannot trust your input data. Maybe you suddenly tokenize a binary file and it produces millions of random tokens; with the limit in place, only the first 10000 or so are actually indexed. If your input data is trusted and text-based (e.g. read from elements in XML files, databases, ...), then you don't need this filter.
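For example, a minimal sketch of that safety net (StandardAnalyzer and the 10000 limit are just placeholders for whatever you use; in 3.4 the wrapper lives in org.apache.lucene.analysis):

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LimitTokenCountAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class SafeIndexing {
  public static void main(String[] args) throws Exception {
    // the analyzer you would normally use for your content
    Analyzer myAnalyzer = new StandardAnalyzer(Version.LUCENE_34);
    // safety net: index at most 10000 tokens per field, even if a
    // document turns out to be binary garbage
    Analyzer safe = new LimitTokenCountAnalyzer(myAnalyzer, 10000);
    IndexWriter writer = new IndexWriter(new RAMDirectory(),
        new IndexWriterConfig(Version.LUCENE_34, safe));
    Document doc = new Document();
    String untrustedText = "imagine the contents of an accidentally indexed binary file here";
    doc.add(new Field("body", untrustedText, Field.Store.NO, Field.Index.ANALYZED));
    writer.addDocument(doc);
    writer.close();
  }
}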
> Maybe I am too far behind the times. I was updating some pretty old stuff.
> I think it was written originally with Lucene 1.4. I seem to recall that Lucene
> v1.x had analyzers where the default was "limited", because I learned pretty
> early that I had to set that option during indexing. Perhaps at some point the
> switch was made to default unlimited.

The limiting option was almost always on IndexWriter, but it defaulted to 10000 tokens from the beginning; the analyzers had nothing to do with this option. The recent change removed the token counting from IndexWriter (it only made the already complicated code more unreadable) and moved it to a simple TokenFilter, because it is much more reasonable to do this during analysis (a sketch of using that filter directly is at the end of this mail).

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: Joe MA [mailto:mrj...@comcast.net]
> Sent: Thursday, December 01, 2011 9:24 AM
> To: general@lucene.apache.org
> Subject: RE: MaxFieldLength in Lucene 3.4
>
> "of course all other analyzers are unlimited"
>
> Maybe I am too far behind the times. I was updating some pretty old stuff.
> I think it was written originally with Lucene 1.4. I seem to recall that Lucene
> v1.x had analyzers where the default was "limited", because I learned pretty
> early that I had to set that option during indexing. Perhaps at some point the
> switch was made to default unlimited. Thanks, your answer clears it up.
>
> One question - why even have this option now? Are things more efficient with a
> limited token field? If you know your data is 'bounded', should you always limit
> the token field to improve performance?
>
> Thanks!
>
> -----Original Message-----
> From: Uwe Schindler [mailto:u...@thetaphi.de]
> Sent: Monday, November 28, 2011 2:41 AM
> To: general@lucene.apache.org
> Subject: RE: MaxFieldLength in Lucene 3.4
>
> Hi,
>
> The move is simple - LimitTokenCountAnalyzer is just a wrapper around any
> other Analyzer, so I don't really understand your question - of course all other
> analyzers are unlimited. If you have myAnalyzer with myMaxFieldLengthValue
> used before, you can change your code as follows:
>
> Before:
> new IndexWriter(dir, new IndexWriterConfig(Version.LUCENE_34,
>     myAnalyzer).setFoo().setBar().setMaxFieldLength(myMaxFieldLengthValue));
>
> After:
> new IndexWriter(dir, new IndexWriterConfig(Version.LUCENE_34,
>     new LimitTokenCountAnalyzer(myAnalyzer, myMaxFieldLengthValue)).setFoo().setBar());
>
> You only have to do this on the indexing side; on the query side
> (QueryParser), just use myAnalyzer without wrapping. With the new code, the
> responsibility for cutting off the field after a specific number of tokens was
> moved out of the indexing code in Lucene. This is now just an analysis feature,
> not an indexing feature anymore.
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> > -----Original Message-----
> > From: Joe MA [mailto:mrj...@comcast.net]
> > Sent: Monday, November 28, 2011 8:09 AM
> > To: general@lucene.apache.org
> > Subject: MaxFieldLength in Lucene 3.4
> >
> > While upgrading to Lucene 3.4, I noticed the MaxFieldLength values on the
> > indexers are deprecated. There appears to be a LimitTokenCountAnalyzer
> > that limits the tokens - so does that mean the default for all other
> > analyzers is unlimited?
> >
> > Thanks in advance -
> > JM
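P.S.: Here is the sketch mentioned above - roughly what the wrapper does per field (simplified, not the actual Lucene source; StandardTokenizer is just an example tokenizer):

import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LimitTokenCountFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.util.Version;

public final class MyLimitingAnalyzer extends Analyzer {
  @Override
  public TokenStream tokenStream(String fieldName, Reader reader) {
    TokenStream stream = new StandardTokenizer(Version.LUCENE_34, reader);
    // stop after the first 10000 tokens - this is all the old
    // IndexWriter maxFieldLength setting did, now done during analysis
    return new LimitTokenCountFilter(stream, 10000);
  }
}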