On Fri, May 22, 2009 at 12:52 PM, Marvin Humphrey <mar...@rectangular.com> wrote:
>
>> when working on 3.1 if we make some great improvement, I'd like new users in
>> 3.1 to see the improvement by default.
>
> Sounds like an argument for more frequent major releases.

Yeah.  Or "rebranding" what we now call minor as major releases, by
changing our policy ;)  Or "rebranding" to Lucene 2009.

But: localized improvements (like the sizable performance gain from
turning off scoring when sorting by field) should not have to wait for
a major release to benefit new users.  I think they should be on by
default in the next release.

Will Lucy do scoring when sorting by field, by default?

>> On thinking about it more... automagically storing the "actsAsVersion"
>> in the index, and then having IndexWriter (for example) ask the
>> analyzer for a tokenStream matching that version, seems a little too
>> sneaky.
>
> Can you elaborate?
>
> In KinoSearch SVN trunk, satellite classes like QueryParser and Highlighter
> have to be passed a Schema, which contains all the Analyzers.  Analyzers
> aren't satellite classes under this model -- they are a fixed property of a
> FullTextType field spec.  Think of them as baked into an SQL field definition.
>
> You can create a Schema from scratch to pass to the QueryParser, but it's
> easier to just get it from the Searcher.  Translating to Java...
>
>   Searcher searcher = new Searcher("/path/to/index");
>   QueryParser qparser = new QueryParser(searcher.getSchema());
>
> I don't see how that's so different from getting an analyzer actsAsVersion
> number from the index.

I agree that in KS/Lucy it works well, because you must explicitly pass
the Schema to each of the satellite classes.

But in Lucene, if IndexWriter passed in the actsAsVersion it had loaded
from the index whenever it asked the analyzer for a tokenStream, that
would be sneaky.  I'd rather have it explicit (like KS/Lucy), so you'd
have to call IndexWriter.getActsAsVersion, then pass that into your
analyzer when you create it.  It's the automatic under-the-hood
passing that makes me nervous, and I think it would confuse users.
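
To make the explicit hand-off concrete, here is a minimal sketch in
Java.  Everything in it is hypothetical: SketchIndexWriter and
SketchAnalyzer are stand-ins rather than Lucene or KinoSearch classes,
getActsAsVersion is only the accessor proposed above, and the version
constants and the lowercasing change are invented for illustration.
The point is just that the caller reads the version and passes it to
the analyzer's constructor, so nothing is forwarded behind its back.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch only: none of these types exist in Lucene or KinoSearch.
class ActsAsVersionSketch {

    // Invented version constants, for illustration.
    static final int V_2_4 = 24;
    static final int V_2_9 = 29;

    /** Stand-in for a writer that only reports the version recorded in the index. */
    static class SketchIndexWriter {
        private final int actsAsVersion;

        SketchIndexWriter(int versionStoredInIndex) {
            this.actsAsVersion = versionStoredInIndex;
        }

        int getActsAsVersion() {
            return actsAsVersion;
        }
    }

    /** Stand-in for an analyzer whose behavior changed between releases. */
    static class SketchAnalyzer {
        private final int actsAsVersion;

        // The version is an explicit constructor argument, supplied by the caller.
        SketchAnalyzer(int actsAsVersion) {
            this.actsAsVersion = actsAsVersion;
        }

        List<String> tokenize(String text) {
            List<String> tokens = new ArrayList<>();
            for (String raw : text.trim().split("\\s+")) {
                if (raw.isEmpty()) {
                    continue; // blank input yields no tokens
                }
                // Invented behavior change: lowercase tokens from V_2_9 onward.
                tokens.add(actsAsVersion >= V_2_9 ? raw.toLowerCase(Locale.ROOT) : raw);
            }
            return tokens;
        }
    }

    public static void main(String[] args) {
        // Existing index written under the older behavior.
        SketchIndexWriter writer = new SketchIndexWriter(V_2_4);

        // Explicit hand-off: the user reads the version and passes it in.
        SketchAnalyzer analyzer = new SketchAnalyzer(writer.getActsAsVersion());
        System.out.println(analyzer.tokenize("Hello Lucene"));

        // An analyzer created for the newer version behaves differently.
        System.out.println(new SketchAnalyzer(V_2_9).tokenize("Hello Lucene"));
    }
}
```

In this sketch the first call keeps the original case and the second
lowercases, so the version in effect is always visible at the point
where the analyzer is constructed.
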
(That said, unrelated to this discussion, I would actually like to
record per-segment which version of Lucene wrote the segment; this
would be very helpful when debugging issues like LUCENE-1474 where I
need to know if the segments were written by 2.4.0 or 2.4.1).

> Now, where stuff might start to get complicated is PerFieldAnalyzerWrapper...
> is that where the sneakiness gets overwhelming?

Per-class actsAsVersion would work well here; PFAW would just forward
the required version when requesting the tokenStream.

>> I prefer the up-front "you specify actsAsVersion" when you
>> create the analyzer, only for analyzers that have changed across
>> releases.  So things like WhitespaceAnalyzer would likely never need
>> an actsAsVersion arg.
>
> Hmm, this is kind of hard.  I'd prefer that the argument remain optional, so
> that new users don't have to think about it.

I wouldn't mind optional, but only if it defaults to latest and
greatest.  The goal here is to have new users always see the best of
Lucene when they start out.

Mike