> maturity, and their back compat commitments. The demo and getting
> started guides could also be expanded to reference the contrib jars that
> contain code many people may want to reuse...
Here's an idea. Each contrib is really a project unto itself. And any
project, I suggest, ought to have its own demo program, together maybe
with a small write-up describing the idea behind the contrib and what the
demo does.

So to get the ball rolling, how about adopting some such documentation
policy for *future* contribs as a pseudo-requirement for making it into
the official release?

Cheers,
-Babak

PS: this is not a swipe at any upcoming contrib (TrieUtils: the
documentation there is really good :)

On Mon, Mar 30, 2009 at 5:31 PM, Chris Hostetter <hossman_luc...@fucit.org> wrote:
>
> After stirring things up, and then being off-list for ~10 days, I'm in an
> interesting position coming back to this thread and seeing the discussion
> *after* it essentially ended, with a lot of semi-consensus but no clear
> sense of hard and fast resolution or plan of action.
>
> FWIW, here are the notes i made based on reading the thread about the
> various sentiments i noticed expressed (whether i agree with them or
> not) in order to try and get a handle on what had been discussed.
> some of these were the opinion of a single person and i've paraphrased,
> others are my generalization of similar comments made by various
> people...
>
> - contrib has a bad rap
> - widely varying degrees of quality/stability in contrib code, hard to get
>   people to rely on the "good" ones because of the "less good" ones
> - many people want a good, out of the box, kitchen sink experience (ie:
>   one monolithic jar containing all the "essentials")
> - need easy discoverability of all things of a given type (ie: all
>   queries, all filters, all analyzers, etc...) .. ie: combined javadocs.
> - need easy installation of all things of a given type (ie: a jar
>   containing all types of queries, a jar containing all types of analyzers,
>   etc...)
> - still need to deal with contribs that have external dependencies
> - still need to deal with contribs that require future versions of
>   language (Java1.7 when core is still 1.5 compat)
> - users need better guidance about "why" something is a contrib
>   (additional functionality, alternate functionality, example of use, tool,
>   etc...)
> - while we should maintain/increase modularization, documentation should
>   make features of contribs more prominent without stressing the isolation
>   resulting from code modularization.
> - we should merge all contrib & core code into a unified src/ tree, and
>   make the packaging independent of the physical location in svn (ie: jars
>   based on java package, not directory)
>
> While I'm mostly in favor of all of these sentiments, and think it's
> really just a question of how to go about it, the last one is actually
> something i'm pretty strongly opposed to -- I think the best way forward
> is to have lots of small, well isolated source trees.
>
> code isolation (by directory hierarchy) is the best way i've seen to
> ensure modularization, and protect against inadvertent dependency
> bleeding. If we want to be able to produce small jars targeted at
> specific goals, and we want o.l.a.foo.FooClass to be in foo.jar and
> o.l.a.bar.BarClass to be in bar.jar then we shouldn't have
> src/java/o/l/a/foo/FooClass.java and src/java/o/l/a/bar/BarClass.java --
> doing so makes it way too easy for inadvertent dependencies to crop up
> that make FooClass depend on BarClass, and thus make it impossible to use
> foo.jar without also using bar.jar at runtime.
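
The foo/bar scenario above can be sketched as a small check. This is only an illustration under stated assumptions: the class names reuse the o.l.a.foo / o.l.a.bar shorthand from the paragraph, and the `moduleOf` and `violations` helpers, the reference table, and the allowed-dependency map are all hypothetical. None of it is part of Lucene or its ant build.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ModuleDependencyCheck {

    // Hypothetical mapping from a class name to the jar ("module") it ships in,
    // using the o.l.a.foo / o.l.a.bar shorthand from the message.
    static String moduleOf(String className) {
        if (className.startsWith("o.l.a.foo.")) return "foo";
        if (className.startsWith("o.l.a.bar.")) return "bar";
        return "unknown";
    }

    // Reports every class reference that crosses a module boundary
    // without being listed in the allowed-dependency map.
    static List<String> violations(Map<String, List<String>> references,
                                   Map<String, Set<String>> allowed) {
        List<String> found = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : references.entrySet()) {
            String fromModule = moduleOf(entry.getKey());
            Set<String> permittedTargets =
                allowed.getOrDefault(fromModule, Set.<String>of());
            for (String target : entry.getValue()) {
                String toModule = moduleOf(target);
                boolean sameModule = fromModule.equals(toModule);
                if (!sameModule && !permittedTargets.contains(toModule)) {
                    found.add(entry.getKey() + " -> " + target);
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Assumed for illustration: FooClass has picked up a reference to BarClass.
        Map<String, List<String>> references = Map.of(
            "o.l.a.foo.FooClass", List.<String>of("o.l.a.bar.BarClass"),
            "o.l.a.bar.BarClass", List.<String>of());
        // Assumed rule set: neither module may depend on the other.
        Map<String, Set<String>> allowed = Map.of(
            "foo", Set.<String>of(),
            "bar", Set.<String>of());

        for (String v : violations(references, allowed)) {
            System.out.println("unwanted dependency: " + v);
        }
    }
}
```

With the assumed inputs, the single FooClass to BarClass reference is reported, which is the case where foo.jar can no longer be used without bar.jar. Separate source trees make that reference fail at compile time, whereas a shared tree needs an explicit rule table like the `allowed` map here.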
>
> it's certainly possible to have "all" source code in a single directory
> hierarchy, and then rely on the build system to ensure you don't have
> unwarranted dependencies, but that requires you to express rules in the
> build system about what exactly the acceptable dependencies are, and it
> relies on everyone using the build system correctly (misguided users of
> hand-holding IDEs could get very frustrated when the patches they submit
> violate rules of an overly complicated set of ant build files)
>
> FWIW: having lots/more of very small, isolated, hierarchies also wouldn't
> hinder any attempts at having kitchen-sink or "essential" jars --
> combining the classes from lots of little isolated code trees is a lot
> easier than extracting a few classes from one big code tree.
>
> One underlying assumption that seems to have permeated the existing
> discussion (without ever being explicitly stated) is the idea that most
> of what currently lives in src/java is the "core" and would be a single
> "module" ... personally i'd like to challenge that assumption. I'd like
> to suggest that besides obvious things that could be refactored out into
> other "modules" (span queries, queryparser) there are lots of additional
> ways that src/java could be sliced...
>
> - interfaces and abstract classes and concrete classes for reading an
>   index in one index-api.jar (ie: Directory but no FSDirectory; IndexReader
>   but not MultiReader)
> - ditto for creating/updating an index in one index-update.jar (ie:
>   IndexWriter, TokenStream, Tokenizer, TokenFilter, Analyzer but
>   not any impls of the last 3)
> - ditto for searching in index-search.jar (ie: Searcher, Searchable,
>   HitCollector, Query ... but not any concrete subclasses)
> - simple-analysis.jar (SimpleAnalyzer, WhitespaceAnalyzer,
>   LetterTokenizer, LowercaseFilter, etc...)
> - english-analysis.jar (StandardAnalyzer, etc...)
> - primitive-queries.jar (TermQuery, BooleanQuery, MatchAllDocsQuery,
>   MultiTermQuery, etc...)
> - range-queries.jar (RangeQuery, RangeFilter, ConstantScoreRangeQuery)
>
> ...etc...
>
>
> The crux of my point being that what we think of today as the lucene
> "core" is actually kind of big and bloated, and already has *a* kitchen
> sink thrown in -- it's just not necessarily the kitchen sink many people
> want.
>
> a big percentage of our users may want highlighting by default, and may
> never care about function or span queries -- making it easier to get a
> monolithic jar of *everything* only addresses one of those three
> disconnects (easy access to the highlighting code) but splitting the
> current "core" up into lots of little pieces (aka: "modules") that have
> equal visibility to the existing contribs (now also "modules") would
> address all three disconnects: people wouldn't overlook modules they might
> want (like highlighting) because they are just as easy to find as the
> "core" and people wouldn't wind up with bloated jars containing a lot of
> code they don't need. (beating a dead horse for a moment: this wouldn't
> preclude us from offering a bloated jar containing everything under the
> sun)
>
> Even without making radical changes to the way our source code is
> organized, a lot of improvements could be made by having better
> documentation ... http://lucene.apache.org/java/2_4_1/ could certainly
> have more info about what is included in a release, what types of things
> can be found in a contrib, etc... Individual contrib README files should
> certainly get beefed up to describe their purpose, their level of
> maturity, and their back compat commitments. The demo and getting
> started guides could also be expanded to reference the contrib jars that
> contain code many people may want to reuse...
>
>
> ...and those are all small improvements that could be made without
> radically changing anything about our source organization or packaging.
> splitting the core up into smaller modules would only help the situation,
> moving more things into the core seems like it would just make the problem
> worse.
>
> : I agree, but at least we need some clear criteria so the future
> : decision process is more straightforward. Towards that... it seems
> : like there are good reasons why something should be put into contrib:
>
> I would argue that is approaching the problem from the wrong direction.
>
> assume for the moment that we define the list of lucene "modules" as:
>    ls -d contrib/* src/java src/gcj src/demo src/jsp
> ...but in the future we want to split up some of the bigger "modules" and
> move each module so they have equal visibility.
>
> i would suggest that the operating assumption be that any new code
> contribution that adds functionality (ie: not a bug fix, or an
> enhancement to an existing Impl) belongs in a new "module" unless:
>  1) compilation constraints require that it be put in an existing module
>     (ie: needs to introduce a bi-directional dependency with an existing
>     class which can't be refactored out into the new module)
>  2) it is a natural conceptual fit with *all* of the existing classes in
>     that module (ie: a new ThaiStemmerFilter could be added to an existing
>     thai-analysis module)
>
> (but equally important to the question of "when to add to an existing
> 'module' vs creating a new module?" should be the question of "when to
> split an existing module?" ... something we've never really talked about
> for core or contribs.)
>
> : But I don't think "it doesn't have to be in core" (the "software
> : modularity" goal) is the right reason to put something in contrib.
>
> Would it sound like a better reason if we stopped calling it "core"? ...
> i look at it from the point of view of: "Are classes A,B&C (which are
> tightly coupled) directly related to classes X,Y&Z (also tightly coupled)?"
> ... if the answer is "no" then A,B&C do not belong in the same module as
> X,Y&Z ...
> it doesn't matter which module we're talking about (src/java,
> contrib/highlighter etc...)
>
> i don't think it makes any sense for the TrieRangeQueries to be in the
> same "module" as IndexWriter, or IndexReader ... but i also don't think it
> makes sense for the trie to be in the same module as BoostingQuery or
> DuplicateFilter -- or for IndexWriter to be in the same module as the
> existing query parser (or for the existing query parser to be in the same
> module as the new one the IBM folks have been working on)
>
>
> we can have fine grained modularity w/o having second class citizens, and
> we can achieve it without needing to make radical changes -- but putting
> more stuff into "core" isn't going to help us get there.
>
> -Hoss
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-dev-h...@lucene.apache.org