After stirring things up, and then being off-list for ~10 days, I'm in an interesting position coming back to this thread and seeing the discussion *after* it essentially ended, with a lot of semi-consensus but no clear sense of a hard and fast resolution or plan of action.
FWIW, here are the notes I made based on reading the thread about the various sentiments I noticed expressed (whether I agree with them or not) in order to try and get a handle on what had been discussed. Some of these were the opinion of a single person and I've paraphrased, others are my generalization of similar comments made by various people...

 - contrib has a bad rap
 - widely varying degrees of quality/stability in contrib code, hard to get people to rely on the "good" ones because of the "less good" ones
 - many people want a good, out of the box, kitchen sink experience (ie: one monolithic jar containing all the "essentials")
 - need easy discoverability of all things of a given type (ie: all queries, all filters, all analyzers, etc...), ie: combined javadocs.
 - need easy installation of all things of a given type (ie: a jar containing all types of queries, a jar containing all types of analyzers, etc...)
 - still need to deal with contribs that have external dependencies
 - still need to deal with contribs that require future versions of the language (Java1.7 when core is still 1.5 compat)
 - users need better guidance about "why" something is a contrib (additional functionality, alternate functionality, example of use, tool, etc...)
 - while we should maintain/increase modularization, documentation should make features of contribs more prominent without stressing the isolation resulting from code modularization.
 - we should merge all contrib & core code into a unified src/ tree, and make the packaging independent of the physical location in svn (ie: jars based on java package, not directory)

While I'm mostly in favor of all of these sentiments, and think it's really just a question of how to go about it, the last one is actually something I'm pretty strongly opposed to -- I think the best way forward is to have lots of small, well isolated source trees.
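To make that last bullet concrete, here is a minimal sketch (plain Java, nothing taken from the actual Lucene build) of the bookkeeping that "jars based on java package, not directory" would imply: a table saying which package lands in which jar, plus an explicit list of which jar is allowed to depend on which. Everything in it is hypothetical: the class JarDependencyCheck and its methods, the placeholder packages o.l.a.foo and o.l.a.bar, and the rule that bar.jar may depend on foo.jar but not the reverse.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch only: not part of Lucene or its build files. */
class JarDependencyCheck {

    // Hypothetical mapping: package prefix -> jar that package is shipped in.
    private final Map<String, String> jarByPackage = new LinkedHashMap<>();
    // Hypothetical rule set: jar -> jars it is allowed to depend on.
    private final Map<String, Set<String>> allowed = new HashMap<>();

    void mapPackage(String pkg, String jar) {
        jarByPackage.put(pkg, jar);
    }

    void allow(String fromJar, String... toJars) {
        allowed.computeIfAbsent(fromJar, k -> new HashSet<>())
               .addAll(Arrays.asList(toJars));
    }

    /** Jar owning a fully qualified class name (longest matching package prefix), or null. */
    String jarOf(String className) {
        String best = null;
        int bestLen = -1;
        for (Map.Entry<String, String> e : jarByPackage.entrySet()) {
            String prefix = e.getKey() + ".";
            if (className.startsWith(prefix) && prefix.length() > bestLen) {
                best = e.getValue();
                bestLen = prefix.length();
            }
        }
        return best;
    }

    /**
     * Each edge is {fromClass, toClass}. Returns one message per edge that
     * crosses from one jar into another without a matching allow() rule.
     * Edges touching unmapped classes (e.g. JDK classes) are ignored.
     */
    List<String> violations(List<String[]> edges) {
        List<String> out = new ArrayList<>();
        for (String[] edge : edges) {
            String from = jarOf(edge[0]);
            String to = jarOf(edge[1]);
            if (from == null || to == null || from.equals(to)) {
                continue;
            }
            Set<String> ok = allowed.getOrDefault(from, Collections.emptySet());
            if (!ok.contains(to)) {
                out.add(edge[0] + " -> " + edge[1] + " (" + from + " needs " + to + ")");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        JarDependencyCheck check = new JarDependencyCheck();
        check.mapPackage("o.l.a.foo", "foo.jar");
        check.mapPackage("o.l.a.bar", "bar.jar");
        check.allow("bar.jar", "foo.jar"); // assumed rule: bar may use foo

        List<String[]> edges = new ArrayList<>();
        edges.add(new String[] {"o.l.a.bar.BarClass", "o.l.a.foo.FooClass"}); // permitted
        edges.add(new String[] {"o.l.a.foo.FooClass", "o.l.a.bar.BarClass"}); // bleeds the wrong way
        for (String v : check.violations(edges)) {
            System.out.println(v);
        }
    }
}
```

The point of the sketch is only that both tables have to be written down and kept current by somebody once the directory layout stops implying them; it does not try to discover the class-to-class edges, which a real check would need to read from the compiled classes.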
Code isolation (by directory hierarchy) is the best way I've seen to ensure modularization, and protect against inadvertent dependency bleeding. If we want to be able to produce small jars targeted at specific goals, and we want o.l.a.foo.FooClass to be in foo.jar and o.l.a.bar.BarClass to be in bar.jar then we shouldn't have src/java/o/l/a/foo/FooClass.java and src/java/o/l/a/bar/BarClass.java -- doing so makes it way too easy for inadvertent dependencies to crop up that make FooClass depend on BarClass, and thus make it impossible to use foo.jar without also using bar.jar at runtime.

It's certainly possible to have "all" source code in a single directory hierarchy, and then rely on the build system to ensure you don't introduce unwarranted dependencies, but that requires you to express rules in the build system about what exactly the acceptable dependencies are, and it relies on everyone using the build system correctly (misguided users of hand-holding IDEs could get very frustrated when the patches they submit violate rules of an overly complicated set of ant build files).

FWIW: having lots/more of very small, isolated hierarchies also wouldn't hinder any attempts at having kitchen-sink or "essential" jars -- combining the classes from lots of little isolated code trees is a lot easier than extracting a few classes from one big code tree.

One underlying assumption that seems to have permeated the existing discussion (without ever being explicitly stated) is the idea that most of what currently lives in src/java is the "core" and would be a single "module" ... personally I'd like to challenge that assumption. I'd like to suggest that besides obvious things that could be refactored out into other "modules" (span queries, queryparser) there are lots of additional ways that src/java could be sliced...
 - interfaces and abstract classes and concrete classes for reading an index in one index-api.jar (ie: Directory but no FSDirectory; IndexReader but not MultiReader)
 - ditto for creating/updating an index in one index-update.jar (ie: IndexWriter, TokenStream, Tokenizer, TokenFilter, Analyzer but not any impls of the last 3)
 - ditto for searching in index-search.jar (ie: Searcher, Searchable, HitCollector, Query ... but not any concrete subclasses)
 - simple-analysis.jar (SimpleAnalyzer, WhitespaceAnalyzer, LetterTokenizer, LowerCaseFilter, etc...)
 - english-analysis.jar (StandardAnalyzer, etc...)
 - primitive-queries.jar (TermQuery, BooleanQuery, MatchAllDocsQuery, MultiTermQuery, etc...)
 - range-queries.jar (RangeQuery, RangeFilter, ConstantScoreRangeQuery)
 ...etc...

The crux of my point being that what we think of today as the lucene "core" is actually kind of big and bloated, and already has *a* kitchen sink thrown in -- it's just not necessarily the kitchen sink many people want. A big percentage of our users may want highlighting by default, and may never care about function or span queries -- making it easier to get a monolithic jar of *everything* only addresses one of those three disconnects (easy access to the highlighting code), but splitting the current "core" up into lots of little pieces (aka: "modules") that have equal visibility to the existing contribs (now also "modules") would address all three disconnects: people wouldn't overlook modules they might want (like highlighting) because they are just as easy to find as the "core", and people wouldn't wind up with bloated jars containing a lot of code they don't need. (beating a dead horse for a moment: this wouldn't preclude us from offering a bloated jar containing everything under the sun)

Even without making radical changes to the way our source code is organized, a lot of improvements could be made by having better documentation ...
http://lucene.apache.org/java/2_4_1/ could certainly have more info about what is included in a release, what types of things can be found in a contrib, etc... Individual contrib README files should certainly get beefed up to describe their purpose, their level of maturity, and their back compat commitments. The demo and getting started guides could also be expanded to reference the contrib jars that contain code many people may want to reuse...

...and those are all small improvements that could be made without radically changing anything about our source organization or packaging. Splitting the core up into smaller modules would only help the situation, moving more things into the core seems like it would just make the problem worse.

: I agree, but at least we need some clear criteria so the future
: decision process is more straightforward. Towards that... it seems
: like there are good reasons why something should be put into contrib:

I would argue that is approaching the problem from the wrong direction. Assume for the moment that we define the list of lucene "modules" as:

   ls -d contrib/* src/java src/gcj src/demo src/jsp

...but in the future we want to split up some of the bigger "modules" and move each module so they have equal visibility. I would suggest that the operating assumption be that any new code contribution that adds functionality (ie: not a bug fix, or an enhancement to an existing Impl) belongs in a new "module" unless:

  1) compilation constraints require that it be put in an existing module (ie: needs to introduce a bi-directional dependency with an existing class which can't be refactored out into the new module)
  2) it is a natural conceptual fit with *all* of the existing classes in that module (ie: a new ThaiStemmerFilter could be added to an existing thai-analysis module)

(but equally important to the question of "when to add to an existing 'module' vs creating a new module?" should be the question of "when to split an existing module?" ...
something we've never really talked about for core or contribs.)

: But I don't think "it doesn't have to be in core" (the "software
: modularity" goal) is the right reason to put something in contrib.

Would it sound like a better reason if we stopped calling it "core"? ... I look at it from the point of view of: "Are classes A,B&C (which are tightly coupled) directly related to classes X,Y&Z (also tightly coupled)?" ... if the answer is "no" then A,B&C do not belong in the same module as X,Y&Z ... it doesn't matter which module we're talking about (src/java, contrib/highlighter etc...)

I don't think it makes any sense for the TrieRangeQueries to be in the same "module" as IndexWriter, or IndexReader ... but I also don't think it makes sense for the trie to be in the same module as BoostingQuery or DuplicateFilter -- or for IndexWriter to be in the same module as the existing query parser (or for the existing query parser to be in the same module as the new one the IBM folks have been working on)

We can have fine grained modularity w/o having second class citizens, and we can achieve it without needing to make radical changes -- but putting more stuff into "core" isn't going to help us get there.

-Hoss