On Mon, Mar 30, 2009 at 7:31 PM, Chris Hostetter <hossman_luc...@fucit.org> wrote:
> code isolation (by directory hierarchy) is the best way i've seen to
> ensure modularization, and protect against inadvertent dependency
> bleeding.

OK, I agree this (divorced top-level directories) is a great way to
enforce modularity and we should use it.

It seems the top-level directory structure could still have subdirs, eg:

  analyzers
    languages
      th
      es
      fr
      ...
    snowball?
    standard
    collation

and:

  search
    searcher
    queries
      span
      function

And those "leaf" subdirs above would then contain the package subdir
structure (src/{java,test}/org/apache/lucene/...).

Though "svn checkout", "svn update" and "svn diff" are going to take
quite a bit longer with this switch...

> One underlying assumption that seems to have permeated the existing
> discussion (without ever being explicitly stated) is the idea that
> most of what currently lives in src/java is the "core" and would be a
> single "module" ... personally i'd like to challenge that assumption.
> I'd like to suggest that besides obvious things that could be
> refactored out into other "modules" (span queries, queryparser) there
> are lots of additional ways that src/java could be sliced...

+1: I very much agree that what is now called "core" should be
refactored into a number of modules.

So the general new proposal here seems to be: let's break up src/java/*
into separate modules (each under its own top-level directory), just
like contrib/* is today, and move Lucene to an "a la carte" model for
what we now call core (what we now call contrib is already "a la carte"
today).

We would then do away with the top-level "core" vs "contrib"
distinction, and everything would simply be "modules", where each
module has metadata/javadocs stating:

  * JRE version required
  * What external dependencies (including dependencies on other Lucene
    modules) are needed
  * Some measure of "maturity"
  * Back-compat policy
  * CHANGES

Then during the build we can package up certain combinations.
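To make the per-module metadata idea concrete, each module's top-level
directory could carry a small descriptor file the build reads when
packaging combinations. This is purely a hypothetical sketch — the file
name, keys, and values below are invented for illustration, not an
existing Lucene convention:

```properties
# module.properties (hypothetical) for an imagined "analyzers-th" module
jre.version=1.4
# other Lucene modules this one depends on (hypothetical names)
dependencies=analyzers-common
# external (non-Lucene) jars, if any
external.dependencies=
maturity=stable
backcompat.policy=strict
# the module's CHANGES file would live alongside, eg CHANGES.txt
```

A build target could then aggregate modules into the "kitchen-sink" jar,
or into per-area jars, by walking these descriptors.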
I think there should be sub-kitchen-sink jars by area, eg a jar that
contains all analyzers/tokenstreams/filters, a jar with all
queries/filters, etc.

This also makes the future decision process far easier: rather than the
capricious and ill-defined "does it go into core vs contrib" question,
we now simply decide whether it goes into an existing module or makes a
new one.

> Even without making radical changes to the way our source code is
> organized, a lot of improvements could be made by having better
> documentation.

Agreed. I think this is actually somewhat orthogonal, though it should
follow more naturally once Lucene is simply a collection of modules. I
would think we present "all" and "per-module" sets of javadocs, plus
javadocs aggregated based on how the JARs aggregate? (Ie, I could
browse the "kitchen-sink" javadocs, the "all analyzers" javadocs, or
the "Thai analyzers only" javadocs.)

> (ie: a new ThaiStemmerFilter could be added to an existing
> thai-analysis module)

So, how would you refactor the various sources of
analyzers/tokenstreams/tokenfilters we have today
(src/java/org/apache/lucene/analysis/*, contrib/snowball/*,
contrib/collation/* and contrib/analyzers/*)?

(Even contrib/memory has a neat PatternAnalyzer, which operates on a
string using a regexp to get tokens out, that only now am I
discovering.)

We also need to think about how this impacts our back-compat policy,
eg when we are allowed to split modules into sub-modules, or merge
them.

Assuming there's general consensus on this "break core into modules"
approach, I think the next step is to take an inventory of all of
Lucene's classes, roughly divide them into proposed modules, and
iterate on that? Hoss, do you want to take a first stab at that?

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org