: Then during build we can package up certain combinations.  I think
: there should be sub-kitchen-sink jars by area, e.g. a jar that contains
: all analyzers/tokenstreams/filters, all queries/filters, etc.

Or just make it trivial to get all jars that fit a given profile w/o
actually merging those jars into an uber-jar ... does Maven's dependency
management have anything like "bundles" or "virtual packages", so we could
publish a "lucene-all-analyzers" POM that didn't have an actual
lucene-all-analyzers.jar but listed dependencies on all of the individual
jars?

(FYI: Perl's CPAN has the concept of a "Bundle" that's just an empty
distribution that depends on other distributions, so you have a single
reference point for installing them.)

: So, how would you refactor the various sources of
: analyzers/tokenstream/tokenfilters we have today
: (src/java/org/apache/lucene/analysis/*, contrib/snowball/*,
: contrib/collation/* and contrib/analyzers/*)?  (Even contrib/memory
: has a neat PatternAnalyzer, that operates on a string using a regexp
: to get tokens out, that only now am I just discovering).

I think ideally the existing contrib/analyzers would be broken up by
language -- even if that means only 2 or 3 classes per jar -- but I don't
deal with multilingual stuff much, so I don't have much of an opinion ...
perhaps the majority of our users that deal with non-English text tend to
deal with *lots* of languages, so having a single "multilingual-analysis"
module would be suitable.

: We also need to think about how this impacts our back-compat policy.
: EG when are we allowed to split up modules into sub-modules, or merge
: them.

Splitting a module should always be fair game as long as the new module(s)
maintain the same back-compat policy ... it's not a burden to ask people to
start using 2 jars instead of 1 jar (especially if we're already going to
have an easy way to bundle jars up into uber-jars).

In theory, merging modules should require that the new module adopt the
most restrictive back-compat policy of the previous modules.
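
On the "virtual package" question above: Maven does allow a project with
"pom" packaging, which publishes no jar and can simply list dependencies.
A minimal sketch of what such a descriptor might look like follows. The
artifact names other than the "lucene-all-analyzers" idea and the X.Y.Z
version are hypothetical placeholders, not existing Lucene artifacts.

```xml
<!-- Hypothetical aggregate POM: artifact names and versions are illustrative only. -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.apache.lucene</groupId>
  <artifactId>lucene-all-analyzers</artifactId>
  <version>X.Y.Z</version>
  <!-- "pom" packaging: no jar is built, only this descriptor is published -->
  <packaging>pom</packaging>

  <dependencies>
    <!-- One entry per individual analyzer jar (names assumed for the example) -->
    <dependency>
      <groupId>org.apache.lucene</groupId>
      <artifactId>lucene-analyzers-snowball</artifactId>
      <version>X.Y.Z</version>
    </dependency>
    <dependency>
      <groupId>org.apache.lucene</groupId>
      <artifactId>lucene-analyzers-collation</artifactId>
      <version>X.Y.Z</version>
    </dependency>
  </dependencies>
</project>
```

A consumer would then declare a single dependency on this artifact with
<type>pom</type> and pick up the individual jars transitively, much like a
CPAN Bundle.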

: Assuming there's general consensus on this "break core into modules"
: approach, I think the next step is to take an inventory of all of
: Lucene's classes and roughly divide them into proposed modules, and
: iterate on that?  Hoss do you want to take a first stab at that?

Heh.  I'm not sure I could even answer the "want" question in the
affirmative.

This is essentially a question of refactoring, and I think approaching it
incrementally would be the best strategy ... either by first finding some
low-hanging fruit in core that could be extracted into a contrib easily
(spans, query parser), or by restructuring the build system to put contribs
and the demo on equal footing with core as "modules", and then reassessing
as progress is made.

On a personal note: even if I wanted to lead this charge, I really can't
right now ... folks may have noticed my involvement with Lucene has been
markedly lower in the last few months. I expect it to get even lower over
the next 2 months before it will (hopefully) get higher.

-Hoss