I agree: refactoring is TONS of work. Even cases that seem cut and dried from a distance quickly prove to be hairy (just ask Robert about refactoring analyzers).
However, I think "unproven gain" is too strong. E.g., just a few days
ago we had a user thread asking how to use auto-suggest outside of
Solr. Once we commit the suggest module, this is easy/ier for that
user, and now we have one more user testing things, finding bugs,
maybe offering improvements, etc.

I think the gains of each refactoring are potentially large, but they
are not immediate -- they accrue over time. It's an investment.

Also: I'm in no way asking/expecting other devs to sign up to do
refactoring (your response seems to imply this). Nobody can do such a
thing. We all scratch our own itches and I'm not asking you to
scratch mine :)

What I am asking is that if someone wants to scratch this itch (factor
out XXX as a module), they are fully free to do so, as long as it
doesn't harm Solr's/Lucene's current functions, performance, etc. We
don't seem to have this freedom today, and this is, I think, the core
conflict.

Grant, if I'm reading your response right, you agree with that freedom
(others are free to refactor); you're just tempering in a good dose of
reality ("refactoring is hard"), which I agree with.

Mike

http://blog.mikemccandless.com

On Thu, May 5, 2011 at 10:25 AM, Grant Ingersoll <gsing...@apache.org> wrote:
>
> On May 5, 2011, at 4:15 AM, Simon Willnauer wrote:
>
>> Hey folks
>>
>> On Tue, May 3, 2011 at 6:49 PM, Michael McCandless
>> <luc...@mikemccandless.com> wrote:
>>> Isn't our end goal here a bunch of well factored search modules? Ie,
>>> fast forward a year or two and I think we should have modules like
>>> these:
>>
>> I think we have two camps here (10k feet view):
>>
>
> I'd say 3 camps:
>
>> 1. wants to move towards modularization might support all the modules
>> mike has listed below
>> 2. wants to stick with Solr's current architecture and remain
>> "monolithic" (not negative in this case) as much as possible
>
> 3. Those who think most should be modularized, but realize it's a ton of
> work for an unproven gain (although most admit it is a highly likely gain)
> and should be handled on a case-by-case basis as people do the work. I
> don't have anything against modularization, I just know, given my schedule, I
> won't be able to block off weeks of time to do it. I'm happy to review
> where/when I can.
>
>>
>> I think we can meet somewhere in between and agree on certain modules
>> that should be available to lucene users as well. The ones I have in
>> mind are
>> primary search features like:
>> - Faceting
>
> Yeah, for instance, Bobo seems to have some interesting faceting
> implementations that are ASL, perhaps we can combine into this new faceting
> module.
>
>> - Highlighting
>> - Suggest
>> - Function Query (consolidation is needed here!)
>> - Analyzer factories
>
> +1.
>
>>
>> things like distribution and replication should remain in solr IMO but
>> might be moved to a more extensible API so that people can add their
>> own implementation.
>
> And, of course, all the web tier stuff (response writers, inputs, etc.)
>
>> I am thinking about things like the ZooKeeper
>> support that might not be a good solution for everybody where folks
>> have already JGroups infrastructure.
>
> Or other similar solutions. I wonder about using a ZeroConf implementation
> that can do self-discovery.
>
>> So I think we can work towards 2
>> distinct goals.
>> 1. extract common search features into modules
>> 2. refactor solr to be more "elastic" / "distributed" and extensible
>> with respect to those goals.
>
> 3. Make it easier for Solr to be programmatically configured by decoupling
> the reading of schema.xml and solrconfig.xml from the code that actually
> contains the structures for the properties (IndexSchema and SolrConfig)
>
>>
>> maybe we can get agreement on such a basis though.
>>
>> let me know what you think
>
> I think it's reasonable. At the end of the day, it broadens the appeal of
> both Lucene and Solr. Solr still exists and is not just a "shell" and at the
> end of the day, remains the primary choice for people who don't want to
> stitch everything together themselves. All of it is easier to contribute to
> b/c people can focus in on the core area they know w/o having to know
> everything else per se. Stuff should be better tested b/c of it as well
> since it will receive broader use.
>
> That being said, and not to be discouraging, but I see it as a ton of work.
>
>>
>> simon
>>>
>>> * Faceting
>>> * Highlighting
>>> * Suggest (good patch is on LUCENE-2995)
>>> * Schema
>>> * Query impls
>>> * Query parsers
>>> * Analyzers (good progress here already, thanks Robert!),
>>>   incl. factories/XML configuration (still need this)
>>> * Database import (DIH)
>>> * Web app
>>> * Distribution/replication
>>> * Doc set representations
>>> * Collapse/grouping
>>> * Caches
>>> * Similarity/scoring impls (BM25, etc.)
>>> * Codecs
>>> * Joins
>>> * Lucene core
>>>
>>> In this future, much of this code came from what is now Solr and
>>> Lucene, but we should freely and aggressively poach from other
>>> projects when appropriate (and license/provenance is OK).
>>>
>>> I keep seeing all these cool "compressed int set" projects popping
>>> up... surely these are useful for us. Solr poached a doc set impl
>>> from Nutch; probably there's other stuff to poach from Nutch, Mahout,
>>> etc.
>>>
>>> Katta's doing something sweet with distribution/replication; let's
>>> poach & merge w/ Solr's approach. There are various facet impls out
>>> there (Bobo browse/Zoie; Toke's; Elastic Search); let's poach & merge
>>> with Solr's.
>>>
>>> Elastic Search has lots of cool stuff, too, under ASL2.
>>>
>>> All these external open-source projects are fair game for poaching and
>>> refactoring into shared modules, along with what is now Solr and
>>> Lucene sources.
>>>
>>> In this ideal future, Solr becomes the bundling and default/example
>>> configuration of the Web App and other modules, much like how the
>>> various Linux distros bundle different stuff together around the Linux
>>> kernel. And if you are an advanced app and don't need the webapp
>>> part, you can cherry pick the super duper modules you do need and
>>> embed them directly into your app.
>>>
>>> Isn't this the future we are working towards?
>>>
>>> Mike
>>>
>>> http://blog.mikemccandless.com
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: dev-h...@lucene.apache.org
>
> --------------------------
> Grant Ingersoll
> Lucene Revolution -- Lucene and Solr User Conference
> May 25-26 in San Francisco
> www.lucenerevolution.org