Re: modularization discussion

Michael McCandless Sat, 07 May 2011 03:30:55 -0700

I agree: refactoring is TONS of work.  Even cases that seem cut and
dried, from a distance, quickly prove to be hairy (just ask Robert about
refactoring analyzers).


However, I think "unproven gain" is too strong.  E.g., just a few days
ago we had a user thread asking how to use auto-suggest outside of
Solr.  Once we commit the suggest module, this is easy/ier for that
user, and now we have one more user testing things, finding bugs,
maybe offering improvements, etc.  I think the gains of each
refactoring are potentially large, but they are not immediate -- they
accrue over time.  It's an investment.

Also: I'm in no way asking/expecting other devs to sign up to do
refactoring (your response seems to imply this).  Nobody can demand
such a thing.  We all scratch our own itches and I'm not asking you to
scratch mine :)

What I am asking is that if someone wants to scratch this itch (factor
out XXX as a module), they are fully free to do so, as long as it
doesn't harm Solr's/Lucene's current functions, performance, etc.  We
don't seem to have this freedom today, and this is, I think, the core
conflict.

Grant, if I'm reading your response right, you agree with that freedom
(others are free to refactor); you're just tempering in a good dose of
reality ("refactoring is hard"), which I agree with.

Mike

http://blog.mikemccandless.com

On Thu, May 5, 2011 at 10:25 AM, Grant Ingersoll <gsing...@apache.org> wrote:
>
> On May 5, 2011, at 4:15 AM, Simon Willnauer wrote:
>
>> Hey folks
>>
>> On Tue, May 3, 2011 at 6:49 PM, Michael McCandless
>> <luc...@mikemccandless.com> wrote:
>>> Isn't our end goal here a bunch of well-factored search modules?  I.e.,
>>> fast forward a year or two and I think we should have modules like
>>> these:
>>
>> I think we have two camps here (10k feet view):
>>
>
> I'd say 3 camps:
>
>> 1. wants to move towards modularization and might support all the
>> modules Mike has listed below
>> 2. wants to stick with Solr's current architecture and remain
>> "monolithic" (not negative in this case) as much as possible
>
> 3.  Those who think most should be modularized, but realize it's a ton of 
> work for an unproven gain (although most admit it is a highly likely gain) 
> and should be handled on a case-by-case basis as people do the work.   I 
> don't have anything against modularization, I just know, given my schedule, I 
> won't be able to block off weeks of time to do it.  I'm happy to review 
> where/when I can.
>
>
>>
>> I think we can meet somewhere in between and agree on certain modules
>> that should be available to Lucene users as well. The ones I have in
>> mind are primary search features like:
>> - Faceting
>
> Yeah, for instance, Bobo seems to have some interesting faceting 
> implementations that are ASL, perhaps we can combine into this new faceting 
> module.
>
>> - Highlighting
>> - Suggest
>> - Function Query (consolidation is needed here!)
>> - Analyzer factories
>
> +1.
>
>>
>> Things like distribution and replication should remain in Solr IMO but
>> might be moved to a more extensible API so that people can add their
>> own implementation.
>
> And, of course, all the web tier stuff (response writers, inputs, etc.)
>
>> I am thinking about things like the ZooKeeper
>> support, which might not be a good solution for everybody, e.g. where
>> folks already have JGroups infrastructure.
>
> Or other similar solutions.  I wonder about using a ZeroConf implementation 
> that can do self-discovery.
>
>> So I think we can work towards 2
>> distinct goals.
>> 1. extract common search features into modules
>> 2. refactor Solr to be more "elastic" / "distributed" and extensible
>> with respect to those goals.
>
> 3. Make it easier for Solr to be programmatically configured by decoupling 
> the reading of schema.xml and solrconfig.xml from the code that actually 
> contains the structures for the properties (IndexSchema and SolrConfig)
>
>>
>> Maybe we can reach agreement on such a basis, though.
>>
>> let me know what you think
>
> I think it's reasonable.  At the end of the day, it broadens the appeal of
> both Lucene and Solr.  Solr still exists, is not just a "shell", and
> remains the primary choice for people who don't want to stitch everything
> together themselves.  All of it is easier to contribute to
> b/c people can focus in on the core area they know w/o having to know 
> everything else per se.  Stuff should be better tested b/c of it as well 
> since it will receive broader use.
>
> That being said, and not to be discouraging, but I see it as a ton of work.
>
>>
>> simon
>>>
>>>  * Faceting
>>>
>>>  * Highlighting
>>>
>>>  * Suggest (good patch is on LUCENE-2995)
>>>
>>>  * Schema
>>>
>>>  * Query impls
>>>
>>>  * Query parsers
>>>
>>>  * Analyzers (good progress here already, thanks Robert!),
>>>    incl. factories/XML configuration (still need this)
>>>
>>>  * Database import (DIH)
>>>
>>>  * Web app
>>>
>>>  * Distribution/replication
>>>
>>>  * Doc set representations
>>>
>>>  * Collapse/grouping
>>>
>>>  * Caches
>>>
>>>  * Similarity/scoring impls (BM25, etc.)
>>>
>>>  * Codecs
>>>
>>>  * Joins
>>>
>>>  * Lucene core
>>>
>>> In this future, much of this code came from what is now Solr and
>>> Lucene, but we should freely and aggressively poach from other
>>> projects when appropriate (and license/provenance is OK).
>>>
>>> I keep seeing all these cool "compressed int set" projects popping
>>> up... surely these are useful for us.  Solr poached a doc set impl
>>> from Nutch; probably there's other stuff to poach from Nutch, Mahout,
>>> etc.
>>>
>>> Katta's doing something sweet with distribution/replication; let's
>>> poach & merge w/ Solr's approach.  There are various facet impls out
>>> there (Bobo browse/Zoie; Toke's; Elastic Search); let's poach & merge
>>> with Solr's.
>>>
>>> Elastic Search has lots of cool stuff, too, under ASL2.
>>>
>>> All these external open-source projects are fair game for poaching and
>>> refactoring into shared modules, along with what is now Solr and
>>> Lucene sources.
>>>
>>> In this ideal future, Solr becomes the bundling and default/example
>>> configuration of the Web App and other modules, much like how the
>>> various Linux distros bundle different stuff together around the Linux
>>> kernel.  And if you are an advanced app and don't need the webapp
>>> part, you can cherry-pick the huper duper modules you do need and
>>> embed them directly into your app.
>>>
>>> Isn't this the future we are working towards?
>>>
>>> Mike
>>>
>>> http://blog.mikemccandless.com
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>>
>>>
>>
>
> --------------------------
> Grant Ingersoll
> Lucene Revolution -- Lucene and Solr User Conference
> May 25-26 in San Francisco
> www.lucenerevolution.org
>
