On Fri, 2009-03-20 at 11:55 +0200, Doğacan Güney wrote:
> Hi,
> 
> On Sat, Mar 14, 2009 at 02:19, Dennis Kubes <ku...@apache.org> wrote:
...
> > Since there are different purposes for different users, would it be good to
> > consider moving Nutch to a top level apache project out from under the
> > Lucene umbrella?  This would then allow the creation of nutch sub-projects,
> > such as nutch-solr, nutch-hbase.  Thoughts?
> >
> > Many parts of Nutch have also been implemented in other projects.  For
> > example, Tika for the parsers, Droids for the Crawler.  It begs the question
> > what are Nutch's core features going forward.  When I think about search
> > (again my perspective is large scale), I think crawling or acquisition of
> > data, parsing, analysis, indexing, deployment, and searching.  I personally
> > think that there is much room for improvement in crawling and especially
> > analysis.  Nutch shouldn't just be about the shell but also the brains.
> >
> 
...
> So I think delegating nutch functionality to other projects
> (tika/droids/solr/etc)
> is a great idea (so nutch can focus on "the brains" as Dennis said), but
> I don't like the idea of separating nutch into pieces.

I had hoped to meet some Nutch people at ApacheCon to talk about this mail.

Droids is currently incubating with two sponsor projects, HC and Lucene.
If Nutch became a TLP, Droids would be much more a Nutch subproject than
a subproject of either of those.

I see the essence of this thread and the current reality of moving
functionality away from Nutch. Tika is the attempt to use the parser
functionality outside of Nutch. Droids uses Tika for parsing, and even
so I would welcome Tika splitting into separate parser parts to reduce
dependencies in Droids.

Hearing that Droids could become Nutch's standard fetcher/crawler is
really exciting. I invite everyone to join the Droids mailing list to
make this happen. Droids is, like Tika, an attempt to use the crawling
facility outside of Nutch.

Nutch's core competence is:
- indexing
- searching

where the focus is on:
- making it happen in the cloud
- on a BIG scale
- with millions of slaves

salu2
-- 
Thorsten Scherler <thorsten.at.apache.org>
Open Source <consulting, training and solutions>
