Re: The Future of Nutch, reactivated

2009-05-23 Thread Julien Nioche
Hi, I am joining the conversation a bit late, but never mind... In my view the main target should be (2). As you pointed out, SOLR covers (3) and (4) quite well (or will progressively do so). As for (1), there is definitely an audience even if it is small but would certainly benefit from the work

Re: The Future of Nutch, reactivated

2009-05-15 Thread Raymond Balmès
I'm still a new user, so although I found it rather easy to get going and build my own plugins, I have some suggestions. One thing that I'd like to see is a way to estimate how long a certain step (fetch, ...) will take... something like a progress bar. Because you launch a step

Re: The Future of Nutch, reactivated

2009-05-15 Thread consultas
Keep it simple. Many people, it seems to me, use Nutch to exercise, in some way, their programming expertise and talents. I am just a user, and I think that users just want something that can index the web and find results when they search. I don't want to deal with complicated application

Re: The Future of Nutch, reactivated

2009-05-14 Thread AJ Chen
Andrzej, great summary. I played with Nutch before for a web search engine, but have not used it for a while because it has become too complicated. Based on my experience in building a semantic search engine for the healthcare vertical, I think it would be beneficial to separate crawling from search

Re: The Future of Nutch, reactivated

2009-05-14 Thread Mattmann, Chris A
Hi Andrzej, Great summary. My general feeling on this is similar to my prior comments on similar threads from Otis and from Dennis. My personal pet projects for Nutch2: * refactored Nutch core data structures, modeled as POJOs * refactored Nutch architecture where

Re: The Future of Nutch

2009-04-02 Thread Thorsten Scherler
On Wed, 2009-04-01 at 07:42 -0700, Ken Krugler wrote: ... I would suggest looking at Katta (http://katta.sourceforge.net/). It's one of several projects where the goal is to support very large Lucene indexes via distributed shards. Solr has also added federated search support. Interesting.

Re: The Future of Nutch

2009-04-02 Thread Doğacan Güney
On Wed, Apr 1, 2009 at 17:42, Ken Krugler kkrugler_li...@transpac.com wrote: On Fri, 2009-03-13 at 19:42 -0700, buddha1021 wrote: hi dennis: ... I am confident that hadoop can process the large data of the www search engine! But lucene? I am afraid of the limited size of lucene's

Re: The Future of Nutch

2009-03-31 Thread Thorsten Scherler
On Fri, 2009-03-13 at 19:42 -0700, buddha1021 wrote: hi dennis: ... I am confident that hadoop can process the large data of the www search engine! But lucene? I am afraid of the limited size of lucene's index per server is very little, 10G? or 30G? This is not enough for the www search

Re: The Future of Nutch

2009-03-31 Thread Thorsten Scherler
On Fri, 2009-03-20 at 11:55 +0200, Doğacan Güney wrote: Hi, On Sat, Mar 14, 2009 at 02:19, Dennis Kubes ku...@apache.org wrote: ... Since there are different purposes for different users, would it be good to consider moving Nutch to a top level apache project out from under the Lucene

Re: The Future of Nutch

2009-03-27 Thread Bradford Stephens
Hey there, Just chiming in that we use the complete Nutch + Hadoop + Lucene stack -- we download pages, index them for keywords, and then do heavy Semantic Parsing on it to produce BI data. We also use a lot of plug-ins for parsing and ranking information. What we don't use is the 'built-in GUI

Re: The Future of Nutch

2009-03-20 Thread Mattmann, Chris A
Guys, I thought I'd chime in here. I don't have a lot of time tonight (long day out here in California), but perhaps I can add more thoughts tomorrow. My +1 for moving Nutch into a TLP. With a 1.0 release, and several prior releases (~10), I think that the discussion is reasonable. I also tend

Re: The Future of Nutch

2009-03-20 Thread Doğacan Güney
Hi, On Sat, Mar 14, 2009 at 02:19, Dennis Kubes ku...@apache.org wrote: With the release of Nutch 1.0 I think it is a good time to begin a discussion about the future of Nutch. Here are some things to consider, and I would love to hear everyone's views on this. Nutch's original intention was as

Re: The Future of Nutch

2009-03-18 Thread Alex Basa
I actually use Nutch as a large scale search engine on two products. I think a few things that would be nice to have are built-in options to produce an incremental index and maybe a Quartz scheduler to automate it completely. One thing that would be nice is when one of us figures something

Re: The Future of Nutch

2009-03-17 Thread Marc Boucher
Dennis, Otis et al, My very small team has kept silent for a long time. We've been playing with Nutch, Hadoop and to a lesser extent Solr for about 2 years now. Before I get into my thoughts on what direction things should take I would like to offer a thought on why Nutch is not as active as

Re: The Future of Nutch

2009-03-17 Thread Dennis Kubes
Marc, Glad you responded. Always good to hear people's thoughts. Marc Boucher wrote: Dennis, Otis et al, My very small team has kept silent for a long time. We've been playing with Nutch, Hadoop and to a lesser extent Solr for about 2 years now. Before I get into my thoughts on what direction

Re: The Future of Nutch

2009-03-17 Thread Marc Boucher
Dennis, That adds another dimension to the issue which I had not considered. One avenue, as you suggest, would be to add another committer to the Lucene PMC. If that does not work then maybe going the route of TLP is the best option. Marc Part of this is about releases. Currently releases are

Re: The Future of Nutch

2009-03-16 Thread Otis Gospodnetic
Hello, Comments inlined. - Original Message From: Dennis Kubes ku...@apache.org To: nutch-user@lucene.apache.org Sent: Friday, March 13, 2009 8:19:37 PM With the release of Nutch 1.0 I think it is a good time to begin a discussion about the future of Nutch. Here are some

Re: The Future of Nutch

2009-03-16 Thread Tony Wang
I just wish there could be some clear documentation for Nutch/Solr integration publicly available. Or are some developers already working on this? - Tony On Mon, Mar 16, 2009 at 6:50 PM, Otis Gospodnetic ogjunk-nu...@yahoo.com wrote: Hello, Comments inlined. - Original Message

Re: The Future of Nutch

2009-03-14 Thread yanky young
Hi: I also agree that most usage scenarios of Nutch are in the vertical search area. In some unusual cases users may not even use Nutch indexing at all; they just crawl some pages for mirroring purposes. And in some cases of vertical search, users only need a fraction of pages, e.g. house rent

Re: The Future of Nutch

2009-03-14 Thread consultas
I have been using Nutch for more than four years now, as a vertical search engine, having indexed, at times, over one million pages. On the other hand, I don't know anything about programming and some specialized applications. Words like Solr and others are like aliens to me. I am just interested

Re: The Future of Nutch

2009-03-14 Thread John Martyniak
I think that this would be the case for making Nutch a top level Apache Project. So that you can publish the framework and a complete app but still tie it all together. Because personally I think that is the strength of Nutch, that you can use it right out of the box, without

Re: The Future of Nutch

2009-03-13 Thread John Martyniak
Dennis, I am with you, I am building a large scale www search engine. But I might also build a vertical search as well. Aren't the requirements for building a large scale www search the same as for building a vertical www search? The only thing that seems to change is the scope. I like