Hi,
I'm joining the conversation a bit late, but never mind...
In my view, the main target should be (2). As you pointed out, SOLR covers
(3) and (4) quite well (or will progressively do so). As for (1), there is
definitely an audience, even if it is small, that would certainly benefit from
the work
I'm still a new user, so although I found it rather easy to get going and
build my own plugins, I have some suggestions.
Yes, one thing I'd like to see is some way to estimate how long a certain
step (fetch, ...) will take... something like a progress bar. Because you
launch a step
Keep it simple.
Many people, it seems to me, use Nutch to exercise, in some way, their
programming expertise and talents.
I am just a user, and I think that users just want something that can index
the web and find results when they search. I don't want to deal with
complicated application
Andrzej, great summary. I played with Nutch before for a web search engine,
but have not used it for a while because it has become too complicated. Based
on my experience building a semantic search engine for the healthcare
vertical, I think it would be beneficial to separate crawling from search
Hi Andrzej,
Great summary. My general feeling on this is similar to my prior comments on
similar threads from Otis and from Dennis. My personal pet projects for
Nutch2:
* refactored Nutch core data structures, modeled as POJOs
* refactored Nutch architecture where
On Wed, 2009-04-01 at 07:42 -0700, Ken Krugler wrote:
...
I would suggest looking at Katta (http://katta.sourceforge.net/).
It's one of several projects where the goal is to support very large
Lucene indexes via distributed shards. Solr has also added federated
search support.
Interesting.
On Wed, Apr 1, 2009 at 17:42, Ken Krugler kkrugler_li...@transpac.com wrote:
On Fri, 2009-03-13 at 19:42 -0700, buddha1021 wrote:
Hi Dennis:
...
I am confident that Hadoop can process the large data of the WWW search
engine! But Lucene? I am afraid of the limited size of Lucene's
On Fri, 2009-03-13 at 19:42 -0700, buddha1021 wrote:
Hi Dennis:
...
I am confident that Hadoop can process the large data of the WWW search
engine! But Lucene? I am afraid that the size of Lucene's index per server
is very limited: 10G? Or 30G? This is not enough for the WWW search
On Fri, 2009-03-20 at 11:55 +0200, Doğacan Güney wrote:
Hi,
On Sat, Mar 14, 2009 at 02:19, Dennis Kubes ku...@apache.org wrote:
...
Since there are different purposes for different users, would it be good to
consider moving Nutch to a top-level Apache project, out from under the
Lucene
Hey there,
Just chiming in that we use the complete Nutch + Hadoop + Lucene stack
-- we download pages, index them for keywords, and then do heavy
semantic parsing on it to produce BI data. We also use a lot of
plug-ins for parsing and ranking information.
What we don't use is the 'built-in GUI
Guys,
I thought I'd chime in here. I don't have a lot of time tonight (long day
out here in California), but perhaps I can add more thoughts tomorrow.
My +1 for moving Nutch into a TLP. With a 1.0 release, and several prior
releases (~10), I think that the discussion is reasonable. I also tend
Hi,
On Sat, Mar 14, 2009 at 02:19, Dennis Kubes ku...@apache.org wrote:
With the release of Nutch 1.0 I think it is a good time to begin a
discussion about the future of Nutch. Here are some things to consider and
would love to hear everyone's views on this
Nutch's original intention was as
I actually use Nutch as a large-scale search engine on two products. I think a
few things that would be nice to have are built-in options to produce an
incremental index and maybe a Quartz scheduler to automate it completely.
One thing that would be nice is when one of us figures something
Dennis, Otis et al,
My very small team has kept silent for a long time. We've been playing
with Nutch, Hadoop and to a lesser extent Solr for about 2 years now.
Before I get into my thoughts on what direction things should take I
would like to offer a thought on why Nutch is not as active as
Marc,
Glad you responded. Always good to hear people's thoughts.
Marc Boucher wrote:
Dennis, Otis et al,
My very small team has kept silent for a long time. We've been playing
with Nutch, Hadoop and to a lesser extent Solr for about 2 years now.
Before I get into my thoughts on what direction
Dennis,
That adds another dimension to the issue which I had not considered.
One avenue as you suggest would be to add another committer to the
Lucene PMC. If that does not work, then maybe going the route of TLP is
the best option.
Marc
Part of this is about releases. Currently releases are
Hello,
Comments inlined.
- Original Message
From: Dennis Kubes ku...@apache.org
To: nutch-user@lucene.apache.org
Sent: Friday, March 13, 2009 8:19:37 PM
With the release of Nutch 1.0 I think it is a good time to begin a discussion
about the future of Nutch. Here are some
I just wish there could be some clear documentation for Nutch/Solr
integration publicly available. Or are some developers already working on
this?
- Tony
On Mon, Mar 16, 2009 at 6:50 PM, Otis Gospodnetic ogjunk-nu...@yahoo.com wrote:
Hello,
Comments inlined.
- Original Message
Hi:
I also agree that most usage scenarios of Nutch are in the vertical search
area. In some unusual cases, users may not even use Nutch indexing at all;
they just crawl some pages for mirroring purposes. And in some cases of
vertical search, users only need a fraction of pages, e.g. house rent
I have been using Nutch for more than four years now as a vertical search
engine, having indexed, at times, over one million pages. On the other hand, I
don't know anything about programming or specialized applications. Words
like Solr and others are alien to me. I am just interested
I think that this would be the case for making Nutch a top-level
Apache Project. So that you can publish the framework and a complete
app but still tie it all together. Because personally I think that is
the strength of Nutch, that you can use it right out of the box,
without
Dennis,
I am with you; I am building a large-scale WWW search engine, but I
might also build a vertical search engine as well. Aren't the requirements
the same for building a large-scale WWW search engine as for building a
vertical one? The only thing that seems to change is the scope.
I like