On Feb 2, 2011, at 5:18 AM, David Saile wrote:
Hi all,
I have a question concerning updating a site's score in Nutch 1.2.
In org.apache.nutch.crawl.CrawlDbReducer's reduce method I found a call to
scfilters.updateDbScore((Text)key, oldSet ? old : null, result,
linkList);
On May 5, 2011, at 11:50 AM, Julien Nioche wrote:
Tim,
We're about to commit the upgrade of SOLR in the trunk and this should be
released as 1.3 shortly. See https://issues.apache.org/jira/browse/NUTCH-983
Thanks for the update. I'll wait for the next release then and hold off on the
Seems like I just finished upgrading to Solr 3.2 and a new version is released!
Anyway, is the Solrj client shipping with Nutch 1.3 compatible with the new
Solr 3.3 release? Is there any reason from the Nutch end to hold off on
upgrading Solr?
Apologies for the fairly simple question, but if
On Jul 6, 2011, at 10:59 AM, Cam Bazz wrote:
Hello,
I am crawling multiple sites, in the range of hundreds, with 256
concurrent threads, and 4 connections per site at a time.
It seems that if a site is having a bad day, all the threads slow
down, and this site basically clogs all the threads.
On Jul 6, 2011, at 11:09 AM, Markus Jelsma wrote:
Javabin version hasn't changed. You can use it.
Thanks for the quick answer. Solr 3.3 is working flawlessly with our Nutch 1.3
install.
Blessings,
TwP
On Wednesday 06 July 2011 18:59:55 Tim Pease wrote:
Seems like I just finished
Currently Nutch supports the meta name=robots content=noindex directive
in the head of individual pages. I would like to extend this feature to allow
the http.agent.name as a valid name in addition to the robots directive.
For example, in your nutch-site.xml file if you have the property
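The proposal in this message (honouring a meta tag named after the crawler's agent name in addition to `robots`) might be sketched roughly as follows. This is only an illustration in Python, not Nutch's actual Java code; the names `MetaDirectiveParser` and `is_noindex` and the agent name `mybot` are hypothetical and were chosen for the example.

```python
from html.parser import HTMLParser


class MetaDirectiveParser(HTMLParser):
    """Collect directives from <meta> tags whose name is in accepted_names."""

    def __init__(self, accepted_names):
        super().__init__()
        # Compare names case-insensitively and ignore empty names.
        self.accepted_names = {
            n.strip().lower() for n in accepted_names if n and n.strip()
        }
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("name", "").strip().lower() not in self.accepted_names:
            return
        # The content attribute is a comma-separated list of directives.
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def is_noindex(html, agent_name):
    """Return True if a 'robots' or agent-specific meta tag says noindex.

    agent_name stands in for the value of http.agent.name (assumption).
    """
    parser = MetaDirectiveParser(["robots", agent_name])
    parser.feed(html)
    return "noindex" in parser.directives
```

With this sketch, a page containing `<meta name="mybot" content="noindex">` would be skipped by a crawler whose agent name is `mybot`, while a tag addressed to a different agent would be ignored.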
At the root of the Nutch 1.3 project, what is the magic ant incantation to run
only the tests for the plugin I'm currently hacking away on? I'm looking for
the command line syntax.
Blessings,
TwP
On Oct 4, 2011, at 4:03 AM, Danicela nutch wrote:
Hi,
I want to make a ScoringFilter plugin which will give priority to the seeds file.
I mean, I have a crawldb and a seeds file with links, I set topN=5 to test,
and I want my seed links to be fetched first, before what I have in the
I've made some modifications to Nutch to suit some requirements at work.
However, my changes have caused one of the JUnit tests to fail. The output from
running `ant test` is none too helpful. All it tells me is BUILD FAILED - good
luck scrolling through a thousand lines of output to find that
On Nov 28, 2011, at 10:38 PM, Tim Pease wrote:
I've made some modifications to Nutch to suit some requirements at work.
However, my changes have caused one of the JUnit tests to fail. The output
from running `ant test` is none too helpful. All it tells me is BUILD FAILED
- good luck
I've noticed that the mirrors only contain downloadable assets for Nutch 1.4.
Is there a location where older versions of Nutch can be downloaded?
Blessings,
TwP
I am in the process of writing a new Nutch tool that will index documents into
the ElasticSearch [http://www.elasticsearch.org/] search engine. Can and should
this tool be created as a plugin? Are there any examples of tools being created
as plugins?
More generally, how should a new tool be
On Dec 7, 2011, at 3:17 PM, Chip Calhoun wrote:
This is probably just down to my not waiting for a 1.4 tutorial, but here
goes. I've always used the following two commands to run my crawl and then
index to Solr:
# bin/nutch crawl urls -dir crawl -depth 1 -topN 50
# bin/nutch solrindex