Hiring a Nutch Developer

2005-11-04 Thread Nathan Gwilliam
We're looking for a Nutch developer we can hire to build a nutch search engine for our sites. Are any of you doing side projects? Nathan Gwilliam Adoption.com Families.com [EMAIL PROTECTED]

Re: Hiring a Nutch Developer

2005-11-04 Thread Arun Kumar Sharma
Hi Nathan, Please send me more details Nathan Gwilliam [EMAIL PROTECTED] wrote: We're looking for a Nutch developer we can hire to build a nutch search engine for our sites. Are any of you doing side projects? Nathan Gwilliam Adoption.com Families.com [EMAIL PROTECTED] WITH WARM

Re: Hiring a Nutch Developer

2005-11-04 Thread Nathan Gwilliam
I actually have several projects, but let's start with the first. We need to create a search engine that crawls about 20 adoption-related sites that we are affiliated with, such as: adoption.com fosterparenting.com crisispregnancy.com adoption.org adopting.org 123adoption.com (which includes

nutch cluster questions.

2005-11-04 Thread Arsen Popovyan
At the moment we are using nutch-nightly (nutch-2005-07-20). We are not pleased with the performance of fetching, parsing, indexing, analyzing and scoring... information. Our spider currently retrieves approx 25,000 new results per day. All processes are now running on one machine and we are

Re: nutch cluster questions.

2005-11-04 Thread Stefan Groschupf
Please do not cross-post questions! Check out the map reduce branch in the svn. The map reduce will do everything you are looking for and it works well for me. Stefan On 04.11.2005 at 14:32, Arsen Popovyan wrote: At the moment we are using nutch-nightly (nutch-2005-07-20). We are not

[jira] Created: (NUTCH-123) Cache.jsp some times generate NullPointerException

2005-11-04 Thread JIRA
Cache.jsp some times generate NullPointerException -- Key: NUTCH-123 URL: http://issues.apache.org/jira/browse/NUTCH-123 Project: Nutch Type: Bug Components: web gui Environment: All systems Reporter:

Re: mapred bug -- bad part calculation?

2005-11-04 Thread Doug Cutting
Rod Taylor wrote: Every segment that I fetch seems to be missing a part when stored on the filesystem. The stranger thing is it is always the same part (very reproducible). This sounds strange. Are the datanode errors always on the same host? How many hosts are you running this on? Doug

Re: mapred questions

2005-11-04 Thread Doug Cutting
Ken van Mulder wrote: First is that the fetcher slows down over time and continues to use more and more memory as it goes (which I think is eventually hanging the process). What parser plugins do you have enabled? These are usually the culprit. Try using 'kill -QUIT' to see what various

[jira] Updated: (NUTCH-116) TestNDFS a JUnit test specifically for NDFS

2005-11-04 Thread Paul Baclace (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-116?page=all ] Paul Baclace updated NUTCH-116: --- Attachment: required_by_TestNDFS_v3.patch I found and fixed a problem with a standalone DataNode process exiting too early (this was not detected by the current

Re: mapred bug -- bad part calculation?

2005-11-04 Thread Rod Taylor
On Fri, 2005-11-04 at 13:43 -0800, Doug Cutting wrote: Rod Taylor wrote: Every segment that I fetch seems to be missing a part when stored on the filesystem. The stranger thing is it is always the same part (very reproducible). This sounds strange. Are the datanode errors always on the

Re: mapred bug -- bad part calculation?

2005-11-04 Thread Rod Taylor
On Fri, 2005-11-04 at 13:43 -0800, Doug Cutting wrote: Rod Taylor wrote: Every segment that I fetch seems to be missing a part when stored on the filesystem. The stranger thing is it is always the same part (very reproducible). This sounds strange. Are the datanode errors always on the

[jira] Updated: (NUTCH-116) TestNDFS a JUnit test specifically for NDFS

2005-11-04 Thread Paul Baclace (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-116?page=all ] Paul Baclace updated NUTCH-116: --- Attachment: comments_msgs_and_local_renames_during_TestNDFS.patch TestNDFS a JUnit test specifically for NDFS ---

Re: mapred bug -- bad part calculation?

2005-11-04 Thread Doug Cutting
Rod Taylor wrote: There is only a single datanode and there are 20 hosts. That's a lot of load on one datanode. I typically run a datanode on every host, accessing the local drives on that host. Doug

[jira] Created: (NUTCH-124) protocol-httpclient does not follow redirects when fetching robots.txt

2005-11-04 Thread Doug Cutting (JIRA)
protocol-httpclient does not follow redirects when fetching robots.txt -- Key: NUTCH-124 URL: http://issues.apache.org/jira/browse/NUTCH-124 Project: Nutch Type: Bug Components: fetcher

Re: mapred bug -- bad part calculation?

2005-11-04 Thread Doug Cutting
Rod Taylor wrote: I tried running one datanode per machine connecting back to the same SAN but it seemed pretty clunky. A crash of any datanode would take down the entire system (no data replication since it's a common data-store in the end). Reducing it to a single datanode did not have this

Re: mapred bug -- bad part calculation?

2005-11-04 Thread Rod Taylor
On Fri, 2005-11-04 at 19:43 -0800, Doug Cutting wrote: Rod Taylor wrote: I tried running one datanode per machine connecting back to the same SAN but it seemed pretty clunky. A crash of any datanode would take down the entire system (no data replication since it's a common data-store in

Re: mapred bug -- bad part calculation?

2005-11-04 Thread Rod Taylor
On Fri, 2005-11-04 at 22:57 -0500, Rod Taylor wrote: On Fri, 2005-11-04 at 19:43 -0800, Doug Cutting wrote: Rod Taylor wrote: I tried running one datanode per machine connecting back to the same SAN but it seemed pretty clunky. A crash of any datanode would take down the entire system

Re: mapred bug -- bad part calculation?

2005-11-04 Thread Doug Cutting
Rod Taylor wrote: Here you go. local filesystem and a single job tracker on another machine. When the tasktracker and jobtracker are on the same box there isn't a problem. When they are on different machines it runs into issues. This is using mapred.local.dir on the local machine (not sharedd

Re: mapred bug -- bad part calculation?

2005-11-04 Thread Rod Taylor
On Fri, 2005-11-04 at 20:41 -0800, Doug Cutting wrote: Rod Taylor wrote: Here you go. local filesystem and a single job tracker on another machine. When the tasktracker and jobtracker are on the same box there isn't a problem. When they are on different machines it runs into issues.

RE: Halloween Joke at Google

2005-11-04 Thread Fuad Efendi
Andrzej, I am trying to restore the human-oriented web-site tree using anchor text! For example, a page with the anchor text Motherboards links to many pages for specific motherboards, etc.; we can group information in many cases. Anchor text is the true subject of the page, but only within the same domain. BTW,
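
One possible reading of the anchor-text idea above, as a minimal Python sketch rather than anything from Nutch itself. It assumes the links are already available as (source URL, anchor text, target URL) tuples; the function name build_anchor_tree and the example.com-style URLs are hypothetical. Each page is labelled with the anchor text of the first same-host link pointing at it, cross-domain anchors are ignored, and the result maps each parent label to the labels of the pages it links to.

```python
from collections import defaultdict
from urllib.parse import urlparse


def build_anchor_tree(links):
    """Build a label tree from (source_url, anchor_text, target_url) tuples.

    Hypothetical helper: pages are labelled by same-host anchor text only.
    """
    labels = {}                   # target URL -> first same-host anchor text seen
    children = defaultdict(list)  # source URL -> same-host target URLs, in order
    for source, anchor, target in links:
        if urlparse(source).netloc != urlparse(target).netloc:
            continue  # anchor text from another domain is not trusted here
        labels.setdefault(target, anchor.strip())
        if target not in children[source]:
            children[source].append(target)
    # Express the tree with labels; fall back to the URL for unlabelled pages.
    return {
        labels.get(parent, parent): [labels.get(child, child) for child in kids]
        for parent, kids in children.items()
    }


links = [
    ("http://shop.example/", "Motherboards", "http://shop.example/mb"),
    ("http://shop.example/mb", "Board A", "http://shop.example/mb/a"),
    ("http://shop.example/mb", "Board B", "http://shop.example/mb/b"),
    ("http://other.example/", "Cheap boards", "http://shop.example/mb"),
]
tree = build_anchor_tree(links)
```

In this sketch two parents that happen to share a label would collide in the result; a real implementation would need to key on the URL and keep the label as an attribute.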