On Sat, Apr 10, 2010 at 11:04 PM, Phil Barnett <ph...@philb.us> wrote:

> On Sat, 2010-04-10 at 18:22 +0200, Andrzej Bialecki wrote:
> > On 2010-04-10 17:49, Phil Barnett wrote:
> > > On Thu, 2010-04-08 at 21:31 -0700, Mattmann, Chris A (388J) wrote:
> > >> Hi there,
> > >>
> > >> Well, as soon as we have 3 +1 binding VOTEs. Right now I'm the only
> > >> PMC member that's VOTE'd +1 on the release.
> > >>
> > >> Hopefully in the next few days someone will have a chance to check...
> > >
> > > I tried to get the Release Candidate (latest nightly build) running
> > > yesterday and I ran into problems with both of the scripts that I use
> > > to crawl with 1.0.
> > >
> > > But the smaller bin/crawl method finished the crawl and then
> > > immediately had a java exception when starting the next step.
> > >
> > > Sorry I don't have more specifics, but I'm at home, the setup is at
> > > work and I had to revert to get things back running. But I built a dev
> > > machine so I can play with 1.1 and get more specific.
> >
> > More details on this (your environment, OS, JDK version) and
> > logs/stacktraces would be highly appreciated! You mentioned that you
> > have some scripts - if you could extract relevant portions from them (or
> > copy the scripts) it would help us to ensure that it's not a simple
> > command-line error.
>
> Will do, Monday.
>
> Basics.
>
> HP DL-360 G4 Dual Xeon, 4G ram, Mirrored SCSI.
>
> Fresh install of CentOS 5.4
>
> Java from Sun.
>
> ant from repository, compiled from nightly build.
>
> I'll try to get you more details Monday evening. I'm driving down to
> work tonight to get the -dev machine running so I'll have something to
> break on Monday. ;-)
>
Wow, it's been a brutal week at work so far. I did manage to get the dev
server up and managed to try again to crawl. This is a full from-scratch
install.

I'm seeing two things.

1. When I run bin/nutch crawl, it finds the seed site and spiders it. When I
run my deepcrawl script, it never finds anything. Both use the same seed
directory.

2. During bin/nutch crawl, I get a NullPointerException in main right after
it decides it has crawled the last page. From memory, it was at line 133.
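For context, a typical one-step crawl in Nutch 1.x is invoked like this; the seed directory name, depth, and topN values are illustrative, not taken from Phil's actual setup:

```shell
# Hypothetical invocation of the one-step crawler discussed above.
# "urls" is a directory containing plain-text seed-list files;
# -depth and -topN are illustrative values, not from the report.
bin/nutch crawl urls -dir crawl -depth 3 -topN 50
```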

The logs/hadoop.log file doesn't show anything of merit.
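Even when the console trace scrolls by, the full stack usually lands in logs/hadoop.log; a quick way to pull any traces out for attaching to a report (standard grep, with an assumed output filename) would be:

```shell
# Illustrative: extract exception lines plus surrounding context
# from Nutch's Hadoop log into a file that can be attached to email.
grep -n -B 2 -A 15 "Exception" logs/hadoop.log > crawl-errors.txt
```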

I started documenting exactly what was going on, but I worked from 9 am to
12:30 am chasing some nasty network problems and never got it gathered up.

I will be able to get it to you tomorrow. Sorry for the delay.

Phil Barnett
Senior Analyst
Walt Disney World.
