Re: Problem with latest SVN during reduce phase

Byron Miller Fri, 13 Jan 2006 06:18:03 -0800

I'll pull it down today and give it a shot.

thanks,
-byron


--- Lukas Vlcek <[EMAIL PROTECTED]> wrote:

> Hi,
> 
> Get the latest svn version. Andrzej committed some
> patches yesterday and now this issue is gone (at
> least it works fine for me). I believe that
> revision 368167 is the one we were talking about.
> 
> Regards,
> Lukas
> 
> On 1/13/06, Pashabhai <[EMAIL PROTECTED]> wrote:
> > Hi ,
> >
> >    You are right, the Parse object is not null even
> > though the page has no content and no title.
> >
> >    Could it be the FetcherOutput object?
> >
> >
> > P
> >
> > --- Lukas Vlcek <[EMAIL PROTECTED]> wrote:
> >
> > > Hi,
> > > I think this issue may be more complex. If I
> > > remember my test correctly, the parse object was
> > > not null. Also, parse.getText() was not null (it
> > > just contained an empty String).
> > > If a document is not parsed correctly, an "empty"
> > > parse is returned instead: parseStatus.getEmptyParse();
> > > this should be OK, but I haven't had a chance to
> > > check whether it can cause any trouble during
> > > index optimization.
> > > Lukas
> > >
> > > On 1/12/06, Pashabhai <[EMAIL PROTECTED]> wrote:
> > > > Hi ,
> > > >
> > > >    A very similar exception occurs while indexing
> > > > a page that has no body content (and sometimes
> > > > no title).
> > > >
> > > > 051223 194717 Optimizing index.
> > > > java.lang.NullPointerException
> > > >         at org.apache.nutch.indexer.basic.BasicIndexingFilter.filter(BasicIndexingFilter.java:75)
> > > >         at org.apache.nutch.indexer.IndexingFilters.filter(IndexingFilters.java:63)
> > > >         at org.apache.nutch.crawl.Indexer.reduce(Indexer.java:217)
> > > >         at org.apache.nutch.mapred.ReduceTask.run(ReduceTask.java:260)
> > > >
> > > >  Looking into the source code of BasicIndexingFilter,
> > > > it is trying to
> > > > doc.add(Field.UnStored("content", parse.getText()));
> > > >
> > > > I guess adding a null check on the parse object,
> > > > if (parse != null), should solve the problem.
> > > >
> > > > I can confirm once I have tested it locally.
> > > >
> > > > Thanks
> > > > P
> > > >
> > > >
> > > >
> > > >
> > > > --- Lukas Vlcek <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > Hi,
> > > > > I am facing this error as well. I have now located
> > > > > one particular document that is causing it (it is
> > > > > an MS Word document which the parser can't parse
> > > > > properly). I have sent it to Andrzej in a separate
> > > > > email. Let's see if that helps...
> > > > > Lukas
> > > > >
> > > > > On 1/11/06, Dominik Friedrich <[EMAIL PROTECTED]> wrote:
> > > > > > I got this exception a lot, too. I haven't tested
> > > > > > the patch by Andrzej yet; instead I just put the
> > > > > > doc.add() lines in the indexer reduce function in a
> > > > > > try-catch block. This way the indexing finishes even
> > > > > > with a null value, and I can see in the log file
> > > > > > which documents haven't been indexed.
> > > > > >
> > > > > > Wouldn't it be a good idea to catch every exception
> > > > > > that only affects one document in loops like this?
> > > > > > At least, I don't like it when an indexing process
> > > > > > dies after a few hours because one document triggers
> > > > > > such an exception.
> > > > > >
> > > > > > best regards,
> > > > > > Dominik
> > > > > >
> > > > > > Byron Miller wrote:
> > > > > > > 60111 103432 reduce > reduce
> > > > > > > 060111 103432 Optimizing index.
> > > > > > > 060111 103433 closing > reduce
> > > > > > > 060111 103434 closing > reduce
> > > > > > > 060111 103435 closing > reduce
> > > > > > > java.lang.NullPointerException: value cannot be null
> > > > > > >         at org.apache.lucene.document.Field.<init>(Field.java:469)
> > > > > > >         at org.apache.lucene.document.Field.<init>(Field.java:412)
> > > > > > >         at org.apache.lucene.document.Field.UnIndexed(Field.java:195)
> > > > > > >         at org.apache.nutch.indexer.Indexer.reduce(Indexer.java:198)
> > > > > > >         at org.apache.nutch.mapred.ReduceTask.run(ReduceTask.java:260)
> > > > > > >         at org.apache.nutch.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:90)
> > > > > > > Exception in thread "main" java.io.IOException: Job failed!
=== message truncated ===
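
Byron's trace shows the Lucene Field constructor itself rejecting a null value ("value cannot be null"), and Pashabhai's points at the doc.add(...) call in BasicIndexingFilter. Pashabhai proposes guarding on the parse object, but Lukas observed that the parse is non-null and its text is merely empty, so checking the value right before the field is built is the narrower guard. The following is a minimal Java sketch of that idea, assuming a hypothetical SimpleField class and a plain List standing in for Lucene's Field and Document (this is not the real Lucene API); the helper addIfPresent is also hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: SimpleField and a List<SimpleField> stand in for
// Lucene's Field and Document; this is not the real Lucene API.
public class NullSafeFields {

    /** Hypothetical name/value pair standing in for a Lucene field. */
    public static final class SimpleField {
        final String name;
        final String value;

        SimpleField(String name, String value) {
            // Mirrors the check behind "value cannot be null" in Byron's trace.
            if (value == null) {
                throw new NullPointerException("value cannot be null");
            }
            this.name = name;
            this.value = value;
        }
    }

    /** Adds the field only when a value is present; returns true if it was added. */
    public static boolean addIfPresent(List<SimpleField> doc, String name, String value) {
        if (value == null) {
            return false; // skip the field instead of letting the constructor throw
        }
        doc.add(new SimpleField(name, value));
        return true;
    }

    public static void main(String[] args) {
        List<SimpleField> doc = new ArrayList<>();
        addIfPresent(doc, "content", ""); // "empty" parse: text is "" rather than null
        addIfPresent(doc, "title", null);  // page with no title: field is skipped
        System.out.println(doc.size());    // prints 1
    }
}
```

Skipping a field silently can hide a real parsing problem, so in practice one would probably also log the field name and the document URL when a value is missing.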

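Dominik's suggestion, catching exceptions that only affect one document so that a multi-hour indexing run survives a single bad record, can be sketched as below. This is a minimal Java sketch under stated assumptions, not Nutch's actual reduce code: indexAll and the Consumer-based indexer are hypothetical names, and the URLs are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.logging.Logger;

// Hypothetical sketch of per-document failure isolation; not Nutch's API.
public class PerDocumentIsolation {
    private static final Logger LOG =
            Logger.getLogger(PerDocumentIsolation.class.getName());

    /** Runs the indexer on every key and returns the keys that failed. */
    public static List<String> indexAll(List<String> keys, Consumer<String> indexer) {
        List<String> failed = new ArrayList<>();
        for (String key : keys) {
            try {
                indexer.accept(key);
            } catch (RuntimeException e) {
                // Only this document is affected: log it, remember it, move on.
                LOG.warning("Skipping " + key + ": " + e);
                failed.add(key);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList("http://a/", "http://bad/", "http://c/");
        List<String> failed = indexAll(urls, url -> {
            // Simulate the null value seen in the traces for one document.
            if (url.contains("bad")) {
                throw new NullPointerException("value cannot be null");
            }
        });
        System.out.println("failed: " + failed); // prints failed: [http://bad/]
    }
}
```

The loop catches only RuntimeException (which includes the NullPointerException in the traces), so an Error such as OutOfMemoryError still stops the job, and the returned list records which documents were skipped, matching Dominik's goal of seeing them in the log.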