Re: question about error handling during indexing

Maciej Liżewski Mon, 10 Sep 2012 07:20:37 -0700

OK, thanks for the explanation. ignoreTikaException should do the trick (I will
check that).
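
For anyone finding this thread later, here is a minimal sketch of how that setting might look in solrconfig.xml. It assumes the extracting request handler is registered at /update/extract, as in the log quoted below; the handler attributes are assumptions based on a stock Solr configuration and have not been checked against this setup. With ignoreTikaException enabled, Solr should skip the Tika parse failure and index whatever metadata is available instead of returning an error.

```xml
<!-- Sketch only: the handler name and class are assumed from a stock Solr
     configuration, not taken from this thread. -->
<requestHandler name="/update/extract"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- Skip documents' Tika parse errors instead of failing the request -->
    <str name="ignoreTikaException">true</str>
  </lst>
</requestHandler>
```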



2012/9/10 Karl Wright <[email protected]>

> Usually in these situations Solr returns a 500 error.  The Solr
> Connector, at one point, used to retry indefinitely when such an error
> came back, but I believe there were changes to this logic and now it
> may well abort the job if this happens for more than a few hours
> straight.  This is because the Solr connector has no way of knowing
> whether the 500 error is due to just a Tika exception on a single
> document, or something more fundamental being wrong with your Solr
> configuration.
>
> The big problem is that Solr should not be returning a 500 error just
> because Tika is unhappy with the document.  I believe there is a Solr
> ticket that describes the problem and requests different handling; you
> may be able to find it.
>
> Karl
>
>
> On Mon, Sep 10, 2012 at 9:47 AM, Maciej Liżewski
> <[email protected]> wrote:
> > Hi,
> >
> > I have found a situation where Solr throws an exception because it is not
> > able to parse a given file, like this:
> > INFO: [collection1] webapp=/solr path=/update/extract params={literal.deny_token_document=LDAPgroup:DEAD_AUTHORITY&literal.id=file://///XXXXX/YYYYmovie.mov&literal.allow_token_document=LDAPgroup:50071&literal.allow_token_document=LDAPgroup:group} {} 0 269
> > 2012-09-10 15:34:50 org.apache.solr.common.SolrException log
> > SEVERE: org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.mp4.MP4Parser@48f9a4c1
> >         at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:230)
> >         at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
> >         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
> >         at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:240)
> >         at org.apache.solr.core.SolrCore.execute(SolrCore.java:1656)
> >
> > Now, I can live with that; I do not expect it to index everything. But I
> > am not sure Manifold should react the way it does: it stops indexing
> > anything else from that job (in fact it shuts down job execution), when
> > it should try to index the other pending files. Now I must run indexing by
> > hand, check that everything is OK, and when such a problem occurs, add a
> > suitable "exclude" filter (which means Manifold does not index that kind of
> > file at all, even though the problem may be with only this one file), and
> > run it again. Still, I have to guarantee that it won't fail in the future
> > on some other file...
> >
> > Don't you think that Manifold should try to index everything *even* when
> > there are problems with indexing some documents?
> >
> > I am just not sure if this is a bug or a feature... :)
>
