Hi,

I have found a situation where Solr throws an exception saying that it is
unable to parse the specified file, like this:
INFO: [collection1] webapp=/solr path=/update/extract params={literal.deny_token_document=LDAPgroup:DEAD_AUTHORITY&literal.id=file://///XXXXX/YYYYmovie.mov&literal.allow_token_document=LDAPgroup:50071&literal.allow_token_document=LDAPgroup:group} {} 0 269
2012-09-10 15:34:50 org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.mp4.MP4Parser@48f9a4c1
        at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:230)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
        at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:240)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1656)

Now, I can live with that; I do not expect it to index everything. But I
am not sure Manifold should react the way it does: it stops indexing
anything more from that job (in fact it shuts down job execution), when it
should go on to index the other pending files. As it is, I must run
indexing by hand and check that everything is OK; when such a problem
occurs, I add a suitable "exclude" filter and run the job again. That
filter means Manifold does not index this kind of file at all, even though
the problem may lie with only this one specific file. Even then, I would
still have to guarantee that it won't fail in the future on some other
file...

Don't you think that Manifold should try to index everything *even* when
there are problems with indexing some documents?
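
To illustrate what I mean, here is a minimal sketch of the behaviour I
would expect. It is written in Python only for brevity and is not
Manifold's actual code; index_all, fake_index and ExtractionError are
hypothetical names standing in for the job loop, the Solr call and a
per-document failure such as the TikaException above.

```python
from typing import Callable, Iterable, List, Tuple


class ExtractionError(Exception):
    """Hypothetical stand-in for a per-document failure (e.g. a TikaException)."""


def index_all(
    documents: Iterable[str],
    index_one: Callable[[str], None],
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Index every document, recording failures instead of aborting the job."""
    indexed: List[str] = []
    failed: List[Tuple[str, str]] = []
    for doc_id in documents:
        try:
            index_one(doc_id)
        except ExtractionError as exc:
            # Per-document problem: remember it and move on to the next file.
            failed.append((doc_id, str(exc)))
        else:
            indexed.append(doc_id)
    return indexed, failed


def fake_index(doc_id: str) -> None:
    # Hypothetical indexer that rejects .mov files, mimicking the parser error.
    if doc_id.endswith(".mov"):
        raise ExtractionError("cannot parse " + doc_id)


indexed, failed = index_all(["a.txt", "movie.mov", "b.pdf"], fake_index)
print(indexed)
print(failed)
```

The point is only that a failure on one file ends up in a "failed" list
that can be reviewed later, while the remaining files are still processed.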

I am just not sure if this is a bug or a feature... :)
