[
https://issues.apache.org/jira/browse/SOLR-7764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14620437#comment-14620437
]
Tim Allison edited comment on SOLR-7764 at 7/9/15 12:41 PM:
------------------------------------------------------------
1) Right. That's the problem with how Tika is currently being used within DIH.
If it hangs, you'll never get an exception. If the xlsx file is causing the
hang, then given the vintage of Tika you're using, the culprit might be a
custom fraction format (TIKA-1132).
2) Hmm, the pdcidfont issue sounds like a PDF problem, not an Excel problem.
Does the Excel file have an embedded PDF? Will send email privately.
3) I can't quite tell what behavior you'd like. Please give more info.
As a side note, for debugging purposes, you might try grabbing the relevant
version of tika-app, and dropping potential problem files into that. If it
hangs, you've found your problem.
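If there are many candidate files, the same check can be scripted. The following is only a rough sketch, not something from Solr or Tika: it assumes Java is on the PATH, that the jar name (tika-app-xx.jar) is replaced with the version you actually downloaded, and that a 60-second timeout is long enough for a healthy parse. The helper names (tika_command, classify) are made up for illustration. Each file is run through tika-app's --text option in its own process, and anything that exceeds the timeout is reported as a hang.

```python
import subprocess
import sys
from pathlib import Path


def tika_command(path, jar="tika-app-xx.jar"):
    # Assumption: the jar name is a placeholder; point it at your tika-app jar.
    # --text asks tika-app to extract plain text from the file.
    return ["java", "-jar", jar, "--text", str(path)]


def classify(path, build_cmd=tika_command, timeout_s=60):
    """Return 'ok', 'error', or 'hang' for a single file."""
    try:
        result = subprocess.run(
            build_cmd(path),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,  # the child process is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return "hang"
    return "ok" if result.returncode == 0 else "error"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_hangs.py <inputdir>
    for p in sorted(Path(sys.argv[1]).iterdir()):
        if p.is_file():
            print(classify(p), p)
```

Running each file in a separate process is the point here: a hung parse only costs you the timeout, and the file name is printed next to the result.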
Another option is to run tika-app (>= 1.8) in batch mode against an input
directory. If your logging is set up correctly, you'll be able to tell which
file caused the hang. The command line for that is:
    java -jar tika-app-xx.jar -i <inputdir> -o <outputdir>
See the tika-batch [wiki|http://wiki.apache.org/tika/TikaBatchUsage] for
advanced usage, including configuring logging. (Well, you'll see it in about
10 minutes, after I update it. ;) )
> Solr indexing hangs if encounters an certain XML parse error
> ------------------------------------------------------------
>
> Key: SOLR-7764
> URL: https://issues.apache.org/jira/browse/SOLR-7764
> Project: Solr
> Issue Type: Bug
> Components: query parsers
> Affects Versions: 4.7.2
> Environment: Ubuntu 12.04.5 LTS
> Reporter: Sorin Gheorghiu
> Labels: indexing
> Attachments: Solr_XML_parse_error_080715.txt
>
>
> BlueSpice (http://bluespice.com/) uses Solr to index documents for the
> 'Extended search' feature.
> Solr hangs if a certain error occurs during indexing:
> 8.7.2015 15:34:26
> ERROR
> SolrCore
> org.apache.solr.common.SolrException:
> org.apache.tika.exception.TikaException: XML parse error
> 8.7.2015 15:34:26
> ERROR
> SolrDispatchFilter
> null:org.apache.solr.common.SolrException:
> org.apache.tika.exception.TikaException: XML parse error