[ https://issues.apache.org/jira/browse/TIKA-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15992821#comment-15992821 ]
Nick Burch commented on TIKA-2351:
----------------------------------
Can you attach the failing document?
If not, could you try grabbing a recent nightly build of the standalone Tika
App, run your document through that in `--text` mode, and post the full
stacktrace of the failure?
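
A minimal sketch of that second step, assuming a nightly tika-app jar has already been downloaded locally and `java` is on the PATH. The helper names and the script name are illustrative only and not part of Tika; the jar path is whatever file you downloaded.

```python
import subprocess
import sys


def build_tika_command(jar_path, document_path):
    """Build the command line that runs the Tika App jar in --text mode."""
    return ["java", "-jar", jar_path, "--text", document_path]


def run_tika_text(jar_path, document_path):
    """Run Tika App on one document; return (exit code, stdout, stderr)."""
    result = subprocess.run(
        build_tika_command(jar_path, document_path),
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout, result.stderr


if __name__ == "__main__":
    # Hypothetical usage: python run_tika.py <tika-app jar> <document>
    jar, doc = sys.argv[1], sys.argv[2]
    code, out, err = run_tika_text(jar, doc)
    # Print both streams so that extracted text and any stacktrace are kept.
    print(out)
    print(err, file=sys.stderr)
    sys.exit(code)
```

Running this against the failing file (for example /elastic/files/englishAnalyzer.doc from the report) should surface the full stacktrace rather than the summarised error that Elasticsearch returns.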
> Getting error while parsing documents
> -------------------------------------
>
> Key: TIKA-2351
> URL: https://issues.apache.org/jira/browse/TIKA-2351
> Project: Tika
> Issue Type: Bug
> Components: general
> Affects Versions: 1.14
> Environment: Red Hat Enterprise Linux Server release 7.3
> ElasticSearch 5.2.1
> ingest-attachment 5.2.1
> Reporter: VENU
> Labels: starter
>
> Hi Everyone,
> I am using the ingest-attachment plugin to index documents. Plain-text
> documents (.txt files) parse fine, but when I try to parse .doc or PDF files
> I get this error:
> FILE = /elastic/files/englishAnalyzer.doc
> ID = 6
> "error" : {
> "root_cause" : [
> {
> "type" : "exception",
> "reason" : "java.lang.IllegalArgumentException:
> ElasticsearchParseException[Error parsing document in field [data]]; nested:
> TikaException[Unexpected RuntimeException from
> org.apache.tika.parser.microsoft.OfficeParser@28992079]; nested:
> ArrayIndexOutOfBoundsException[-1];",
> "header" : {
> "processor_type" : "attachment"
> }
> }
> ],
> "type" : "exception",
> "reason" : "java.lang.IllegalArgumentException:
> ElasticsearchParseException[Error parsing document in field [data]]; nested:
> TikaException[Unexpected RuntimeException from
> org.apache.tika.parser.microsoft.OfficeParser@28992079]; nested:
> ArrayIndexOutOfBoundsException[-1];",
> "caused_by" : {
> "type" : "illegal_argument_exception",
> "reason" : "ElasticsearchParseException[Error parsing document in field
> [data]]; nested: TikaException[Unexpected RuntimeException from
> org.apache.tika.parser.microsoft.OfficeParser@28992079]; nested:
> ArrayIndexOutOfBoundsException[-1];",
> "caused_by" : {
> "type" : "parse_exception",
> "reason" : "Error parsing document in field [data]",
> "caused_by" : {
> "type" : "tika_exception",
> "reason" : "Unexpected RuntimeException from
> org.apache.tika.parser.microsoft.OfficeParser@28992079",
> "caused_by" : {
> "type" : "array_index_out_of_bounds_exception",
> "reason" : "-1"
> }
> }
> }
> },
> "header" : {
> "processor_type" : "attachment"
> }
> },
> "status" : 500
> }
> Please help me resolve this issue.