[
https://issues.apache.org/jira/browse/TIKA-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13816898#comment-13816898
]
Dave Kincaid commented on TIKA-1192:
------------------------------------
I think I've tracked down the source of the defect and have a potential fix.
However, I'd like someone with more RTF knowledge to verify it.
The file that throws the exception during parsing has the following text in
its header:
{noformat}
{\list\listtemplateid67698707
{\listlevel\levelnfc0\levelfollow0\levelstartat1{\leveltext
\'02\'00.}{\levelnumbers \'01}}
{\listlevel\levelnfc0\levelfollow0\levelstartat1{\leveltext
\'02\'01.}{\levelnumbers \'01}}
{\listlevel\levelnfc0\levelfollow0\levelstartat1{\leveltext
\'02\'02.}{\levelnumbers \'01}}
{\listlevel\levelnfc0\levelfollow0\levelstartat1{\leveltext
\'02\'03.}{\levelnumbers \'01}}
{\listlevel\levelnfc0\levelfollow0\levelstartat1{\leveltext
\'02\'04.}{\levelnumbers \'01}}
{\listlevel\levelnfc0\levelfollow0\levelstartat1{\leveltext
\'02\'05.}{\levelnumbers \'01}}
{\listlevel\levelnfc0\levelfollow0\levelstartat1{\leveltext
\'02\'06.}{\levelnumbers \'01}}
{\listlevel\levelnfc0\levelfollow0\levelstartat1{\leveltext
\'02\'07.}{\levelnumbers \'01}}
{\listlevel\levelnfc0\levelfollow0\levelstartat1{\leveltext
\'02\'08.}{\levelnumbers \'01}}
{\listname List1382206859_1;}\listid1991665056
}
}
{\*\listoverridetable
{\listoverride\listid1991665054\listoverridecount9
{\lfolevel\listoverrideformat\listoverridestartat
{\listlevel\levelnfc23\levelfollow0\levelstartat1{\leveltext
\'01\'b7}{\levelnumbers}\f2\fcs1\f2\af2\fcs0\rtlch\f2\af2\ltrch}
}
{noformat}
What's happening is that in TextExtractor.java (line 867) the listTableLevel
counter isn't reset to -1 when the parser enters the list override table, so
the \listlevel inside the override increments it to 9, which is not a valid
index.
I think the fix is to change the conditional on line 1072 to reset
listTableLevel to -1 on equals("listoverride") in addition to equals("list").
I'll try to attach a patch file against the 1.4 source.
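To make the failure mode concrete, here is a minimal standalone sketch of the
counter logic described above. It is not the Tika source: the class
ListLevelSketch, the method names, the flat list of control words, and the
choice of "levelnfc" as the word that indexes the per-level array are all
assumptions made for illustration. Only the listTableLevel counter, the reset
to -1, and the nine list levels come from the report.

```java
import java.util.ArrayList;
import java.util.List;

public class ListLevelSketch {
    // The sample header defines nine \listlevel groups, so indexes 0..8 are valid.
    static final int MAX_LEVELS = 9;

    /**
     * Walks a simplified stream of control words and returns the highest
     * level index that was written. When resetOnOverride is false, the
     * counter is only reset on "list", mirroring the reported behaviour.
     */
    static int highestLevel(List<String> controlWords, boolean resetOnOverride) {
        int[] numberType = new int[MAX_LEVELS]; // hypothetical per-level storage
        int listTableLevel = -1;
        int highest = -1;
        for (String word : controlWords) {
            if (word.equals("list")
                    || (resetOnOverride && word.equals("listoverride"))) {
                listTableLevel = -1;            // start counting levels afresh
            } else if (word.equals("listlevel")) {
                listTableLevel++;               // entering the next level
            } else if (word.equals("levelnfc")) {
                // Throws ArrayIndexOutOfBoundsException once the counter reaches 9.
                numberType[listTableLevel] = 1;
                highest = Math.max(highest, listTableLevel);
            }
        }
        return highest;
    }

    /** Control words shaped like the header above: nine levels, then one override level. */
    static List<String> sampleWords() {
        List<String> words = new ArrayList<>();
        words.add("list");
        for (int i = 0; i < MAX_LEVELS; i++) {
            words.add("listlevel");
            words.add("levelnfc");
        }
        words.add("listoverride");
        words.add("listlevel");
        words.add("levelnfc");
        return words;
    }

    public static void main(String[] args) {
        List<String> words = sampleWords();
        try {
            highestLevel(words, false);
            System.out.println("no reset on listoverride: parsed without error");
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("no reset on listoverride: " + e);
        }
        System.out.println("reset on listoverride: highest level index = "
                + highestLevel(words, true));
    }
}
```

In this sketch, the tenth "listlevel" pushes the counter past the end of the
array unless "listoverride" also resets it, which is the same shape as the
proposed change to the conditional.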
> ArrayIndexOutOfBoundsException: 9 parsing RTF
> ---------------------------------------------
>
> Key: TIKA-1192
> URL: https://issues.apache.org/jira/browse/TIKA-1192
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.4
> Reporter: Dave Kincaid
> Labels: rtf
>
> When trying to parse an RTF file I'm getting the following exception. I am
> not able to attach the file for privacy reasons:
> {noformat}
> java.lang.ArrayIndexOutOfBoundsException: 9
> TextExtractor.java:872 org.apache.tika.parser.rtf.TextExtractor.processControlWord
> TextExtractor.java:566 org.apache.tika.parser.rtf.TextExtractor.parseControlWord
> TextExtractor.java:492 org.apache.tika.parser.rtf.TextExtractor.parseControlToken
> TextExtractor.java:459 org.apache.tika.parser.rtf.TextExtractor.extract
> TextExtractor.java:448 org.apache.tika.parser.rtf.TextExtractor.extract
> RTFParser.java:56 org.apache.tika.parser.rtf.RTFParser.parse
> (Unknown Source) sun.reflect.NativeMethodAccessorImpl.invoke0
> NativeMethodAccessorImpl.java:57 sun.reflect.NativeMethodAccessorImpl.invoke
> DelegatingMethodAccessorImpl.java:43 sun.reflect.DelegatingMethodAccessorImpl.invoke
> Method.java:606 java.lang.reflect.Method.invoke
> Reflector.java:93 clojure.lang.Reflector.invokeMatchingMethod
> Reflector.java:28 clojure.lang.Reflector.invokeInstanceMethod
> tika_parser.clj:20 rtf-parser.tika-parser/parse
> form-init2921349737948661927.clj:1 rtf-parser.tika-parser/eval4200
> Compiler.java:6619 clojure.lang.Compiler.eval
> Compiler.java:6582 clojure.lang.Compiler.eval
> core.clj:2852 clojure.core/eval
> main.clj:259 clojure.main/repl[fn]
> main.clj:259 clojure.main/repl[fn]
> main.clj:277 clojure.main/repl[fn]
> main.clj:277 clojure.main/repl
> RestFn.java:1096 clojure.lang.RestFn.invoke
> interruptible_eval.clj:56 clojure.tools.nrepl.middleware.interruptible-eval/evaluate[fn]
> AFn.java:159 clojure.lang.AFn.applyToHelper
> AFn.java:151 clojure.lang.AFn.applyTo
> core.clj:617 clojure.core/apply
> core.clj:1788 clojure.core/with-bindings*
> RestFn.java:425 clojure.lang.RestFn.invoke
> interruptible_eval.clj:41 clojure.tools.nrepl.middleware.interruptible-eval/evaluate
> interruptible_eval.clj:171 clojure.tools.nrepl.middleware.interruptible-eval/interruptible-eval[fn]
> core.clj:2330 clojure.core/comp[fn]
> interruptible_eval.clj:138 clojure.tools.nrepl.middleware.interruptible-eval/run-next[fn]
> AFn.java:24 clojure.lang.AFn.run
> ThreadPoolExecutor.java:1145 java.util.concurrent.ThreadPoolExecutor.runWorker
> ThreadPoolExecutor.java:615 java.util.concurrent.ThreadPoolExecutor$Worker.run
> Thread.java:724 java.lang.Thread.run
> {noformat}