[ 
https://issues.apache.org/jira/browse/TIKA-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13816898#comment-13816898
 ] 

Dave Kincaid commented on TIKA-1192:
------------------------------------

I think I've tracked down the source of the defect and have a potential fix.
I'd like someone with a little more RTF knowledge to verify it, however.

The file that is throwing the exception during parse has the following text in 
the header:
{noformat}
{\list\listtemplateid67698707
{\listlevel\levelnfc0\levelfollow0\levelstartat1{\leveltext 
\'02\'00.}{\levelnumbers \'01}}
{\listlevel\levelnfc0\levelfollow0\levelstartat1{\leveltext 
\'02\'01.}{\levelnumbers \'01}}
{\listlevel\levelnfc0\levelfollow0\levelstartat1{\leveltext 
\'02\'02.}{\levelnumbers \'01}}
{\listlevel\levelnfc0\levelfollow0\levelstartat1{\leveltext 
\'02\'03.}{\levelnumbers \'01}}
{\listlevel\levelnfc0\levelfollow0\levelstartat1{\leveltext 
\'02\'04.}{\levelnumbers \'01}}
{\listlevel\levelnfc0\levelfollow0\levelstartat1{\leveltext 
\'02\'05.}{\levelnumbers \'01}}
{\listlevel\levelnfc0\levelfollow0\levelstartat1{\leveltext 
\'02\'06.}{\levelnumbers \'01}}
{\listlevel\levelnfc0\levelfollow0\levelstartat1{\leveltext 
\'02\'07.}{\levelnumbers \'01}}
{\listlevel\levelnfc0\levelfollow0\levelstartat1{\leveltext 
\'02\'08.}{\levelnumbers \'01}}
{\listname List1382206859_1;}\listid1991665056
}
}
{\*\listoverridetable
{\listoverride\listid1991665054\listoverridecount9
{\lfolevel\listoverrideformat\listoverridestartat
{\listlevel\levelnfc23\levelfollow0\levelstartat1{\leveltext 
\'01\'b7}{\levelnumbers}\f2\fcs1\f2\af2\fcs0\rtlch\f2\af2\ltrch}
}
{noformat}

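What's happening is that in TextExtractor.java line 867 the listTableLevel
counter isn't being reset to -1 when the parser enters the list override
table. The nine \listlevel groups of the preceding list leave it at 8, so the
\listlevel inside the override increments it to 9, which is not a valid index.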

I think the fix is to change the conditional on line 1072 to reset 
listTableLevel to -1 on equals("listoverride") in addition to equals("list"). 
I'll try to attach a patch file against the source for 1.4.
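
To make the failure mode and the proposed fix concrete, here is a minimal
standalone sketch. It is not Tika's actual code: the class ListLevelSketch, the
numberType array, the nine-level limit, and the token list are assumptions made
for illustration. Only the listTableLevel name and the control words come from
the description above.

```java
import java.util.ArrayList;
import java.util.List;

class ListLevelSketch {
    // Assumption: room for nine levels per list, matching the nine
    // \listlevel groups in the header quoted above.
    static final int MAX_LEVELS = 9;

    /** Replays the header's control words; returns the final listTableLevel. */
    static int replay(boolean resetOnListOverride) {
        // Hypothetical per-level storage. A single array is shared by the
        // list and the override here purely to keep the sketch short.
        int[] numberType = new int[MAX_LEVELS];
        int listTableLevel = -1;

        for (String[] token : headerTokens()) {
            String word = token[0];
            int param = Integer.parseInt(token[1]);
            if (word.equals("list")
                    || (resetOnListOverride && word.equals("listoverride"))) {
                listTableLevel = -1; // start counting levels afresh
            } else if (word.equals("listlevel")) {
                listTableLevel++;
            } else if (word.equals("levelnfc")) {
                // Without the reset, this is reached with index 9.
                numberType[listTableLevel] = param;
            }
        }
        return listTableLevel;
    }

    /** Simplified token stream: one list with nine levels, then one override. */
    static List<String[]> headerTokens() {
        List<String[]> tokens = new ArrayList<>();
        tokens.add(new String[] {"list", "0"});
        for (int i = 0; i < MAX_LEVELS; i++) {
            tokens.add(new String[] {"listlevel", "0"});
            tokens.add(new String[] {"levelnfc", "0"});
        }
        tokens.add(new String[] {"listoverride", "0"});
        tokens.add(new String[] {"listlevel", "0"});
        tokens.add(new String[] {"levelnfc", "23"});
        return tokens;
    }

    public static void main(String[] args) {
        try {
            replay(false);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("without reset: " + e.getMessage());
        }
        System.out.println("with reset: final level " + replay(true));
    }
}
```

With resetOnListOverride set to false the replay overruns the array on the
\levelnfc23 inside the override; with it set to true the counter restarts at
-1 and the override's level lands on index 0.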

> ArrayIndexOutOfBoundsException: 9 parsing RTF
> ---------------------------------------------
>
>                 Key: TIKA-1192
>                 URL: https://issues.apache.org/jira/browse/TIKA-1192
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.4
>            Reporter: Dave Kincaid
>              Labels: rtf
>
> When trying to parse an RTF file I'm getting the following exception. I am 
> not able to attach the file for privacy reasons:
> {noformat}
> java.lang.ArrayIndexOutOfBoundsException: 9
>                            TextExtractor.java:872 
> org.apache.tika.parser.rtf.TextExtractor.processControlWord
>                            TextExtractor.java:566 
> org.apache.tika.parser.rtf.TextExtractor.parseControlWord
>                            TextExtractor.java:492 
> org.apache.tika.parser.rtf.TextExtractor.parseControlToken
>                            TextExtractor.java:459 
> org.apache.tika.parser.rtf.TextExtractor.extract
>                            TextExtractor.java:448 
> org.apache.tika.parser.rtf.TextExtractor.extract
>                                 RTFParser.java:56 
> org.apache.tika.parser.rtf.RTFParser.parse
>                                  (Unknown Source) 
> sun.reflect.NativeMethodAccessorImpl.invoke0
>                  NativeMethodAccessorImpl.java:57 
> sun.reflect.NativeMethodAccessorImpl.invoke
>              DelegatingMethodAccessorImpl.java:43 
> sun.reflect.DelegatingMethodAccessorImpl.invoke
>                                   Method.java:606 
> java.lang.reflect.Method.invoke
>                                 Reflector.java:93 
> clojure.lang.Reflector.invokeMatchingMethod
>                                 Reflector.java:28 
> clojure.lang.Reflector.invokeInstanceMethod
>                                tika_parser.clj:20 rtf-parser.tika-parser/parse
>                form-init2921349737948661927.clj:1 
> rtf-parser.tika-parser/eval4200
>                                Compiler.java:6619 clojure.lang.Compiler.eval
>                                Compiler.java:6582 clojure.lang.Compiler.eval
>                                     core.clj:2852 clojure.core/eval
>                                      main.clj:259 clojure.main/repl[fn]
>                                      main.clj:259 clojure.main/repl[fn]
>                                      main.clj:277 clojure.main/repl[fn]
>                                      main.clj:277 clojure.main/repl
>                                  RestFn.java:1096 clojure.lang.RestFn.invoke
>                         interruptible_eval.clj:56 
> clojure.tools.nrepl.middleware.interruptible-eval/evaluate[fn]
>                                      AFn.java:159 
> clojure.lang.AFn.applyToHelper
>                                      AFn.java:151 clojure.lang.AFn.applyTo
>                                      core.clj:617 clojure.core/apply
>                                     core.clj:1788 clojure.core/with-bindings*
>                                   RestFn.java:425 clojure.lang.RestFn.invoke
>                         interruptible_eval.clj:41 
> clojure.tools.nrepl.middleware.interruptible-eval/evaluate
>                        interruptible_eval.clj:171 
> clojure.tools.nrepl.middleware.interruptible-eval/interruptible-eval[fn]
>                                     core.clj:2330 clojure.core/comp[fn]
>                        interruptible_eval.clj:138 
> clojure.tools.nrepl.middleware.interruptible-eval/run-next[fn]
>                                       AFn.java:24 clojure.lang.AFn.run
>                      ThreadPoolExecutor.java:1145 
> java.util.concurrent.ThreadPoolExecutor.runWorker
>                       ThreadPoolExecutor.java:615 
> java.util.concurrent.ThreadPoolExecutor$Worker.run
>                                   Thread.java:724 java.lang.Thread.run
> {noformat}



