[ 
https://issues.apache.org/jira/browse/NUTCH-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17770476#comment-17770476
 ] 

ASF GitHub Bot commented on NUTCH-2959:
---------------------------------------

tballison commented on PR #776:
URL: https://github.com/apache/nutch/pull/776#issuecomment-1741039619

   With the update to Tika 2.9.1-SNAPSHOT, I get 85 failed parses. Most of them 
are either encrypted documents or "can't retrieve Tika Parser for x" errors.
   
[parse-segment-error-parsing.txt](https://github.com/apache/nutch/files/12767926/parse-segment-error-parsing.txt)
   
   
   There is still one NoSuchMethodError, also from commons-io, but this time 
for UnsynchronizedByteArrayInputStream$Builder, thrown at 
org.apache.tika.parser.pdf.PDFEncodedStringDecoder.decode(PDFEncodedStringDecoder.java:85)
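
   A NoSuchMethodError like this usually means an older copy of the library is winning on the classpath. One way to check which commons-io jar is actually being loaded is a small probe along these lines. This is only a sketch and not part of the PR: the class name `WhichJar` and the helper `locate` are hypothetical, and it assumes the probe runs with the same classpath (or class loader) that the failing parser sees.

```java
import java.security.CodeSource;

public class WhichJar {
    /** Returns where the named class was loaded from, or a note if it cannot be loaded. */
    public static String locate(String className) {
        try {
            // initialize=false: load the class without running its static initializers
            Class<?> c = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // Classes from the JDK's bootstrap loader have no code source
            return className + " <- " + (src == null ? "(bootstrap/JDK)" : src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return className + " <- not found (" + e.getClass().getSimpleName() + ")";
        }
    }

    public static void main(String[] args) {
        // The outer class and the nested Builder named in the stack trace
        System.out.println(locate("org.apache.commons.io.input.UnsynchronizedByteArrayInputStream"));
        System.out.println(locate("org.apache.commons.io.input.UnsynchronizedByteArrayInputStream$Builder"));
    }
}
```

   If the outer class resolves to a jar but the `$Builder` line reports "not found", the commons-io on that classpath is probably older than the one Tika was compiled against.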
   
   




> Upgrade to Apache Tika 2.9.0
> ----------------------------
>
>                 Key: NUTCH-2959
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2959
>             Project: Nutch
>          Issue Type: Task
>    Affects Versions: 1.19
>            Reporter: Markus Jelsma
>            Priority: Major
>             Fix For: 1.20
>
>         Attachments: NUTCH-2959.patch
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
