[
https://issues.apache.org/jira/browse/NUTCH-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17770476#comment-17770476
]
ASF GitHub Bot commented on NUTCH-2959:
---------------------------------------
tballison commented on PR #776:
URL: https://github.com/apache/nutch/pull/776#issuecomment-1741039619
With the update to Tika 2.9.1-SNAPSHOT, I get 85 failed parses. Most of them
are either encrypted documents or "can't retrieve Tika Parser for x" errors.
[parse-segment-error-parsing.txt](https://github.com/apache/nutch/files/12767926/parse-segment-error-parsing.txt)
There is still one NoSuchMethodError, also from commons-io, but this time it
involves UnsynchronizedByteArrayInputStream$Builder and is thrown at
org.apache.tika.parser.pdf.PDFEncodedStringDecoder.decode(PDFEncodedStringDecoder.java:85)
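
A NoSuchMethodError like this usually points to a different version of a library being loaded at runtime than the one the caller was compiled against. As a rough diagnostic sketch, and assuming (not confirmed here) that an older commons-io jar on the classpath is shadowing the one Tika expects, something like the following could show which jar each class actually comes from. The class name `ClasspathProbe` and the helper `locate` are hypothetical; it would need to be run with the same classpath as the failing parse job.

```java
import java.security.CodeSource;

public class ClasspathProbe {

    // Reports the jar or directory a class is loaded from, without initializing it.
    static String locate(String className) {
        try {
            Class<?> cls = Class.forName(className, false,
                    ClasspathProbe.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            // Classes from the JDK runtime typically have no code source.
            return src == null
                    ? "no code source (JDK runtime class)"
                    : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "not found: " + e;
        }
    }

    public static void main(String[] args) {
        // Names taken from the error above; IOUtils is added only as a
        // general commons-io reference point (an assumption for illustration).
        String[] names = {
            "org.apache.commons.io.input.UnsynchronizedByteArrayInputStream$Builder",
            "org.apache.commons.io.input.UnsynchronizedByteArrayInputStream",
            "org.apache.commons.io.IOUtils",
            "org.apache.tika.parser.pdf.PDFEncodedStringDecoder"
        };
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

If the commons-io classes resolve to an unexpected jar, or the Builder class is reported as not found, that would be consistent with a version conflict. It would not prove one, since the nested class can exist while a specific method on it is still missing.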
> Upgrade to Apache Tika 2.9.0
> ----------------------------
>
> Key: NUTCH-2959
> URL: https://issues.apache.org/jira/browse/NUTCH-2959
> Project: Nutch
> Issue Type: Task
> Affects Versions: 1.19
> Reporter: Markus Jelsma
> Priority: Major
> Fix For: 1.20
>
> Attachments: NUTCH-2959.patch
>
>