sandeshkr419 commented on PR #1130:
URL: https://github.com/apache/tika/pull/1130#issuecomment-2007939574
@gastaldi Thanks for the quick reply.
These are the Tika libraries I'm currently consuming:
```
versions << [
'tika' : '2.6.0',
'commonscompress' : '1.24.0'
.
.
.
api "org.apache.tika:tika-core:${versions.tika}"
api "org.apache.tika:tika-parsers:${versions.tika}"
api "org.apache.tika:tika-parsers-standard-package:${versions.tika}"
api "org.apache.tika:tika-langdetect-optimaize:${versions.tika}"
api "org.apache.commons:commons-compress:${versions.commonscompress}"
```
**With Tika 2.6.0 and commons-compress 1.24.0:**
Everything worked fine.
**With Tika 2.6.0 and commons-compress 1.26.0:**
Parsing through IWorkPackageParser started throwing exceptions:
```
org.opensearch.ingest.attachment.TikaDocTests > testFiles FAILED
    java.lang.RuntimeException: parsing of filename: testKeynote.key failed
        at __randomizedtesting.SeedInfo.seed([7E30995C8CE0CC1:6EFE6C139A13FF43]:0)
        at org.opensearch.ingest.attachment.TikaDocTests.assertParseable(TikaDocTests.java:85)
        at org.opensearch.ingest.attachment.TikaDocTests.testFiles(TikaDocTests.java:71)

        Caused by:
        org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.iwork.IWorkPackageParser@3ba82e1d
            at app//org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:304)
            at app//org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:195)
            at app//org.apache.tika.Tika.parseToString(Tika.java:525)
            at app//org.opensearch.ingest.attachment.TikaImpl.lambda$parse$0(TikaImpl.java:122)
            at [email protected]/java.security.AccessController.doPrivileged(AccessController.java:714)
            at app//org.opensearch.ingest.attachment.TikaImpl.parse(TikaImpl.java:121)
            at app//org.opensearch.ingest.attachment.TikaDocTests.assertParseable(TikaDocTests.java:80)
            ... 1 more

            Caused by:
            java.io.IOException: Resetting to invalid mark
                at java.base/java.io.BufferedInputStream.implReset(BufferedInputStream.java:583)
                at java.base/java.io.BufferedInputStream.reset(BufferedInputStream.java:569)
                at org.apache.tika.parser.iwork.IWorkPackageParser.parse(IWorkPackageParser.java:97)
                at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
                ... 7 more
```
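For context on the innermost exception: this is a minimal, standalone sketch of how the JDK raises "Resetting to invalid mark" in general. It does not reproduce Tika's code path and does not claim to identify the root cause at IWorkPackageParser.java:97. The class name `MarkResetDemo`, the method `resetFails`, and the buffer sizes are all made up for illustration. A `BufferedInputStream` drops its mark once more bytes than the `mark(readLimit)` limit have been read and the internal buffer has to be refilled, after which `reset()` throws.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class MarkResetDemo {

    /**
     * Marks the stream, reads bytesToRead bytes one at a time, then calls reset().
     * Returns true if reset() threw an IOException, i.e. the mark was invalidated.
     * All parameters are illustrative; none of them come from Tika.
     */
    public static boolean resetFails(int bufferSize, int readLimit, int bytesToRead)
            throws IOException {
        byte[] data = new byte[64]; // dummy payload, contents are irrelevant
        try (InputStream in =
                new BufferedInputStream(new ByteArrayInputStream(data), bufferSize)) {
            in.mark(readLimit);
            for (int i = 0; i < bytesToRead; i++) {
                in.read();
            }
            try {
                in.reset();
                return false; // mark was still valid
            } catch (IOException e) {
                return true; // mark was dropped before reset()
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Tiny 4-byte buffer so that the read limit is exceeded quickly.
        System.out.println("limit 2, read 10: reset fails = " + resetFails(4, 2, 10));
        System.out.println("limit 16, read 10: reset fails = " + resetFails(4, 16, 10));
    }
}
```

If something between `mark()` and `reset()` starts consuming more of the stream than before, a previously sufficient read limit can become too small, which would be one possible (unverified) explanation for a failure that appears only after a dependency bump.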
**With Tika 2.8.0 and commons-compress 1.26.0:**
Compilation fails because the following parser packages cannot be found:
```
/Users/--/workplace/opensearch/OpenSearch/plugins/ingest-attachment/src/main/java/org/opensearch/ingest/attachment/TikaImpl.java:94:
error: package org.apache.tika.parser.html does not exist
new org.apache.tika.parser.html.HtmlParser(),
^
/Users/--/workplace/opensearch/OpenSearch/plugins/ingest-attachment/src/main/java/org/opensearch/ingest/attachment/TikaImpl.java:95:
error: package org.apache.tika.parser.pdf does not exist
new org.apache.tika.parser.pdf.PDFParser(),
^
/Users/--/workplace/opensearch/OpenSearch/plugins/ingest-attachment/src/main/java/org/opensearch/ingest/attachment/TikaImpl.java:96:
error: package org.apache.tika.parser.txt does not exist
new org.apache.tika.parser.txt.TXTParser(),
^
/Users/--/workplace/opensearch/OpenSearch/plugins/ingest-attachment/src/main/java/org/opensearch/ingest/attachment/TikaImpl.java:97:
error: package org.apache.tika.parser.microsoft.rtf does not exist
new org.apache.tika.parser.microsoft.rtf.RTFParser(),
^
/Users/--/workplace/opensearch/OpenSearch/plugins/ingest-attachment/src/main/java/org/opensearch/ingest/attachment/TikaImpl.java:98:
error: package org.apache.tika.parser.microsoft does not exist
new org.apache.tika.parser.microsoft.OfficeParser(),
^
/Users/--/workplace/opensearch/OpenSearch/plugins/ingest-attachment/src/main/java/org/opensearch/ingest/attachment/TikaImpl.java:99:
error: package org.apache.tika.parser.microsoft does not exist
new org.apache.tika.parser.microsoft.OldExcelParser(),
^
/Users/--/workplace/opensearch/OpenSearch/plugins/ingest-attachment/src/main/java/org/opensearch/ingest/attachment/TikaImpl.java:100:
error: package org.apache.tika.parser.microsoft.ooxml does not exist
ParserDecorator.withoutTypes(new org.apache.tika.parser.microsoft.ooxml.OOXMLParser(), EXCLUDES),
^
/Users/--/workplace/opensearch/OpenSearch/plugins/ingest-attachment/src/main/java/org/opensearch/ingest/attachment/TikaImpl.java:101:
error: package org.apache.tika.parser.odf does not exist
new org.apache.tika.parser.odf.OpenDocumentParser(),
^
/Users/--/workplace/opensearch/OpenSearch/plugins/ingest-attachment/src/main/java/org/opensearch/ingest/attachment/TikaImpl.java:102:
error: package org.apache.tika.parser.iwork does not exist
new org.apache.tika.parser.iwork.IWorkPackageParser(),
^
/Users/--/workplace/opensearch/OpenSearch/plugins/ingest-attachment/src/main/java/org/opensearch/ingest/attachment/TikaImpl.java:103:
error: package org.apache.tika.parser.xml does not exist
new org.apache.tika.parser.xml.DcXMLParser(),
^
/Users/--/workplace/opensearch/OpenSearch/plugins/ingest-attachment/src/main/java/org/opensearch/ingest/attachment/TikaImpl.java:104:
error: package org.apache.tika.parser.epub does not exist
new org.apache.tika.parser.epub.EpubParser(), };
^
```
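To narrow down which of these classes actually reach the classpath for a given dependency set, a small diagnostic along these lines might help. This is only a sketch: the class name `ParserClasspathCheck` and the helper `isPresent` are hypothetical, and the class names in the list are copied from the compiler errors above. It would need to run with the same runtime classpath as the plugin for the result to mean anything.

```java
public class ParserClasspathCheck {

    /** Returns true if the named class can be loaded (without initializing it). */
    public static boolean isPresent(String className) {
        try {
            Class.forName(className, false, ParserClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Parser classes named in the compile errors.
        String[] parsers = {
            "org.apache.tika.parser.html.HtmlParser",
            "org.apache.tika.parser.pdf.PDFParser",
            "org.apache.tika.parser.txt.TXTParser",
            "org.apache.tika.parser.microsoft.rtf.RTFParser",
            "org.apache.tika.parser.microsoft.OfficeParser",
            "org.apache.tika.parser.microsoft.OldExcelParser",
            "org.apache.tika.parser.microsoft.ooxml.OOXMLParser",
            "org.apache.tika.parser.odf.OpenDocumentParser",
            "org.apache.tika.parser.iwork.IWorkPackageParser",
            "org.apache.tika.parser.xml.DcXMLParser",
            "org.apache.tika.parser.epub.EpubParser",
        };
        for (String name : parsers) {
            System.out.println((isPresent(name) ? "found    " : "MISSING  ") + name);
        }
    }
}
```

Comparing the output between the 2.6.0 and 2.8.0 dependency sets would show whether the parser modules are being resolved at all, or resolved but excluded from the compile classpath.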