[
https://issues.apache.org/jira/browse/TIKA-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17099859#comment-17099859
]
Tim Allison commented on TIKA-3094:
-----------------------------------
Hi [~bob], I'll take #3.
On #2, the triggering file is testAccess2_encrypted.accdb; to reproduce, comment out the following line in master:
{noformat}
needToFix.add("testAccess2_encrypted.accdb");
{noformat}
You should be able to reproduce it in at least Java 8. I got it in at least 8
last night, and I can reproduce it in 11 this morning.
{noformat}
java.lang.ClassNotFoundException: javax.xml.bind.JAXBException not found by org.apache.tika.bundle [19]
    at org.apache.felix.framework.BundleWiringImpl.findClassOrResourceByDelegation(BundleWiringImpl.java:1639)
    at org.apache.felix.framework.BundleWiringImpl.access$200(BundleWiringImpl.java:80)
    at org.apache.felix.framework.BundleWiringImpl$BundleClassLoader.loadClass(BundleWiringImpl.java:2053)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521)
    at com.healthmarketscience.jackcess.impl.office.AgileEncryptionProvider.<init>(AgileEncryptionProvider.java:70)
    at com.healthmarketscience.jackcess.impl.OfficeCryptCodecHandler.create(OfficeCryptCodecHandler.java:89)
    at com.healthmarketscience.jackcess.CryptCodecProvider.createHandler(CryptCodecProvider.java:116)
    at com.healthmarketscience.jackcess.impl.PageChannel.initialize(PageChannel.java:105)
    at com.healthmarketscience.jackcess.impl.DatabaseImpl.<init>(DatabaseImpl.java:554)
    at com.healthmarketscience.jackcess.impl.DatabaseImpl.open(DatabaseImpl.java:415)
    at com.healthmarketscience.jackcess.DatabaseBuilder.open(DatabaseBuilder.java:267)
    at org.apache.tika.parser.microsoft.JackcessParser.parse(JackcessParser.java:95)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:277)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:277)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:277)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
{noformat}
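For anyone trying to narrow this down outside the test suite, here is a rough sketch of a classpath probe. It is not Tika code: the class name ClassProbe and the method isLoadable are made up for illustration. It only asks a given class loader whether javax.xml.bind.JAXBException is visible, without initializing the class. Run standalone, it reports on the application class loader; inside Felix you would have to pass the tika-bundle's own class loader instead, which this sketch does not attempt.

```java
public class ClassProbe {

    /**
     * Hypothetical helper: returns true if the named class can be loaded
     * by the given loader. The class is not initialized (second argument
     * to Class.forName is false), so no static initializers run.
     */
    public static boolean isLoadable(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Same failure family as the ClassNotFoundException in the trace above.
            return false;
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = ClassProbe.class.getClassLoader();
        // java.lang.String is a sanity check that should always be loadable.
        String[] names = {"java.lang.String", "javax.xml.bind.JAXBException"};
        for (String name : names) {
            System.out.println(name + " loadable: " + isLoadable(name, loader));
        }
    }
}
```

On a plain Java 11 classpath with no JAXB API jar, the second line would be expected to report false, since the java.xml.bind module was removed from the JDK in 11. In an OSGi container the result depends on what the bundle imports, so treat the output only as a hint.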
> Apache Tika fails to extract text for pptx extension.
> -----------------------------------------------------
>
> Key: TIKA-3094
> URL: https://issues.apache.org/jira/browse/TIKA-3094
> Project: Tika
> Issue Type: Bug
> Affects Versions: 1.24, 1.24.1
> Reporter: Abhishek Chauhan
> Assignee: Bob Paulin
> Priority: Critical
> Attachments: Sample PPT.pptx
>
>
> This is a regression from Apache Tika 1.23. Text extraction for .pptx
> files, which worked in Apache Tika 1.23, no longer works in 1.24.
> For the .ppt extension, it works in both 1.23 and 1.24.
>
> According to the release notes [https://tika.apache.org/1.24/index.html], POI
> was updated to 4.1.2, which might be the root cause of this problem.
> POI requires [https://mvnrepository.com/artifact/com.zaxxer/SparseBitSet/1.2],
> which I suspect is not present in the bundle.
>
>