[ 
https://issues.apache.org/jira/browse/JCR-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17598336#comment-17598336
 ] 

Chris Poulsen commented on JCR-4832:
------------------------------------

We are upgrading from 2.16.3 in order to get everything running on Java 17. 
We're not indexing anything but plain JCR properties (no files etc.), so things 
seem to work in our case (queries return data, etc.). I guess we're not 
really relying on what Tika brings to the table.

I did some experimentation locally on the 2.20.x branch:
 * First, I tried to disable the failing Tika {{Detector}}, whose load failure 
keeps the {{DefaultDetector}} from being instantiated, by adjusting the bundled 
{{tika-config.xml}}. This did not work: Tika apparently loads the excluded 
class in order to exclude it, so the link failure remains.
 * Second, I added the missing jars directly to the JCA archive until the error 
disappeared. This worked, but it is not really sustainable IMO: someone has to 
manually keep the versions in line with the ones expected by the 
{{tika-parsers-standard-package}} dependency.

The dependencies I had to re-add to get things going were:
{code:java}
apache-mime4j-core-0.8.4.jar (no deps)
commons-compress-1.21.jar (no deps)
metadata-extractor-2.18.0.jar (pulls in xmpcore)
xmpcore-6.1.11.jar (no deps){code}
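
To check which of these jars are actually missing from a deployed archive without digging through stack traces, one could probe one representative class per jar. Below is a minimal sketch. Only {{CompressorException}} comes from the reported stack trace. The other marker class names are assumptions about what each jar contains, and {{ClasspathProbe}} is a made-up helper:
{code:java}
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ClasspathProbe {

    // jar name -> a class ASSUMED to be packaged in that jar (only the
    // commons-compress entry is taken from the reported stack trace)
    static final Map<String, String> MARKERS = new LinkedHashMap<>();
    static {
        MARKERS.put("apache-mime4j-core", "org.apache.james.mime4j.MimeException");
        MARKERS.put("commons-compress", "org.apache.commons.compress.compressors.CompressorException");
        MARKERS.put("metadata-extractor", "com.drew.imaging.ImageMetadataReader");
        MARKERS.put("xmpcore", "com.adobe.internal.xmp.XMPMeta");
    }

    /** Returns the jar names whose marker class cannot be loaded by the given loader. */
    static List<String> missing(Map<String, String> markers, ClassLoader loader) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> e : markers.entrySet()) {
            try {
                // initialize=false: load the class without running its static initializer
                Class.forName(e.getValue(), false, loader);
            } catch (ClassNotFoundException | LinkageError ex) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        List<String> absent = missing(MARKERS, loader);
        System.out.println(absent.isEmpty() ? "all marker classes found" : "missing: " + absent);
    }
}
{code}
Inside a container, the probe should run with the resource adapter's class loader, not the application's, or the result says little about the JCA archive itself.
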
I do not really know why there are so many excludes on the 
{{tika-parsers-standard-package}} in the parent pom, but as far as I can tell, 
that package registers some of the failing detectors with the {{ServiceLoader}} 
(META-INF/services), so Tika tries to load them and fails with link 
errors.
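
For illustration, the sketch below shows how a {{ServiceLoader}} consumer could skip providers that fail to load or link instead of aborting on the first one. This is a general Java pattern, not what Tika does. {{TolerantServiceLoading}} and the {{Greeter}} service type are hypothetical. The failure cap is an extra assumption, because the {{ServiceLoader}} javadoc only promises a "best effort" to continue after an error:
{code:java}
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.ServiceConfigurationError;
import java.util.ServiceLoader;

public class TolerantServiceLoading {

    /** Hypothetical service type used only for this sketch. */
    public interface Greeter {
        String greet();
    }

    /** Upper bound on skipped providers, in case the iterator cannot make progress. */
    static final int MAX_FAILURES = 100;

    static <S> List<S> loadSkippingBroken(Class<S> service, ClassLoader loader,
                                          List<Throwable> failures) {
        List<S> providers = new ArrayList<>();
        Iterator<S> it = ServiceLoader.load(service, loader).iterator();
        int skipped = 0;
        while (skipped < MAX_FAILURES) {
            try {
                if (!it.hasNext()) {
                    break;
                }
                providers.add(it.next());
            } catch (ServiceConfigurationError | LinkageError e) {
                // e.g. a provider listed in META-INF/services whose dependency jar is
                // missing; record it and try the next provider (recovery not guaranteed)
                failures.add(e);
                skipped++;
            }
        }
        return providers;
    }

    public static void main(String[] args) {
        List<Throwable> failures = new ArrayList<>();
        List<Greeter> found = loadSkippingBroken(
                Greeter.class, TolerantServiceLoading.class.getClassLoader(), failures);
        System.out.println(found.size() + " provider(s) loaded, " + failures.size() + " skipped");
    }
}
{code}
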

I tried commenting out the 3 excludes in the parent pom. A valid archive was 
then produced, but there were some (16?) test failures in other modules.

The best I can come up with is to either stop excluding the required 
dependencies in the parent pom, or drop the {{tika-parsers-standard-package}} 
dependency (as it registers things in the {{ServiceLoader}} that can't be 
loaded due to link errors) and include the relevant transitive dependencies 
directly instead.

Without the excludes in the parent pom, the JCA archive could just bundle 
{{tika-parsers-standard-package}} and be done with it. I haven't used Maven 
intensively for more than a decade, so is there perhaps a way to tell the JCA 
build to ignore the excludes from the parent pom?

If the Tika folks matched excludes against the FQCN strings instead of doing:
{code:java}
Class.forName(<FQCN>);
{code}
it would also pan out. But registering classes in the {{ServiceLoader}} and 
then leaving out the dependencies needed to load them is not really a good 
setup, so it is probably not a realistic expectation that Tika accommodates 
it.
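
To make the string-matching idea concrete, here is a hypothetical sketch. It is not Tika's actual code, and {{NameBasedExclude}}, {{loadAllowed}} and {{com.example.UnlinkableDetector}} are made-up names. Excludes are compared as plain strings, so an excluded provider class is never handed to {{Class.forName}} and is never loaded or linked:
{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class NameBasedExclude {

    static List<Class<?>> loadAllowed(List<String> providerNames, Set<String> excludedNames,
                                      ClassLoader loader) throws ClassNotFoundException {
        List<Class<?>> loaded = new ArrayList<>();
        for (String name : providerNames) {
            if (excludedNames.contains(name)) {
                continue; // string comparison only: the excluded class is never loaded
            }
            // non-excluded providers are still loaded normally and may still fail
            loaded.add(Class.forName(name, false, loader));
        }
        return loaded;
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // "com.example.UnlinkableDetector" stands in for a detector whose
        // dependencies are missing from the archive
        List<String> registered = List.of("java.lang.String", "com.example.UnlinkableDetector");
        Set<String> excludes = Set.of("com.example.UnlinkableDetector");
        System.out.println(loadAllowed(registered, excludes, NameBasedExclude.class.getClassLoader()));
    }
}
{code}
With the excluded name never resolved, the call completes instead of failing on the missing class.
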

I can add a patch for "exploding" the dependencies of the 
{{tika-parsers-standard-package}} in the JCR module. Currently, it seems that 
there are "only" 3 direct dependencies that do not follow the Tika version 
([mvnrepository|https://mvnrepository.com/artifact/org.apache.tika/tika-parsers-standard-package/2.4.1]), 
but it feels like adding extra stuff to maintain whenever upstream changes.

Does anyone have some suggestions on how to solve this in a good way?

> commons-compress jar seems to be missing in JCA archive
> -------------------------------------------------------
>
>                 Key: JCR-4832
>                 URL: https://issues.apache.org/jira/browse/JCR-4832
>             Project: Jackrabbit Content Repository
>          Issue Type: Bug
>          Components: jackrabbit-jca
>    Affects Versions: 2.20.6
>            Reporter: Chris Poulsen
>            Priority: Major
>         Attachments: tika-commons-compress.txt
>
>
> I'm trying to upgrade to Jackrabbit v2.20.6 as part of a larger dependency 
> update.
> We use the JCA archive because we need XA transactions.
> During deployment of the 2.20.6 archive I see stacktraces with:
> {code:java}
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.commons.compress.compressors.CompressorException {code}
> (More complete log attached)
> As far as I can tell, you include:
> {code:java}
> org.apache.tika:tika-parser-zip-commons:2.4.0{code}
> Which in turn has a dependency on:
> {code:java}
> org.apache.commons:commons-compress:1.21 {code}
> And the latter is not present in the JCA archive.
> I do not think that we are relying much on the Tika stuff, but I am a bit 
> worried that something is not working correctly based on all these stack 
> traces.
> I guess the missing dependency is a bug.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
