Copilot commented on code in PR #2825:
URL: https://github.com/apache/tika/pull/2825#discussion_r3274642095
##########
tika-server/tika-server-standard/pom.xml:
##########
@@ -191,6 +198,36 @@
</execution>
</executions>
</plugin>
+ <plugin>
+ <groupId>org.apache.maven.plugins</groupId>
+ <artifactId>maven-install-plugin</artifactId>
+ <executions>
+ <!--
+ With <attach>false</attach> on the assembly above (TIKA-4733) the
+ -bin.zip is not part of the project artifact set and so is neither
+ deployed to Central nor installed locally. Sibling reactor modules
+ declare tika-server-standard:bin:zip as a Maven dep, so install it
+ into the local repo at its canonical coordinates to satisfy reactor
+ resolution without publishing it to Central.
+ -->
+ <execution>
+ <id>install-server-bin-zip-locally</id>
+ <phase>install</phase>
Review Comment:
The `install-file` execution is bound to the `install` phase, but CI runs
the E2E workflow with `mvn -pl tika-e2e-tests -am clean verify -Pe2e`. The
lifecycle stops at `verify`, which precedes `install`, so this execution never
runs there. With `<attach>false</attach>`, the `tika-server-standard:bin:zip`
artifact will therefore be unavailable to `maven-dependency-plugin:unpack` in
`tika-e2e-tests/tika-server` during `process-test-resources`, breaking the E2E
build. Bind the `install-file` execution to a phase that `verify` includes
(e.g., `package`), or otherwise ensure the classifier artifact is resolvable
from the reactor during `verify` (e.g., conditional attach for reactor builds).
##########
tika-app/src/main/java/org/apache/tika/cli/TikaCLI.java:
##########
@@ -298,17 +298,43 @@ private static void async(String[] args) throws Exception {
}
if (runpack || ! StringUtils.isBlank(tikaConfigPath)) {
- TikaAsyncCLI.main(args);
+ invokeAsyncCLI(args);
return;
}
if (args.length == 1 && args[0].endsWith(".json")) {
- TikaAsyncCLI.main(args);
+ invokeAsyncCLI(args);
return;
}
// For batch mode (two directories), pass directly to TikaAsyncCLI.
// It will create its own config with PluginsWriter that includes
// plugin-roots, fetcher, emitter, and pipes-iterator configuration.
- TikaAsyncCLI.main(args);
+ invokeAsyncCLI(args);
+ }
+
+ /**
+ * Invokes the batch/async processor ({@code tika-async-cli}). The async
+ * processor and the parsers it forks live in the {@code lib/} directory of
+ * the tika-app distribution rather than inside the bare {@code tika-app.jar}.
+ * If tika-app is run as a standalone jar (without the surrounding unzipped
+ * distribution), the supporting classes are missing from the classpath and
+ * the JVM throws {@link NoClassDefFoundError}. Translate that into an
+ * actionable message rather than letting the raw error escape.
+ *
+ * @see <a href="https://issues.apache.org/jira/browse/TIKA-4733">TIKA-4733</a>
+ */
+ private static void invokeAsyncCLI(String[] args) throws Exception {
+ try {
+ TikaAsyncCLI.main(args);
+ } catch (NoClassDefFoundError e) {
+ System.err.println("Error: could not load the Tika batch/async processor (" +
+ e.getMessage() + ").");
+ System.err.println("Batch mode requires the full tika-app distribution, not the "
+ + "standalone jar.");
+ System.err.println("Download tika-app-<version>.zip, unzip it, and run "
+ + "tika-app-<version>.jar from inside the unzipped directory so that the "
+ + "adjacent 'lib/' and 'plugins/' directories are on the classpath.");
Review Comment:
The message says the adjacent `lib/` and `plugins/` directories are "on the
classpath". In the distribution, `lib/` jars are referenced via the jar
manifest Class-Path, but `plugins/` is not on the JVM classpath (it’s a
filesystem directory used for plugin discovery). Consider rewording to avoid
the classpath implication (e.g., say they must be present alongside the jar /
in the distribution directory).
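
One possible rewording, as a minimal standalone sketch rather than the PR's actual code: the class name `AsyncCliMessageSketch` and the helper `missingDistributionMessage` are hypothetical and exist only for illustration, and the thrown `NoClassDefFoundError` stands in for the call to `TikaAsyncCLI.main(args)`. The only intended change from the message in the diff is that the last sentence says the directories must sit alongside the jar.

```java
public class AsyncCliMessageSketch {

    /**
     * Hypothetical helper (not part of Tika): builds the user-facing text.
     * The wording describes where lib/ and plugins/ must live on disk and
     * makes no claim about how either directory is loaded.
     */
    public static String missingDistributionMessage(String detail) {
        return String.join(System.lineSeparator(),
                "Error: could not load the Tika batch/async processor (" + detail + ").",
                "Batch mode requires the full tika-app distribution, not the standalone jar.",
                "Download tika-app-<version>.zip, unzip it, and run tika-app-<version>.jar "
                        + "from inside the unzipped directory so that the 'lib/' and "
                        + "'plugins/' directories are present alongside the jar.");
    }

    public static void main(String[] args) {
        try {
            // Stand-in for TikaAsyncCLI.main(args) failing on a bare jar.
            throw new NoClassDefFoundError("org/example/MissingClass");
        } catch (NoClassDefFoundError e) {
            System.err.println(missingDistributionMessage(e.getMessage()));
        }
    }
}
```

Keeping the text in one helper would also make the wording easy to unit test, though whether that is worth an extra method in `TikaCLI` is a judgment call for the PR author.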