Eric Schoen created TIKA-4485:
---------------------------------
Summary: Can no longer use a simple file name or resource name for
tika-config.xml
Key: TIKA-4485
URL: https://issues.apache.org/jira/browse/TIKA-4485
Project: Tika
Issue Type: Bug
Components: tika-core
Reporter: Eric Schoen
It looks like the tika.config system property or TIKA_CONFIG environment
variable can no longer be a plain file name or resource name. Since [this
commit|https://github.com/apache/tika/commit/3e3145eb0a003ddc0e1ebf88c2ad35eebda2afb1],
the code that opens the config input stream calls
{code:java}
new URI(config).toURL().openStream() {code}
and catches only IOException and URISyntaxException. However, URI.toURL()
throws IllegalArgumentException ("URI is not absolute") when the property or
environment variable isn't an absolute URI, and nothing catches it. As a
result, the fallback code that tries to load the config as a resource or
regular file never gets to run:
{code:java}
private static InputStream getConfigInputStream(String config, ServiceLoader serviceLoader)
        throws TikaException, IOException {
    InputStream stream = null;
    try {
        stream = new URI(config).toURL().openStream();
    } catch (IOException | URISyntaxException ignore) {
    }
    if (stream == null) {
        stream = serviceLoader.getResourceAsStream(config);
    }
    if (stream == null) {
        Path file = Paths.get(config);
        if (Files.isRegularFile(file)) {
            stream = Files.newInputStream(file);
        }
    }
    if (stream == null) {
        throw new TikaException("Specified Tika configuration not found: " + config);
    }
    return stream;
}
{code}
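The failure can be reproduced without Tika. The following is a minimal standalone sketch, not Tika code: the class name RelativeUriDemo and the helper failureFor are hypothetical, and it only performs the URI-to-URL conversion (it never opens a stream). It reports which exception, if any, the conversion raises for a given value:

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;

public class RelativeUriDemo {

    // Hypothetical helper: returns the simple name of the exception raised while
    // converting the value to a URL, or "none" if the conversion succeeds.
    static String failureFor(String config) {
        try {
            new URI(config).toURL();
            return "none";
        } catch (URISyntaxException | MalformedURLException | IllegalArgumentException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        // A bare file name parses as a relative URI, so toURL() rejects it.
        System.out.println(failureFor("tika-config.xml"));             // prints "IllegalArgumentException"
        // An absolute file: URI converts without error.
        System.out.println(failureFor("file:///tmp/tika-config.xml")); // prints "none"
    }
}
```

Because IllegalArgumentException is unchecked and is not a subclass of IOException or URISyntaxException, it escapes the existing catch clause.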
I think catching IllegalArgumentException in the first exception handler as
well would fix this. This is a problem for test environments that can't pass
-Dtika.config= ... as an absolute URI. In my case, this is Clojure code
running under Leiningen, where the JVM opts we inject need to be
portable between machines (e.g., -Dtika.config=tika-config.xml).
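A rough sketch of the suggested change, written as a standalone class so it can be run without Tika. Everything here is an assumption for illustration: the class name ConfigStreamSketch and the methods tryOpenAsUrl and open are hypothetical, the class's own class loader stands in for Tika's ServiceLoader, and a plain IOException replaces TikaException. The only substantive difference from the method quoted above is the extra IllegalArgumentException in the first catch:

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ConfigStreamSketch {

    // Hypothetical helper: returns null when the value cannot be opened as a URL,
    // including the "URI is not absolute" case raised by URI.toURL().
    static InputStream tryOpenAsUrl(String config) {
        try {
            return new URI(config).toURL().openStream();
        } catch (IOException | URISyntaxException | IllegalArgumentException ignore) {
            return null;
        }
    }

    // Tries URL, then resource, then regular file, in the same order as the quoted method.
    static InputStream open(String config) throws IOException {
        InputStream stream = tryOpenAsUrl(config);
        if (stream == null) {
            // Stand-in for serviceLoader.getResourceAsStream(config).
            stream = ConfigStreamSketch.class.getClassLoader().getResourceAsStream(config);
        }
        if (stream == null) {
            Path file = Paths.get(config);
            if (Files.isRegularFile(file)) {
                stream = Files.newInputStream(file);
            }
        }
        if (stream == null) {
            throw new IOException("Specified configuration not found: " + config);
        }
        return stream;
    }
}
```

With this shape, a value such as tika-config.xml falls through to the resource and file lookups, while an absolute file: or http: URI is still opened by the first branch.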