Eric Schoen created TIKA-4485:
---------------------------------

             Summary: Can no longer use a simple file name or resource name for tika-config.xml
                 Key: TIKA-4485
                 URL: https://issues.apache.org/jira/browse/TIKA-4485
             Project: Tika
          Issue Type: Bug
          Components: tika-core
            Reporter: Eric Schoen


It looks like the tika.config system property or TIKA_CONFIG environment variable can no longer be a simple file name or resource name. Since [this commit|https://github.com/apache/tika/commit/3e3145eb0a003ddc0e1ebf88c2ad35eebda2afb1], the code that gets the config input stream does

{code:java}
new URI(config).toURL().openStream()
{code}

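and catches IOException (which covers MalformedURLException) and URISyntaxException. However, if the property or environment variable is not an absolute URI (for example, a bare file name such as tika-config.xml), the string parses as a relative URI and URI.toURL() throws IllegalArgumentException, which is not caught. As a result, the fallback code that tries to load the config as a classpath resource or as a regular file never runs: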
{code:java}
private static InputStream getConfigInputStream(String config, ServiceLoader serviceLoader)
        throws TikaException, IOException {
    InputStream stream = null;
    try {
        stream = new URI(config).toURL().openStream();
    } catch (IOException | URISyntaxException ignore) {
        // IllegalArgumentException from toURL() (thrown for a relative URI) is not
        // caught here, so the resource and file fallbacks below are never reached.
    }
    if (stream == null) {
        stream = serviceLoader.getResourceAsStream(config);
    }
    if (stream == null) {
        Path file = Paths.get(config);
        if (Files.isRegularFile(file)) {
            stream = Files.newInputStream(file);
        }
    }
    if (stream == null) {
        throw new TikaException("Specified Tika configuration not found: " + config);
    }
    return stream;
}{code}
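To confirm where the exception comes from, here is a minimal JDK-only check (the class RelativeUriDemo and method failureFor are hypothetical names for illustration). It runs the same new URI(config).toURL() chain without opening a stream and reports which exception, if any, is thrown.

{code:java}
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;

public class RelativeUriDemo {
    // Runs the same new URI(config).toURL() chain as getConfigInputStream
    // (without opening a stream) and reports which exception, if any, it throws.
    static String failureFor(String config) {
        try {
            new URI(config).toURL();
            return "none";
        } catch (URISyntaxException e) {
            return "URISyntaxException";
        } catch (MalformedURLException e) {
            return "MalformedURLException";
        } catch (IllegalArgumentException e) {
            return "IllegalArgumentException";
        }
    }

    public static void main(String[] args) {
        // A bare file name parses as a relative URI, so toURL() rejects it.
        System.out.println(failureFor("tika-config.xml"));             // IllegalArgumentException
        // An absolute file: URI converts to a URL without error.
        System.out.println(failureFor("file:///tmp/tika-config.xml")); // none
    }
}
{code}

A name containing a space, such as "my tika-config.xml", fails earlier in the URI constructor with URISyntaxException, which the existing catch already handles; only well-formed relative names slip through as IllegalArgumentException.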
 

If the first catch block also caught IllegalArgumentException, the resource and file fallbacks would run again for simple names. This matters for test environments that can't pass -Dtika.config=... as an absolute URI. In my case, this is Clojure code running under Leiningen, where we inject JVM options that need to be portable between machines (e.g., -Dtika.config=tika-config.xml).
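A minimal stand-alone sketch of the suggested change, assuming only the JDK: ConfigStreamSketch is a hypothetical class, a plain ClassLoader stands in for Tika's ServiceLoader, and IOException stands in for TikaException. The only functional difference from the quoted method is the extra IllegalArgumentException in the first catch.

{code:java}
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ConfigStreamSketch {
    // Hypothetical stand-alone version of getConfigInputStream: a ClassLoader stands in
    // for Tika's ServiceLoader, and IOException stands in for TikaException.
    static InputStream getConfigInputStream(String config, ClassLoader loader) throws IOException {
        InputStream stream = null;
        try {
            stream = new URI(config).toURL().openStream();
        } catch (IOException | URISyntaxException | IllegalArgumentException ignore) {
            // IllegalArgumentException: config is a relative URI such as "tika-config.xml";
            // fall through to the classpath-resource and file lookups below.
        }
        if (stream == null) {
            stream = loader.getResourceAsStream(config);
        }
        if (stream == null) {
            Path file = Paths.get(config);
            if (Files.isRegularFile(file)) {
                stream = Files.newInputStream(file);
            }
        }
        if (stream == null) {
            throw new IOException("Specified Tika configuration not found: " + config);
        }
        return stream;
    }
}
{code}

With this change, -Dtika.config=tika-config.xml would resolve against the classpath first and then against the working directory, as it did before the URI-based lookup was introduced.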



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
