I have been trying unsuccessfully for several weeks to build tika-main on Windows 10 with JDK 8. Here are the failures I get:

[ERROR] Failures:
[ERROR] TikaResourceFetcherTest.testHeader:100->CXFTestBase.assertContains:65 hello world not found in:
 ==> expected: <true> but was: <false>
[ERROR] TikaResourceFetcherTest.testQueryPart:108->CXFTestBase.assertContains:65 hello world not found in:
 ==> expected: <true> but was: <false>
[ERROR] Errors:
[ERROR] TikaServerIntegrationTest.test1WayTLS:341->configure1WayTLS:456 » InvalidPath Illegal char <"> at index 0: "XXX\tika-main\tika-server\tika-server-core\target\test-classes\ssl-keys\tika-client-truststore.p12"
[ERROR] TikaServerIntegrationTest.test2WayTLS:377->configure2WayTLS:428 » InvalidPath Illegal char <"> at index 0: "XXX\tika-main\tika-server\tika-server-core\target\test-classes\ssl-keys\tika-client-keystore.p12"
[INFO]
[ERROR] Tests run: 70, Failures: 2, Errors: 2, Skipped: 4

The two TLS errors are new, but the TikaResourceFetcherTest failures have been there for weeks. They occur because response.getEntity() is an empty ByteArrayInputStream, so the string read from it is empty.


One output is this:


INFO  [main] 14:23:17,531 org.apache.tika.pipes.fetcher.fs.FileSystemFetcher A FileSystemFetcher (fsf) has been initialized. Clients will be able to read all files under 'XXX\tika-main\tika-server\tika-server-core\XXXtika-maintika-servertika-server-coretargettest-classestest-documents' if this process has permission to read them.

Note that the two XXX here are the same: it's the Windows path where I keep my Java projects.

I investigated a bit... FetcherManager.load loads a file from the temp directory. Its content is like this:

<?xml version="1.0" encoding="UTF-8"?>

... license...

<properties>
  <fetchers>
    <fetcher class="org.apache.tika.pipes.fetcher.fs.FileSystemFetcher">
      <params>
        <name>fsf</name>
        <basePath>XXXtika-maintika-servertika-server-coretargettest-classestest-documents</basePath>
      </params>
    </fetcher>
  </fetchers>

...

Something goes wrong in

        configXML = configXML.replaceAll("\\$\\{FETCHER_BASE_PATH\\}",
                inputDir.toAbsolutePath().toString());

in TikaResourceFetcherTest.java: the backslashes from the path are lost.

The Javadoc for String.replaceAll warns about this:

    Note that backslashes (\) and dollar signs ($) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string

Using replace("${FETCHER_BASE_PATH}", inputDir.toAbsolutePath().toString()) instead, which treats both arguments literally, fixes this.
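To illustrate the difference, here is a small standalone sketch. It is not the Tika test code: the class name, the one-line template and the C:\work\... path are made up for the demonstration. It compares replaceAll with a raw path, replace, and replaceAll combined with Matcher.quoteReplacement.

```java
import java.util.regex.Matcher;

public class BasePathReplaceDemo {

    // Hypothetical one-line stand-in for the config template.
    static final String TEMPLATE = "<basePath>${FETCHER_BASE_PATH}</basePath>";

    // replaceAll interprets the replacement string: a backslash escapes
    // the following character, so "\t" in a Windows path becomes "t".
    static String withReplaceAll(String template, String path) {
        return template.replaceAll("\\$\\{FETCHER_BASE_PATH\\}", path);
    }

    // replace treats both the target and the replacement literally.
    static String withReplace(String template, String path) {
        return template.replace("${FETCHER_BASE_PATH}", path);
    }

    // Alternative that keeps the regex: quote the replacement first.
    static String withQuoteReplacement(String template, String path) {
        return template.replaceAll("\\$\\{FETCHER_BASE_PATH\\}",
                Matcher.quoteReplacement(path));
    }

    public static void main(String[] args) {
        String path = "C:\\work\\tika-main\\target"; // made-up Windows path
        System.out.println(withReplaceAll(TEMPLATE, path));
        System.out.println(withReplace(TEMPLATE, path));
        System.out.println(withQuoteReplacement(TEMPLATE, path));
    }
}
```

The first line printed has the backslashes stripped, as in the basePath above; the other two keep the path intact. Either of the latter two variants should work on Windows, and replace is probably the simpler choice here since the placeholder is a fixed string.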

Related: shouldn't FileSystemFetcher.checkInitialization() check whether the path exists?
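Such a check could look roughly like the sketch below. This is only an illustration of the idea, not the existing FileSystemFetcher code: the class, the method name and the choice of exception are all hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class BasePathCheck {

    // Hypothetical helper: fail early if the configured base path
    // does not point to an existing directory.
    static Path requireExistingDirectory(String basePath) {
        Path p = Paths.get(basePath);
        if (!Files.isDirectory(p)) {
            throw new IllegalArgumentException(
                    "basePath is not an existing directory: " + p.toAbsolutePath());
        }
        return p;
    }

    public static void main(String[] args) {
        // The system temp directory is used here only as a directory
        // that should exist on any machine.
        System.out.println(requireExistingDirectory(System.getProperty("java.io.tmpdir")));
    }
}
```

With something like this, the mangled basePath from the config above would have been rejected at initialization instead of leading to an empty response later.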

Tilman
