I have been trying unsuccessfully to build tika-main on Windows 10 with
JDK 8 for several weeks. Here are the failures I get:
[ERROR] Failures:
[ERROR]   TikaResourceFetcherTest.testHeader:100->CXFTestBase.assertContains:65 hello world not found in:
 ==> expected: <true> but was: <false>
[ERROR]   TikaResourceFetcherTest.testQueryPart:108->CXFTestBase.assertContains:65 hello world not found in:
 ==> expected: <true> but was: <false>
[ERROR] Errors:
[ERROR]   TikaServerIntegrationTest.test1WayTLS:341->configure1WayTLS:456 » InvalidPath Illegal char <"> at index 0: "XXX\tika-main\tika-server\tika-server-core\target\test-classes\ssl-keys\tika-client-truststore.p12"
[ERROR]   TikaServerIntegrationTest.test2WayTLS:377->configure2WayTLS:428 » InvalidPath Illegal char <"> at index 0: "XXX\tika-main\tika-server\tika-server-core\target\test-classes\ssl-keys\tika-client-keystore.p12"
[INFO]
[ERROR] Tests run: 70, Failures: 2, Errors: 2, Skipped: 4
The two TLS errors are new, but the TikaResourceFetcherTest failures have
been there for weeks. They fail because response.getEntity() is an empty
ByteArrayInputStream, so the response body read from it is an empty
string.
One line of the log output is this:
INFO [main] 14:23:17,531
org.apache.tika.pipes.fetcher.fs.FileSystemFetcher A FileSystemFetcher
(fsf) has been initialized. Clients will be able to read all files under
'XXX\tika-main\tika-server\tika-server-core\XXXtika-maintika-servertika-server-coretargettest-classestest-documents'
if this process has permission to read them.
Note that the two XXX here are the same. It's the Windows path where I
keep my Java projects.
I investigated a bit... FetcherManager.load loads a file from the temp
directory. Its content is like this:
<?xml version="1.0" encoding="UTF-8"?>
... license...
<properties>
<fetchers>
<fetcher class="org.apache.tika.pipes.fetcher.fs.FileSystemFetcher">
<params>
<name>fsf</name>
<basePath>XXXtika-maintika-servertika-server-coretargettest-classestest-documents</basePath>
</params>
</fetcher>
</fetchers>
...
Something goes wrong in
configXML = configXML.replaceAll("\\$\\{FETCHER_BASE_PATH\\}",
inputDir.toAbsolutePath().toString());
in TikaResourceFetcherTest.java: the backslashes in the path are lost.
The javadoc of String.replaceAll warns about this:
Note that backslashes (\) and dollar signs ($) in the
replacement string may cause the results to be different than if it were
being treated as a literal replacement string
Using configXML.replace("${FETCHER_BASE_PATH}",
inputDir.toAbsolutePath().toString()) instead fixes this.
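To make the difference concrete, here is a small standalone sketch. The class name, the template string and the Windows path are made up for illustration and are not taken from the Tika test. It compares replaceAll, replace, and replaceAll combined with Matcher.quoteReplacement on a path that contains backslashes.

```java
import java.util.regex.Matcher;

// Hypothetical demo class; TEMPLATE and WINDOWS_PATH are illustration values only.
class ReplaceDemo {
    static final String TEMPLATE = "<basePath>${FETCHER_BASE_PATH}</basePath>";
    // The actual string is C:\work\tika-main\target
    static final String WINDOWS_PATH = "C:\\work\\tika-main\\target";

    // replaceAll interprets the replacement: a backslash escapes the next
    // character and is itself dropped, so all path separators disappear.
    static String withReplaceAll() {
        return TEMPLATE.replaceAll("\\$\\{FETCHER_BASE_PATH\\}", WINDOWS_PATH);
    }

    // replace treats both the target and the replacement as literal text.
    static String withReplace() {
        return TEMPLATE.replace("${FETCHER_BASE_PATH}", WINDOWS_PATH);
    }

    // If a regex is really needed, quoteReplacement escapes \ and $ first.
    static String withQuoteReplacement() {
        return TEMPLATE.replaceAll("\\$\\{FETCHER_BASE_PATH\\}",
                Matcher.quoteReplacement(WINDOWS_PATH));
    }

    public static void main(String[] args) {
        System.out.println(withReplaceAll());       // <basePath>C:worktika-maintarget</basePath>
        System.out.println(withReplace());          // <basePath>C:\work\tika-main\target</basePath>
        System.out.println(withQuoteReplacement()); // <basePath>C:\work\tika-main\target</basePath>
    }
}
```

On Linux or macOS the problem stays hidden because absolute paths there contain no backslashes, which is probably why the test only fails on Windows.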
Related: shouldn't FileSystemFetcher.checkInitialization() check whether
the path exists?
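A rough sketch of what such a check could look like. This is a standalone, hypothetical helper and not the actual FileSystemFetcher code; the class name, method name and exception type are my assumptions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Tika: fails fast if the configured
// base path does not point to an existing directory.
class BasePathCheck {
    static void requireExistingDirectory(Path basePath) {
        if (basePath == null || !Files.isDirectory(basePath)) {
            throw new IllegalStateException(
                    "basePath does not exist or is not a directory: " + basePath);
        }
    }

    public static void main(String[] args) throws IOException {
        Path existing = Files.createTempDirectory("fsf-demo");
        requireExistingDirectory(existing); // passes silently

        Path missing = existing.resolve("no-such-dir");
        try {
            requireExistingDirectory(missing);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        Files.delete(existing); // clean up the temp directory
    }
}
```

With a check like this, the mangled basePath would have produced an error at initialization instead of an empty response later in the test.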
Tilman