[
https://issues.apache.org/jira/browse/TIKA-3523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17489780#comment-17489780
]
Fatih Pazarbasi commented on TIKA-3523:
---------------------------------------
Hello again.
I have to say that the tika-config.xml solution keeps giving me errors.
{panel:title=tika-config.xml}
<?xml version="1.0" encoding="UTF-8" ?>
<properties>
  <parsers>
    <parser class="org.apache.tika.parser.DefaultParser"/>
  </parsers>
  <fetchers>
    <fetcher class="org.apache.tika.pipes.fetcher.http.HttpFetcher">
      <params>
        <name>http</name>
      </params>
    </fetcher>
  </fetchers>
  <server>
    <params>
      <enableUnsecureFeatures>true</enableUnsecureFeatures>
    </params>
  </server>
</properties>
{panel}
Following
[https://cwiki.apache.org/confluence/display/TIKA/tika-pipes+and+Docker]
I get this error:
{code:java}
java.nio.file.NoSuchFileException: C:/Program
at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92)
at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106)
at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
at java.base/sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:219)
at java.base/java.nio.file.Files.newByteChannel(Files.java:380)
at java.base/java.nio.file.Files.newByteChannel(Files.java:432)
at java.base/java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider.java:422)
at java.base/java.nio.file.Files.newInputStream(Files.java:160)
at org.apache.tika.server.core.TikaServerConfig.load(TikaServerConfig.java:176)
at org.apache.tika.server.core.TikaServerConfig.load(TikaServerConfig.java:134)
at org.apache.tika.server.core.TikaServerCli.execute(TikaServerCli.java:83)
at org.apache.tika.server.core.TikaServerCli.main(TikaServerCli.java:66)
ERROR [main] 19:53:04,124 org.apache.tika.server.core.TikaServerCli Can't start:
java.nio.file.NoSuchFileException: C:/Program
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:92) ~[?:?]
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106) ~[?:?]
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111) ~[?:?]
at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:219) ~[?:?]
at java.nio.file.Files.newByteChannel(Files.java:380) ~[?:?]
at java.nio.file.Files.newByteChannel(Files.java:432) ~[?:?]
at java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider.java:422) ~[?:?]
at java.nio.file.Files.newInputStream(Files.java:160) ~[?:?]
at org.apache.tika.server.core.TikaServerConfig.load(TikaServerConfig.java:176) ~[tika-server-standard-2.0.0.jar:2.0.0]
at org.apache.tika.server.core.TikaServerConfig.load(TikaServerConfig.java:134) ~[tika-server-standard-2.0.0.jar:2.0.0]
at org.apache.tika.server.core.TikaServerCli.execute(TikaServerCli.java:83) ~[tika-server-standard-2.0.0.jar:2.0.0]
at org.apache.tika.server.core.TikaServerCli.main(TikaServerCli.java:66) [tika-server-standard-2.0.0.jar:2.0.0]
{code}
And with apache/tika 2.2.1 I get this error:
{code:java}
INFO [main] 19:24:09,290 org.apache.tika.server.core.TikaServerProcess Starting Apache Tika 2.2.1 server
INFO [main] 19:24:09,384 org.apache.tika.server.core.TikaServerProcess Using custom config: /tika-config.xml
ERROR [main] 19:24:09,495 org.apache.tika.server.core.TikaServerProcess Can't start:
org.apache.tika.exception.TikaConfigException: problem loading fetcher
at org.apache.tika.config.ConfigBase.buildClass(ConfigBase.java:203) ~[tika-server-standard-2.2.1.jar:2.2.1]
at org.apache.tika.config.ConfigBase.loadComposite(ConfigBase.java:178) ~[tika-server-standard-2.2.1.jar:2.2.1]
at org.apache.tika.config.ConfigBase.buildComposite(ConfigBase.java:151) ~[tika-server-standard-2.2.1.jar:2.2.1]
at org.apache.tika.config.ConfigBase.buildComposite(ConfigBase.java:132) ~[tika-server-standard-2.2.1.jar:2.2.1]
at org.apache.tika.pipes.fetcher.FetcherManager.load(FetcherManager.java:42) ~[tika-server-standard-2.2.1.jar:2.2.1]
at org.apache.tika.server.core.TikaServerProcess.initServer(TikaServerProcess.java:214) ~[tika-server-standard-2.2.1.jar:2.2.1]
at org.apache.tika.server.core.TikaServerProcess.main(TikaServerProcess.java:125) [tika-server-standard-2.2.1.jar:2.2.1]
Caused by: java.lang.ClassNotFoundException: org.apache.tika.pipes.fetcher.http.HttpFetcher
at jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641) ~[?:?]
at jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188) ~[?:?]
at java.lang.ClassLoader.loadClass(ClassLoader.java:520) ~[?:?]
at java.lang.Class.forName0(Native Method) ~[?:?]
at java.lang.Class.forName(Class.java:375) ~[?:?]
at org.apache.tika.config.ConfigBase.buildClass(ConfigBase.java:195) ~[tika-server-standard-2.2.1.jar:2.2.1]
... 6 more
java.util.concurrent.ExecutionException: java.lang.RuntimeException: Failed to start forked process -- forked is not alive
at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
at org.apache.tika.server.core.TikaServerCli.mainLoop(TikaServerCli.java:116)
at org.apache.tika.server.core.TikaServerCli.execute(TikaServerCli.java:88)
at org.apache.tika.server.core.TikaServerCli.main(TikaServerCli.java:66)
Caused by: java.lang.RuntimeException: Failed to start forked process -- forked is not alive
at org.apache.tika.server.core.TikaServerWatchDog$ForkedProcess.<init>(TikaServerWatchDog.java:306)
at org.apache.tika.server.core.TikaServerWatchDog$ForkedProcess.<init>(TikaServerWatchDog.java:269)
at org.apache.tika.server.core.TikaServerWatchDog.startForkedProcess(TikaServerWatchDog.java:209)
at org.apache.tika.server.core.TikaServerWatchDog.call(TikaServerWatchDog.java:143)
at org.apache.tika.server.core.TikaServerWatchDog.call(TikaServerWatchDog.java:53)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:833)
ERROR [main] 19:24:09,582 org.apache.tika.server.core.TikaServerCli Can't start:
java.util.concurrent.ExecutionException: java.lang.RuntimeException: Failed to start forked process -- forked is not alive
at java.util.concurrent.FutureTask.report(FutureTask.java:122) ~[?:?]
at java.util.concurrent.FutureTask.get(FutureTask.java:191) ~[?:?]
at org.apache.tika.server.core.TikaServerCli.mainLoop(TikaServerCli.java:116) ~[tika-server-standard-2.2.1.jar:2.2.1]
at org.apache.tika.server.core.TikaServerCli.execute(TikaServerCli.java:88) ~[tika-server-standard-2.2.1.jar:2.2.1]
at org.apache.tika.server.core.TikaServerCli.main(TikaServerCli.java:66) [tika-server-standard-2.2.1.jar:2.2.1]
Caused by: java.lang.RuntimeException: Failed to start forked process -- forked is not alive
at org.apache.tika.server.core.TikaServerWatchDog$ForkedProcess.<init>(TikaServerWatchDog.java:306) ~[tika-server-standard-2.2.1.jar:2.2.1]
at org.apache.tika.server.core.TikaServerWatchDog$ForkedProcess.<init>(TikaServerWatchDog.java:269) ~[tika-server-standard-2.2.1.jar:2.2.1]
at org.apache.tika.server.core.TikaServerWatchDog.startForkedProcess(TikaServerWatchDog.java:209) ~[tika-server-standard-2.2.1.jar:2.2.1]
at org.apache.tika.server.core.TikaServerWatchDog.call(TikaServerWatchDog.java:143) ~[tika-server-standard-2.2.1.jar:2.2.1]
at org.apache.tika.server.core.TikaServerWatchDog.call(TikaServerWatchDog.java:53) ~[tika-server-standard-2.2.1.jar:2.2.1]
at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
at java.lang.Thread.run(Thread.java:833) ~[?:?]
{code}
I frankly don't know what to do, or how to get this thing to accept URLs.
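For reference, here is a rough sketch of the request headers I expect to send once the fetcher named "http" from the config above actually loads. It reuses the fetcherName and fetchKey headers from the issue description. The helper name buildFetcherHeaders is made up for illustration, and the idea that an HTTP-type fetcher takes the complete URL as its fetchKey is my assumption, not something confirmed in this thread.

```typescript
// Hypothetical helper (not part of Tika): builds the headers for a
// fetcher-based request, assuming the full URL is passed as fetchKey.
interface FetcherHeaders {
  Accept: string;
  fetcherName: string;
  fetchKey: string;
}

function buildFetcherHeaders(fetcherName: string, fileUrl: string): FetcherHeaders {
  if (fetcherName.length === 0) {
    throw new Error("fetcherName must not be empty");
  }
  // Reject anything that is not a plain http(s) URL, e.g. gs:// or file://.
  if (!/^https?:\/\//i.test(fileUrl)) {
    throw new Error("not an http(s) URL: " + fileUrl);
  }
  return { Accept: "text/plain", fetcherName: fetcherName, fetchKey: fileUrl };
}

// "http" matches the <name> of the fetcher in the tika-config.xml above.
const headers = buildFetcherHeaders(
  "http",
  "https://firebasestorage.googleapis.com/v0/b/abcd-efgh.appspot.com/o/somefilethatdoesnotexist.pdf"
);
```

The resulting object would take the place of the old fileUrl header in the request to the server.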
> A replacement for enableFileUrl or Support for Google Cloud
> -----------------------------------------------------------
>
> Key: TIKA-3523
> URL: https://issues.apache.org/jira/browse/TIKA-3523
> Project: Tika
> Issue Type: Wish
> Components: tika-server
> Affects Versions: 2.0.0
> Reporter: Fatih Pazarbasi
> Priority: Minor
>
> Hello,
> I have a setup where users upload their files to a cloud bucket, and I forward
> the fileUrl to a serverless cloud instance to run OCR on them. I do it this
> way so that the users never contact the Tika server directly and I keep a
> copy of what they sent for processing. They also have no use for the
> unprocessed response.
> Now that you've removed enableFileUrl, I have to download the files from the
> cloud bucket to the backend instance and then upload them to the Tika
> server's /tika endpoint again.
> I tried the following config.xml to work around the situation, but in vain.
> For the made-up URL:
> [https://firebasestorage.googleapis.com/v0/b/abcd-efgh.appspot.com/o/somefilethatdoesnotexist.pdf|https://firebasestorage.googleapis.com/v0/b/abcd-efgh.appspot.com/o/]
> {code:java}
> <fetchers>
>   <fetcher class="org.apache.tika.pipes.fetcher.fs.FileSystemFetcher">
>     <params>
>       <name>fsf</name>
>       <basePath>https://firebasestorage.googleapis.com/v0/b/abcd-efgh.appspot.com/o</basePath>
>     </params>
>   </fetcher>
> </fetchers>
> <emitters>
>   <emitter class="org.apache.tika.pipes.emitter.fs.FileSystemEmitter">
>     <params>
>       <name>fse</name>
>       <basePath>gs://abcd-efgh.appspot.com/users</basePath>
>     </params>
>   </emitter>
> </emitters>
> <server>
>   <params>
>     <enableUnsecureFeatures>true</enableUnsecureFeatures>
>   </params>
> </server>
> <pipes>
>   <params>
>     <tikaConfig>/path/to/tika-config.xml</tikaConfig>
>   </params>
> </pipes>{code}
> {code:java}
> headers: {
> Accept: 'text/plain',
> 'User-Agent': 'Firebase Functions',
> fetcherName: 'fsf',
> fetchKey: 'somefilethatdoesnotexist.pdf',
> },{code}
> It doesn't support the gs:// Google Storage bucket either. I have all the
> necessary permissions, but that didn't help. I'm using a dockerized version
> of Tika server, so the file system does not seem to be my concern...
>
> In the golden times of 1.2x I was simply using:
>
> {code:java}
> headers: {
> Accept: 'text/plain',
> 'User-Agent': 'Firebase Functions',
> fileUrl: 'https://firebasestorage.googleapis.com/v0/b/abcd-efgh.appspot.com/o/somefilethatdoesnotexist.pdf',
> },{code}
>
>
> Am I missing something? If not, my wish is this: could you please make it so
> that fetcherName stands for the fixed first part of the old fileUrl and
> fetchKey is the specific pointer to a file?
> This way I would have some control over the URLs that are sent to Tika
> server, unlike with enableFileUrl, and I could have my cake and eat it too,
> without creating extra traffic on the backend by downloading from the bucket
> and uploading to Tika.
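To make the wish above concrete, here is a minimal client-side sketch of the split being asked for: a full URL is accepted only if it sits under a configured base, and the remainder becomes the fetchKey. The names FetcherBase, splitFileUrl and the fetcher name "bucket" are hypothetical and exist only for this illustration; nothing here describes how Tika itself resolves a fetcher.

```typescript
// Hypothetical types: a named base URL, and the header pair derived from it.
interface FetcherBase {
  name: string;
  baseUrl: string;
}

interface FetchTarget {
  fetcherName: string;
  fetchKey: string;
}

// Splits an old-style fileUrl into (fetcherName, fetchKey) by matching it
// against a list of allowed base URLs. URLs outside every base are rejected,
// which is the "control over the URLs" part of the wish.
function splitFileUrl(fileUrl: string, bases: FetcherBase[]): FetchTarget {
  for (const base of bases) {
    const endsWithSlash = base.baseUrl.charAt(base.baseUrl.length - 1) === "/";
    const prefix = endsWithSlash ? base.baseUrl : base.baseUrl + "/";
    if (fileUrl.indexOf(prefix) === 0 && fileUrl.length > prefix.length) {
      return { fetcherName: base.name, fetchKey: fileUrl.slice(prefix.length) };
    }
  }
  throw new Error("URL is not under any configured base: " + fileUrl);
}

const target = splitFileUrl(
  "https://firebasestorage.googleapis.com/v0/b/abcd-efgh.appspot.com/o/somefilethatdoesnotexist.pdf",
  [{ name: "bucket", baseUrl: "https://firebasestorage.googleapis.com/v0/b/abcd-efgh.appspot.com/o" }]
);
// target.fetcherName is "bucket"; target.fetchKey is "somefilethatdoesnotexist.pdf"
```

Matching on the base with a trailing slash keeps a look-alike host or path (for example one that merely starts with the same characters) from passing the check.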