[ 
https://issues.apache.org/jira/browse/TIKA-3523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17489780#comment-17489780
 ] 

Fatih Pazarbasi commented on TIKA-3523:
---------------------------------------

Hello again.

I have to say that the tika-config.xml solution keeps giving me errors.
{panel:title=tika-config.xml}
 
<?xml version="1.0" encoding="UTF-8" ?>
 
<properties>
    <parsers>
        <parser class="org.apache.tika.parser.DefaultParser"/>
    </parsers>
    <fetchers>
        <fetcher class="org.apache.tika.pipes.fetcher.http.HttpFetcher">
            <params>
                <name>http</name>
            </params>
        </fetcher>
    </fetchers>
    <server>
        <params>
            <enableUnsecureFeatures>true</enableUnsecureFeatures>
        </params>
    </server>
</properties>
 
{panel}
 
 

Following the setup at 
[https://cwiki.apache.org/confluence/display/TIKA/tika-pipes+and+Docker], I get 
this error:

{code:java}
java.nio.file.NoSuchFileException: C:/Program
at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92)
at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106)
at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
at java.base/sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:219)
at java.base/java.nio.file.Files.newByteChannel(Files.java:380)
at java.base/java.nio.file.Files.newByteChannel(Files.java:432)
at java.base/java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider.java:422)
at java.base/java.nio.file.Files.newInputStream(Files.java:160)
at org.apache.tika.server.core.TikaServerConfig.load(TikaServerConfig.java:176)
at org.apache.tika.server.core.TikaServerConfig.load(TikaServerConfig.java:134)
at org.apache.tika.server.core.TikaServerCli.execute(TikaServerCli.java:83)
at org.apache.tika.server.core.TikaServerCli.main(TikaServerCli.java:66)
ERROR [main] 19:53:04,124 org.apache.tika.server.core.TikaServerCli Can't start: 
java.nio.file.NoSuchFileException: C:/Program
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:92) ~[?:?]
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106) ~[?:?]
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111) ~[?:?]
at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:219) ~[?:?]
at java.nio.file.Files.newByteChannel(Files.java:380) ~[?:?]
at java.nio.file.Files.newByteChannel(Files.java:432) ~[?:?]
at java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider.java:422) ~[?:?]
at java.nio.file.Files.newInputStream(Files.java:160) ~[?:?]
at org.apache.tika.server.core.TikaServerConfig.load(TikaServerConfig.java:176) ~[tika-server-standard-2.0.0.jar:2.0.0]
at org.apache.tika.server.core.TikaServerConfig.load(TikaServerConfig.java:134) ~[tika-server-standard-2.0.0.jar:2.0.0]
at org.apache.tika.server.core.TikaServerCli.execute(TikaServerCli.java:83) ~[tika-server-standard-2.0.0.jar:2.0.0]
at org.apache.tika.server.core.TikaServerCli.main(TikaServerCli.java:66) [tika-server-standard-2.0.0.jar:2.0.0]
{code}
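
The path in the exception stops at "C:/Program", which looks like a config path containing a space that was cut at the space before it reached TikaServerConfig.load. The sketch below only illustrates that effect and is not Tika code. The "-c" flag and the full path are assumptions made up for the illustration, since only "C:/Program" appears in the trace.

```java
import java.util.Arrays;
import java.util.List;

// Sketch: how an unquoted path containing a space is cut short when a
// command line is split on whitespace. Not Tika code.
class PathSplitDemo {

    // Naive whitespace splitting, similar in effect to passing an
    // unquoted argument through a shell.
    static List<String> splitOnWhitespace(String commandLine) {
        return Arrays.asList(commandLine.trim().split("\\s+"));
    }

    // Returns the token that follows the given flag, or null if absent.
    static String valueAfter(List<String> args, String flag) {
        int i = args.indexOf(flag);
        return (i >= 0 && i + 1 < args.size()) ? args.get(i + 1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical path and flag; only "C:/Program" is in the trace.
        String cmd = "-c C:/Program Files/example/tika-config.xml";
        // Prints "C:/Program": everything after the space is lost.
        System.out.println(valueAfter(splitOnWhitespace(cmd), "-c"));
    }
}
```

If this is what happens, quoting the path or using a path without spaces might avoid the truncation.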


With apache/tika 2.2.1, I get this error instead:
 
 

{code:java}
INFO  [main] 19:24:09,290 org.apache.tika.server.core.TikaServerProcess Starting Apache Tika 2.2.1 server
INFO  [main] 19:24:09,384 org.apache.tika.server.core.TikaServerProcess Using custom config: /tika-config.xml
ERROR [main] 19:24:09,495 org.apache.tika.server.core.TikaServerProcess Can't start: 
org.apache.tika.exception.TikaConfigException: problem loading fetcher
at org.apache.tika.config.ConfigBase.buildClass(ConfigBase.java:203) ~[tika-server-standard-2.2.1.jar:2.2.1]
at org.apache.tika.config.ConfigBase.loadComposite(ConfigBase.java:178) ~[tika-server-standard-2.2.1.jar:2.2.1]
at org.apache.tika.config.ConfigBase.buildComposite(ConfigBase.java:151) ~[tika-server-standard-2.2.1.jar:2.2.1]
at org.apache.tika.config.ConfigBase.buildComposite(ConfigBase.java:132) ~[tika-server-standard-2.2.1.jar:2.2.1]
at org.apache.tika.pipes.fetcher.FetcherManager.load(FetcherManager.java:42) ~[tika-server-standard-2.2.1.jar:2.2.1]
at org.apache.tika.server.core.TikaServerProcess.initServer(TikaServerProcess.java:214) ~[tika-server-standard-2.2.1.jar:2.2.1]
at org.apache.tika.server.core.TikaServerProcess.main(TikaServerProcess.java:125) [tika-server-standard-2.2.1.jar:2.2.1]
Caused by: java.lang.ClassNotFoundException: org.apache.tika.pipes.fetcher.http.HttpFetcher
at jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641) ~[?:?]
at jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188) ~[?:?]
at java.lang.ClassLoader.loadClass(ClassLoader.java:520) ~[?:?]
at java.lang.Class.forName0(Native Method) ~[?:?]
at java.lang.Class.forName(Class.java:375) ~[?:?]
at org.apache.tika.config.ConfigBase.buildClass(ConfigBase.java:195) ~[tika-server-standard-2.2.1.jar:2.2.1]
... 6 more
java.util.concurrent.ExecutionException: java.lang.RuntimeException: Failed to start forked process -- forked is not alive
at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
at org.apache.tika.server.core.TikaServerCli.mainLoop(TikaServerCli.java:116)
at org.apache.tika.server.core.TikaServerCli.execute(TikaServerCli.java:88)
at org.apache.tika.server.core.TikaServerCli.main(TikaServerCli.java:66)
Caused by: java.lang.RuntimeException: Failed to start forked process -- forked is not alive
at org.apache.tika.server.core.TikaServerWatchDog$ForkedProcess.<init>(TikaServerWatchDog.java:306)
at org.apache.tika.server.core.TikaServerWatchDog$ForkedProcess.<init>(TikaServerWatchDog.java:269)
at org.apache.tika.server.core.TikaServerWatchDog.startForkedProcess(TikaServerWatchDog.java:209)
at org.apache.tika.server.core.TikaServerWatchDog.call(TikaServerWatchDog.java:143)
at org.apache.tika.server.core.TikaServerWatchDog.call(TikaServerWatchDog.java:53)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:833)
ERROR [main] 19:24:09,582 org.apache.tika.server.core.TikaServerCli Can't start: 
java.util.concurrent.ExecutionException: java.lang.RuntimeException: Failed to start forked process -- forked is not alive
at java.util.concurrent.FutureTask.report(FutureTask.java:122) ~[?:?]
at java.util.concurrent.FutureTask.get(FutureTask.java:191) ~[?:?]
at org.apache.tika.server.core.TikaServerCli.mainLoop(TikaServerCli.java:116) ~[tika-server-standard-2.2.1.jar:2.2.1]
at org.apache.tika.server.core.TikaServerCli.execute(TikaServerCli.java:88) ~[tika-server-standard-2.2.1.jar:2.2.1]
at org.apache.tika.server.core.TikaServerCli.main(TikaServerCli.java:66) [tika-server-standard-2.2.1.jar:2.2.1]
Caused by: java.lang.RuntimeException: Failed to start forked process -- forked is not alive
at org.apache.tika.server.core.TikaServerWatchDog$ForkedProcess.<init>(TikaServerWatchDog.java:306) ~[tika-server-standard-2.2.1.jar:2.2.1]
at org.apache.tika.server.core.TikaServerWatchDog$ForkedProcess.<init>(TikaServerWatchDog.java:269) ~[tika-server-standard-2.2.1.jar:2.2.1]
at org.apache.tika.server.core.TikaServerWatchDog.startForkedProcess(TikaServerWatchDog.java:209) ~[tika-server-standard-2.2.1.jar:2.2.1]
at org.apache.tika.server.core.TikaServerWatchDog.call(TikaServerWatchDog.java:143) ~[tika-server-standard-2.2.1.jar:2.2.1]
at org.apache.tika.server.core.TikaServerWatchDog.call(TikaServerWatchDog.java:53) ~[tika-server-standard-2.2.1.jar:2.2.1]
at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
at java.lang.Thread.run(Thread.java:833) ~[?:?]
{code}
Frankly, I don't know what to do, or how to get this thing to accept URLs.
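
The "Caused by" line in the 2.2.1 trace says that org.apache.tika.pipes.fetcher.http.HttpFetcher could not be found by Class.forName, so the class does not seem to be on the server's classpath. A minimal sketch for checking that outside of Tika follows. The class name ClasspathCheck is made up here, and which jar (if any) would provide the fetcher class is not something the trace tells us.

```java
// Sketch: check whether a class can be found on the current classpath,
// using the same Class.forName lookup that fails in the trace.
class ClasspathCheck {

    // Returns true if the named class can be located without initializing it.
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the fetcher class named in the tika-config.xml above.
        String name = args.length > 0
                ? args[0]
                : "org.apache.tika.pipes.fetcher.http.HttpFetcher";
        System.out.println(name + " loadable: " + isLoadable(name));
    }
}
```

Running it with the same classpath as the server (for example with tika-server-standard-2.2.1.jar on -cp) should show whether the fetcher class is visible there. If it is not, the class presumably has to come from an additional jar added to the classpath.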


> A replacement for enableFileUrl or Support for Google Cloud
> -----------------------------------------------------------
>
>                 Key: TIKA-3523
>                 URL: https://issues.apache.org/jira/browse/TIKA-3523
>             Project: Tika
>          Issue Type: Wish
>          Components: tika-server
>    Affects Versions: 2.0.0
>            Reporter: Fatih Pazarbasi
>            Priority: Minor
>
> Hello,
> I have a setup where users upload their files to a cloud bucket, and I forward 
> the fileUrl to a serverless cloud instance to run OCR on them. I do it this 
> way so that users never contact the Tika server directly and I keep a copy of 
> what they sent for processing. They also have no use for the 
> unprocessed response.
> Now that you've removed enableFileUrl, I have to download the files from 
> the cloud bucket to the backend instance and then send them on to the 
> Tika server (/tika) again.
> I tried the following config.xml to work around the situation, but in 
> vain.
>   For the made up url: 
> [https://firebasestorage.googleapis.com/v0/b/abcd-efgh.appspot.com/o/somefilethatdoesnotexist.pdf|https://firebasestorage.googleapis.com/v0/b/abcd-efgh.appspot.com/o/]
> {code:java}
> <fetchers> 
>  <fetcher class="org.apache.tika.pipes.fetcher.fs.FileSystemFetcher"> 
>   <params> 
>    <name>fsf</name> 
>    <basePath>https://firebasestorage.googleapis.com/v0/b/abcd-efgh.appspot.com/o</basePath> 
>   </params> 
>  </fetcher> 
> </fetchers> 
> <emitters> 
>  <emitter class="org.apache.tika.pipes.emitter.fs.FileSystemEmitter"> 
>   <params> 
>    <name>fse</name> 
>    <basePath>gs://abcd-efgh.appspot.com/users</basePath> 
>   </params> 
>  </emitter> 
> </emitters> 
> <server> 
>  <params> 
>   <enableUnsecureFeatures>true</enableUnsecureFeatures> 
>  </params> 
> </server> 
> <pipes> 
>  <params> 
>   <tikaConfig>/path/to/tika-config.xml</tikaConfig> 
>  </params> 
> </pipes>{code}
> {code:java}
> headers: {         
> Accept: 'text/plain',         
> 'User-Agent': 'Firebase Functions',         
> fetcherName: 'fsf',         
> fetchKey: 'somefilethatdoesnotexist.pdf',   
> },{code}
> It doesn't support the gs:// Google Storage bucket either. I have all the 
> necessary permissions, but that didn't help. I'm using a dockerized version of 
> tika server, so the file system does not seem to be my concern...
>   
>  In the golden times of 1.2x I was simply using:
>   
> {code:java}
> headers: {               
> Accept: 'text/plain',               
> 'User-Agent': 'Firebase Functions',               
> fileUrl: 'https://firebasestorage.googleapis.com/v0/b/abcd-efgh.appspot.com/o/somefilethatdoesnotexist.pdf',
> },{code}
>  
>   
>  Am I missing something? If not, my wish is: could you please make it so 
> that fetcherName is the fixed first part of the old fileUrl and fetchKey 
> is the specific pointer to a file?
> This way I would have some control over the URLs that are sent to the Tika 
> server, unlike with enableFileUrl, and I could also have my cake and eat it, 
> without creating extra traffic on the backend by downloading from the bucket 
> and uploading to Tika. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
