[jira] [Commented] (TIKA-3524) Add google cloud storage to pipes modules

Tim Allison (Jira) Tue, 22 Feb 2022 10:27:05 -0800


    [ 
https://issues.apache.org/jira/browse/TIKA-3524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496265#comment-17496265
 ]


Tim Allison commented on TIKA-3524:
-----------------------------------

An example config file is here: 
https://github.com/apache/tika/blob/main/tika-pipes/tika-fetchers/tika-fetcher-gcs/src/test/resources/tika-config-gcs.xml

As with the URL fetcher, you'll have to add the gcs-fetcher jar 
(https://mvnrepository.com/artifact/org.apache.tika/tika-pipes-iterator-gcs/2.3.0)
 to your classpath. For example, put the tika-server-standard jar and 
tika-pipes-iterator-gcs-2.3.0.jar in a tika-bin directory and start the server 
with something like:

java -cp "tika-bin/*"

We do this over on the http fetcher example, too.
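
A rough sketch of what the full launch command might look like. The main class 
(org.apache.tika.server.core.TikaServerCli), the -c config option and the local 
config file name are assumptions not stated in this comment, so check them 
against your tika-server version. The helper name is hypothetical, and it only 
prints the command so you can inspect it before running it yourself:

```shell
# Sketch only: directory name and config path are assumptions.
TIKA_BIN="tika-bin"                # holds tika-server-standard and the GCS jar
TIKA_CONFIG="tika-config-gcs.xml"  # local copy of the example config

# Hypothetical helper: print the launch command instead of executing it.
# The main class and the -c option are assumptions; verify for your version.
tika_launch_cmd() {
  printf 'java -cp "%s/*" org.apache.tika.server.core.TikaServerCli -c %s\n' \
    "$TIKA_BIN" "$TIKA_CONFIG"
}

tika_launch_cmd
```

Quoting "tika-bin/*" lets java expand the wildcard to every jar in the 
directory, rather than having the shell expand it.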

Then, when you call tika-server, you'll pass the fetcher name and fetch key as 
request headers, as with the http fetcher:

curl -X PUT http://localhost:9998/tika -H "fetcherName: gcs" -H "fetchKey:path/to/file.pdf"

where path/to/file.pdf is the file's path in GCS within the bucket and 
projectId specified in the tika-config file.
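
If you need to fetch more than one file, the request can be wrapped in a small 
shell function. This is only a sketch: the function name fetch_and_parse is 
made up for illustration, and the URL and fetcher name are the ones from the 
example request, which may differ in your setup.

```shell
# Values taken from the example request; adjust to your setup.
TIKA_URL="http://localhost:9998/tika"
FETCHER_NAME="gcs"   # must match the fetcher name in the tika-config file

# Hypothetical helper: $1 is the path in GCS under the configured bucket.
fetch_and_parse() {
  curl -sS -X PUT "$TIKA_URL" \
    -H "fetcherName: $FETCHER_NAME" \
    -H "fetchKey: $1"
}
```

With the server running, you would call it as: fetch_and_parse path/to/file.pdf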

> Add google cloud storage to pipes modules
> -----------------------------------------
>
>                 Key: TIKA-3524
>                 URL: https://issues.apache.org/jira/browse/TIKA-3524
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Minor
>             Fix For: 2.1.0
>
>
> This should be fairly straightforward with s3 as an example for adding a 
> fetcher, emitter and pipesiterator.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
