[ https://issues.apache.org/jira/browse/TIKA-3524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496265#comment-17496265 ]
Tim Allison commented on TIKA-3524:
-----------------------------------
An example config file is here:
https://github.com/apache/tika/blob/main/tika-pipes/tika-fetchers/tika-fetcher-gcs/src/test/resources/tika-config-gcs.xml
As with the URL fetcher, you'll have to add the gcs-fetcher jar
(https://mvnrepository.com/artifact/org.apache.tika/tika-pipes-iterator-gcs/2.3.0)
to your classpath, e.g. put the tika-server-standard jar and
tika-pipes-iterator-gcs-2.3.0.jar in a tika-bin directory and start the server
with something like:
java -cp "tika-bin/*"
We do this over on the http fetcher example, too.
Then, when you call tika-server, you'll specify similar things as with the http
fetcher:
curl -X PUT http://localhost:9998/tika -H "fetcherName: gcs" \
  -H "fetchKey:path/to/file.pdf"
where path/to/file.pdf is the path in GCS under the bucket and projectId
specified in the tika-config file.
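
If you make this call often, the request above could be wrapped in a small
shell function. This is only a sketch: the function name tika_fetch and the
TIKA_URL and DRY_RUN variables are made up for illustration and are not part
of Tika. The endpoint and the two header names are taken from the curl example
above, and the fetcher name is assumed to match the one in your tika-config.

```shell
# Hypothetical helper, not part of Tika: wraps the PUT request shown above.
# TIKA_URL and DRY_RUN are illustrative variable names (assumptions).
TIKA_URL="${TIKA_URL:-http://localhost:9998/tika}"

tika_fetch() {
  # $1 = fetcher name as configured in the tika-config file (e.g. gcs)
  # $2 = fetch key, i.e. the object path under the configured bucket
  if [ "$#" -ne 2 ]; then
    echo "usage: tika_fetch <fetcherName> <fetchKey>" >&2
    return 2
  fi
  if [ -n "${DRY_RUN:-}" ]; then
    # Print the request instead of sending it.
    printf 'PUT %s\nfetcherName: %s\nfetchKey: %s\n' "$TIKA_URL" "$1" "$2"
    return 0
  fi
  curl -sS -X PUT "$TIKA_URL" -H "fetcherName: $1" -H "fetchKey: $2"
}
```

With DRY_RUN=1 set, "tika_fetch gcs path/to/file.pdf" prints the request
without contacting the server, which is handy for checking the fetch key
before sending it.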
> Add google cloud storage to pipes modules
> -----------------------------------------
>
> Key: TIKA-3524
> URL: https://issues.apache.org/jira/browse/TIKA-3524
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Priority: Minor
> Fix For: 2.1.0
>
>
> This should be fairly straightforward w s3 as an example for adding a
> fetcher, emitter and pipesiterator.