On Fri, Aug 13, 2021 at 10:15 AM Fatih Pazarbasi (Jira) wrote:
>
> [
> https://issues.apache.org/jira/browse/TIKA-3523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17398797#comment-17398797
> ]
>
> Fatih Pazarbasi commented on TIKA-3523:
> ---------------------------------------
>
> OK, I'll stay in the safe 1.27 zone for now because I couldn't manage to
> make 2.0.0 work the old way. Tika has saved me from a lot of trouble.
>
> I hope you'll be able and willing to add a "PUT fileUrl to /tika"
> alternative in an upcoming release.
>
> (*)So I'm sending you all good energies for this already amazing work.(*)
>
> In the meantime I'll learn some Java and see if I can contribute some work
> on the GCP fetcher.
>
> Thanks and have a great weekend.
>
> > A replacement for enableFileUrl or Support for Google Cloud
> > -----------------------------------------------------------
> >
> >                 Key: TIKA-3523
> >                 URL: https://issues.apache.org/jira/browse/TIKA-3523
> >             Project: Tika
> >          Issue Type: Wish
> >          Components: tika-server
> >    Affects Versions: 2.0.0
> >            Reporter: Fatih Pazarbasi
> >            Priority: Minor
> >
> > Hello,
> > I have a setup where users upload their files to a cloud bucket, and I
> > forward the fileUrl to run OCR on them in a serverless cloud instance. I
> > do it this way so that users never contact the Tika Server directly and I
> > keep a copy of what they've sent for processing. Also, they never have to
> > deal with the unprocessed response.
> > Now that you've removed enableFileUrl, I have to download the files from
> > the cloud bucket they uploaded them to onto the backend instance, and then
> > PUT them to the /tika endpoint again...
> > I tried the following config.xml to work around the situation, but it was
> > in vain...
> > For the made-up URL: [
> > https://firebasestorage.googleapis.com/v0/b/abcd-efgh.appspot.com/o/somefilethatdoesnotexist.pdf|https://firebasestorage.googleapis.com/v0/b/abcd-efgh.appspot.com/o/
> > ]
> > {code:xml}
> > <fetchers>
> >   <fetcher class="org.apache.tika.pipes.fetcher.fs.FileSystemFetcher">
> >     <params>
> >       <name>fsf</name>
> >       <basePath>https://firebasestorage.googleapis.com/v0/b/abcd-efgh.appspot.com/o</basePath>
> >     </params>
> >   </fetcher>
> > </fetchers>
> > <emitters>
> >   <emitter class="org.apache.tika.pipes.emitter.fs.FileSystemEmitter">
> >     <params>
> >       <name>fse</name>
> >       <basePath>gs://abcd-efgh.appspot.com/users</basePath>
> >     </params>
> >   </emitter>
> > </emitters>
> > <server>
> >   <params>
> >     <enableUnsecureFeatures>true</enableUnsecureFeatures>
> >   </params>
> > </server>
> > <pipes>
> >   <params>
> >     <tikaConfig>/path/to/tika-config.xml</tikaConfig>
> >   </params>
> > </pipes>{code}
> > {code:javascript}
> > headers: {
> >   Accept: 'text/plain',
> >   'User-Agent': 'Firebase Functions',
> >   fetcherName: 'fsf',
> >   fetchKey: 'somefilethatdoesnotexist.pdf',
> > },{code}
> > It doesn't support gs:// Google Cloud Storage bucket paths either. I have
> > all the necessary permissions, but that didn't help. I'm using a Dockerized
> > version of Tika Server, so the local file system does not seem to be my
> > concern...
> >
> > In the golden days of 1.2x, I was simply using:
> >
> > {code:javascript}
> > headers: {
> >   Accept: 'text/plain',
> >   'User-Agent': 'Firebase Functions',
> >   fileUrl: 'https://firebasestorage.googleapis.com/v0/b/abcd-efgh.appspot.com/o/somefilethatdoesnotexist.pdf',
> > },{code}
> >
> > Am I missing something? If not, my wish is: could you please make it so
> > that fetcherName refers to the fixed first part of the old fileUrl and
> > fetchKey points to the specific file?
> > This way I'd have some control over the URLs sent to Tika Server, unlike
> > with enableFileUrl, and I could also have my cake and eat it too, without
> > creating extra traffic on the backend by downloading from the bucket and
> > re-uploading to Tika.
> >
>
> --
> This message was sent by Atlassian Jira
> (v8.3.4#803005)
>
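The download-and-re-upload workaround described in the issue (the extra backend traffic the wish is meant to avoid) can be sketched roughly as follows. This is a minimal sketch, not Tika's own client code, assuming Node.js 18+ (global `fetch`, `Response`, `Buffer`) and a Tika 2.x server at a hypothetical `http://localhost:9998`; `extractViaDownload` and `TIKA_BASE_URL` are illustrative names. Firebase Storage object URLs may also need `?alt=media` and an auth token before they return the raw file bytes, which this sketch does not handle.

```javascript
// Minimal sketch; assumes Node.js 18+ (global fetch, Response, Buffer).
// Hypothetical address of the Tika 2.x server.
const TIKA_BASE_URL = "http://localhost:9998";

/**
 * Download a file from a bucket URL onto the backend, then PUT the same
 * bytes to Tika's /tika endpoint and return the extracted plain text.
 * Assumption: fileUrl returns the raw file bytes (Firebase Storage URLs
 * may need ?alt=media plus auth, not handled here).
 * fetchImpl is injectable so the function can run against a stub.
 */
async function extractViaDownload(fileUrl, tikaBaseUrl = TIKA_BASE_URL, fetchImpl = fetch) {
  // Step 1: download the uploaded file from the bucket.
  const download = await fetchImpl(fileUrl);
  if (!download.ok) {
    throw new Error(`Download failed: ${download.status} ${fileUrl}`);
  }
  const body = Buffer.from(await download.arrayBuffer());

  // Step 2: upload the same bytes to Tika, asking for plain text.
  const tika = await fetchImpl(`${tikaBaseUrl}/tika`, {
    method: "PUT",
    headers: { Accept: "text/plain" },
    body,
  });
  if (!tika.ok) {
    throw new Error(`Tika request failed: ${tika.status}`);
  }
  return tika.text();
}
```

Passing `fetchImpl` as a parameter keeps the two network hops visible and lets the function be tested with a stub; inside a Firebase Function it would simply be called as `await extractViaDownload(fileUrl)`.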
