On Fri, Aug 13, 2021 at 10:15 AM Fatih Pazarbasi (Jira) <[email protected]>
wrote:

>
>     [ https://issues.apache.org/jira/browse/TIKA-3523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17398797#comment-17398797 ]
>
> Fatih Pazarbasi commented on TIKA-3523:
> ---------------------------------------
>
> OK, I'll stay in the safe 1.27 zone for now, because I couldn't get
> 2.0.0 to work the old way. Tika has saved me from a lot of trouble.
>
> I hope you'll be able and willing to add a "put fileUrl to /tika"
> alternative in a near-future release.
>
> (*) Sending you all good energy for this already amazing work. (*)
>
> In the meantime, I'll learn some Java and see if I can contribute some
> work toward the GCP fetcher.
>
> Thanks and have a great weekend.
>
> > A replacement for enableFileUrl or Support for Google Cloud
> > -----------------------------------------------------------
> >
> >                 Key: TIKA-3523
> >                 URL: https://issues.apache.org/jira/browse/TIKA-3523
> >             Project: Tika
> >          Issue Type: Wish
> >          Components: tika-server
> >    Affects Versions: 2.0.0
> >            Reporter: Fatih Pazarbasi
> >            Priority: Minor
> >
> > Hello,
> > I have a setup where users upload their files to a cloud bucket, and I
> > forward the fileUrl to a serverless cloud instance that runs OCR on them.
> > I do it this way so that users never contact the Tika Server directly, I
> > keep a copy of what they've sent for processing, and they never see the
> > unprocessed response.
> > Now that enableFileUrl has been removed, I have to download the files
> > from the cloud bucket they were uploaded to onto the backend instance,
> > and then PUT them to the /tika endpoint again...
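For reference, the download-and-reupload workaround just described might look roughly like the Node.js sketch below. It assumes Node 18+ (global `fetch`), a tika-server reachable at a hypothetical `tikaBaseUrl`, and a bucket object readable over plain HTTPS; `relayToTika` is an illustrative name, not part of Tika.

```javascript
// Hypothetical helper: download an object from the bucket, then PUT the
// bytes to tika-server's /tika endpoint (the 2.x workaround described above).
// fetchImpl defaults to Node 18+'s global fetch; it is injectable for testing.
async function relayToTika(fileUrl, tikaBaseUrl, fetchImpl = globalThis.fetch) {
  // Step 1: pull the file into the backend instance (the extra traffic).
  const download = await fetchImpl(fileUrl);
  if (!download.ok) {
    throw new Error(`download failed with HTTP ${download.status}`);
  }
  const bytes = new Uint8Array(await download.arrayBuffer());

  // Step 2: push the same bytes back out to tika-server as plain text extraction.
  const parsed = await fetchImpl(`${tikaBaseUrl}/tika`, {
    method: 'PUT',
    headers: { Accept: 'text/plain' },
    body: bytes,
  });
  if (!parsed.ok) {
    throw new Error(`tika-server returned HTTP ${parsed.status}`);
  }
  return parsed.text();
}
```

Every file crosses the backend twice (bucket to backend, backend to tika-server), which is exactly the traffic the request at the end of this issue hopes to avoid.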
> > I tried the following config.xml to work around this, but without
> > success...
> >   For the made-up URL: [https://firebasestorage.googleapis.com/v0/b/abcd-efgh.appspot.com/o/somefilethatdoesnotexist.pdf|https://firebasestorage.googleapis.com/v0/b/abcd-efgh.appspot.com/o/]
> > {code:java}
> > <fetchers>
> >  <fetcher class="org.apache.tika.pipes.fetcher.fs.FileSystemFetcher">
> >   <params>
> >    <name>fsf</name>
> >    <basePath>https://firebasestorage.googleapis.com/v0/b/abcd-efgh.appspot.com/o</basePath>
> >   </params>
> >  </fetcher>
> > </fetchers>
> > <emitters>
> >  <emitter class="org.apache.tika.pipes.emitter.fs.FileSystemEmitter">
> >   <params>
> >    <name>fse</name>
> >    <basePath>gs://abcd-efgh.appspot.com/users</basePath>
> >   </params>
> >  </emitter>
> > </emitters>
> > <server>
> >  <params>
> >   <enableUnsecureFeatures>true</enableUnsecureFeatures>
> >  </params>
> > </server>
> > <pipes>
> >  <params>
> >   <tikaConfig>/path/to/tika-config.xml</tikaConfig>
> >  </params>
> > </pipes>{code}
> > {code:java}
> > headers: {
> > Accept: 'text/plain',
> > 'User-Agent': 'Firebase Functions',
> > fetcherName: 'fsf',
> > fetchKey: 'somefilethatdoesnotexist.pdf',
> > },{code}
> > It doesn't support the gs:// Google Cloud Storage path in the emitter
> > either. I have all the necessary permissions, but that didn't help. I'm
> > using a Dockerized version of tika-server, so the local file system isn't
> > really a concern for me...
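One likely reason the configuration above fails: judging by their class names, `FileSystemFetcher` and `FileSystemEmitter` treat `basePath` as a local directory, not as a URL. The sketch below only approximates that with Node's `path.posix.join` (an assumption for illustration, not Tika's actual Java code) to show how a URL-shaped `basePath` degrades once it is handled as a file path.

```javascript
const path = require('node:path');

// Approximation (assumption): join basePath and fetchKey the way a
// filesystem-based fetcher/emitter would, i.e. as POSIX path segments.
function resolveAsLocalPath(basePath, fetchKey) {
  return path.posix.join(basePath, fetchKey);
}

const base = 'https://firebasestorage.googleapis.com/v0/b/abcd-efgh.appspot.com/o';

// Path normalization collapses "//" to "/", so the scheme no longer forms a URL.
console.log(resolveAsLocalPath(base, 'somefilethatdoesnotexist.pdf'));
// → https:/firebasestorage.googleapis.com/v0/b/abcd-efgh.appspot.com/o/somefilethatdoesnotexist.pdf
console.log(resolveAsLocalPath('gs://abcd-efgh.appspot.com/users', 'out.json'));
// → gs:/abcd-efgh.appspot.com/users/out.json
```

The result is just an odd relative path on the container's disk, which would explain why neither the https:// fetcher path nor the gs:// emitter path reaches the bucket.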
> >
> >  In the golden days of 1.2x, I was simply using:
> >
> > {code:java}
> > headers: {
> > Accept: 'text/plain',
> > 'User-Agent': 'Firebase Functions',
> > fileUrl: 'https://firebasestorage.googleapis.com/v0/b/abcd-efgh.appspot.com/o/somefilethatdoesnotexist.pdf',
> > },{code}
> >
> >
> >  Am I missing something? If not, my wish is this: could you make it so
> > that fetcherName stands for a fixed first part (prefix) of the old
> > fileUrl, and fetchKey points to the specific file under it?
> > That way, unlike with enableFileUrl, I keep some control over the URLs
> > sent to tika-server, and I can have my cake and eat it too, without
> > creating extra traffic on the backend by downloading from the bucket and
> > uploading to Tika.
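The requested behaviour (a fetcher name standing for a fixed URL prefix, and a fetch key naming one object under it) can be sketched on the client side as follows. The `allowedPrefixes` map and the `toFetcherHeaders` function are hypothetical illustrations, not a Tika API; the point is that only URLs under a configured prefix are accepted, which is the control a blanket enableFileUrl lacked.

```javascript
// Hypothetical mapping from fetcher name to the URL prefix it may serve.
const allowedPrefixes = {
  fsf: 'https://firebasestorage.googleapis.com/v0/b/abcd-efgh.appspot.com/o/',
};

// Split an old-style fileUrl into the fetcherName/fetchKey headers used above.
// Throws if the URL is not under any configured prefix.
function toFetcherHeaders(fileUrl, prefixes) {
  for (const [fetcherName, prefix] of Object.entries(prefixes)) {
    if (fileUrl.startsWith(prefix)) {
      const fetchKey = fileUrl.slice(prefix.length);
      // Reject empty keys and attempts to climb out of the prefix.
      if (fetchKey === '' || fetchKey.split('/').includes('..')) {
        throw new Error(`invalid fetchKey in ${fileUrl}`);
      }
      return { Accept: 'text/plain', fetcherName, fetchKey };
    }
  }
  throw new Error(`no fetcher configured for ${fileUrl}`);
}
```

For the made-up URL above, this yields `fetcherName: 'fsf'` and `fetchKey: 'somefilethatdoesnotexist.pdf'`, while any URL outside the configured prefix is refused before it ever reaches tika-server.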
>
>
>
> --
> This message was sent by Atlassian Jira
> (v8.3.4#803005)
>
