Thanks for the insight.

My interest (as a developer) in TikaJAXRS is that it provides a nice
encapsulation of Tika functionality which is accessible across language
boundaries. The fact that it can then also cross network boundaries is of
secondary importance to me.

I'm developing code in C++ and I'd like to be able to access Tika's
capabilities.

TikaJAXRS offers an easy way in. If the fileUrl functionality were in
place, then running TikaJAXRS on the same box as the client, restricted to
listening on 127.0.0.1 and with a check rejecting file:// URLs as well,
would limit some of the dangers listed below - an attacker would then need
access to the host box itself, in which case you would have already lost.
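
As a rough sketch of the kind of URL check I have in mind (written in C++
only because that is what I work in; the real check would have to live in
the server). The function names are hypothetical and nothing here is taken
from the Tika code. Note that it uses an allow-list of http/https rather
than a deny-list of file://, which is my own assumption about the safer
variant: it also rejects forms such as file:/etc/passwd.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: returns the lower-cased scheme of a URL
// ("http" for "HTTP://host/x"), or "" when no "scheme://" prefix is found.
std::string url_scheme(const std::string& url) {
    const std::string::size_type pos = url.find("://");
    if (pos == std::string::npos || pos == 0) {
        return "";
    }
    std::string scheme = url.substr(0, pos);
    std::transform(scheme.begin(), scheme.end(), scheme.begin(),
                   [](unsigned char c) {
                       return static_cast<char>(std::tolower(c));
                   });
    return scheme;
}

// Hypothetical policy: only http and https may be fetched. Everything
// else, including file:// in any letter case, is refused.
bool is_fetchable_url(const std::string& url) {
    const std::string scheme = url_scheme(url);
    return scheme == "http" || scheme == "https";
}

// Usage, expressed as checks.
void check_url_policy() {
    assert(is_fetchable_url("http://example.com/doc.pdf"));
    assert(is_fetchable_url("HTTPS://example.com/doc.pdf"));
    assert(!is_fetchable_url("file:///etc/passwd"));
    assert(!is_fetchable_url("FILE:///etc/passwd"));
    assert(!is_fetchable_url("file:/etc/passwd"));
}
```

This only addresses the file:// case; it does nothing about the server
reaching http hosts that the caller could not reach directly.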

My main concern in accessing the Tika libraries via TikaJAXRS is the
performance overhead of going through sockets (and possibly the additional
memory/file copying of file data if fileUrl is not available).
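
To illustrate where that extra copy comes from, here is a minimal sketch of
the request a C++ client would have to build when it cannot pass a URL and
must send the document itself. It assumes the server's usual PUT /tika
endpoint and that it listens on 127.0.0.1:9998; the function name is
hypothetical, and the actual socket send is left out.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds a minimal HTTP/1.1 PUT request for the
// assumed /tika endpoint, with the whole document carried in the body.
std::string build_tika_put_request(const std::string& host,
                                   unsigned port,
                                   const std::string& document_bytes) {
    assert(!host.empty());
    std::string request;
    request += "PUT /tika HTTP/1.1\r\n";
    request += "Host: " + host + ":" + std::to_string(port) + "\r\n";
    request += "Accept: text/plain\r\n";
    request += "Content-Length: " +
               std::to_string(document_bytes.size()) + "\r\n";
    request += "Connection: close\r\n";
    request += "\r\n";
    // The document is copied into the request here, and copied again
    // when the request is written to the loopback socket.
    request += document_bytes;
    return request;
}

// Usage, expressed as checks.
void check_put_request() {
    const std::string r = build_tika_put_request("127.0.0.1", 9998, "abc");
    assert(r.rfind("PUT /tika HTTP/1.1\r\n", 0) == 0);
    assert(r.find("Content-Length: 3\r\n") != std::string::npos);
}
```

With fileUrl the body would be empty and only the URL would cross the
socket, which is why its absence matters to me for large files.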

Short of the Herculean task of porting the entirety of Tika from Java to
C++, are there any better, well-established, more performant ways of
calling the Java Tika code from C++?


Regards,

John

-----Original Message-----
From: Nick Burch [mailto:[email protected]] 
Sent: 13 September 2016 15:34
To: John Dougrez-Lewis
Cc: [email protected]
Subject: RE: Query on correct use of 'fileUrl' in TikaJAXRS Server to
extract document at remote url - my request is not working

On Tue, 13 Sep 2016, John Dougrez-Lewis wrote:
> Surely the security vulnerability could have been fixed by disallowing 
> "file://" variants in the URL rather than removing the feature altogether?
>
> Or were there other implementation issues relating to the fileUrl 
> feature that meant it was best removed ?

As the fetch is done by the server, it could allow you to fetch documents
that you as a user couldn't see/access/reach but the server could. It also
carries some denial of service risks, and it lacks the things you would want
from a web spider, such as pools / limits / robots.txt acceptance etc.

Nick
