+1 to re-enabling fileUrl, with a warning (including at least the CVE ID)
and at least one special flag to enable it.

IMHO, even better would be to require two flags (something like
`--enable-dangerous-features/--enable-unsecure-features` plus the actual
`--enable-fileurl`, the way Sun/Oracle do for commercial features). That
would force the user to think twice before starting tika-server with fileUrl
enabled, and it would make it obvious to anyone looking at ps, htop, init
scripts, etc. that the server is running in insecure mode.
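
A minimal sketch of the two-flag idea, just to make the proposal concrete.
Both flag names are the hypothetical ones suggested above, not existing
tika-server options, and the class name `FileUrlGate` and the warning text
are made up for illustration. The real warning would carry the actual CVE ID.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: neither flag exists in tika-server today.
final class FileUrlGate {
    static final String UNSECURE_FLAG = "--enable-unsecure-features";
    static final String FILE_URL_FLAG = "--enable-fileurl";

    // fileUrl is allowed only when BOTH flags are on the command line.
    static boolean fileUrlAllowed(String[] args) {
        List<String> flags = Arrays.asList(args);
        return flags.contains(UNSECURE_FLAG) && flags.contains(FILE_URL_FLAG);
    }

    // Message to print at startup, or null when there is nothing to report.
    static String startupMessage(String[] args) {
        if (fileUrlAllowed(args)) {
            return "WARNING: fileUrl is enabled; the server is running in "
                    + "insecure mode (see the CVE advisory).";
        }
        if (Arrays.asList(args).contains(FILE_URL_FLAG)) {
            return "Ignoring " + FILE_URL_FLAG + ": it also requires "
                    + UNSECURE_FLAG + ".";
        }
        return null;
    }

    public static void main(String[] args) {
        String message = startupMessage(args);
        if (message != null) {
            System.err.println(message);
        }
    }
}
```

With this shape, passing only `--enable-fileurl` leaves the feature off and
says why, so nobody enables it by accident.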

On Wed, 14 Sep 2016 at 17:15, Chris Mattmann <[email protected]> wrote:

> As long as we have a switch and a warning (and pointer to CVE URL with that
> warning), I’m +1 to re-enable it.
>
> On 9/14/16, 4:40 AM, "Nick Burch" <[email protected]> wrote:
>
>     On Wed, 14 Sep 2016, Allison, Timothy B. wrote:
>     > Would it be as much of a disaster to require the user to allow the
>     > fileUrl capability on the commandline at server startup?  We could
>     > add some menacing "all bets are off, we hope you know what you're
>     > doing" warning.
>
>     With a special switch, and a warning, enabling file:/// again wouldn't
>     be too bad in my view.
>
>     I'm not sure about arbitrary URLs though - there's the security + dos
>     stuff, plus the fact that we won't be doing robots checking / niceness
>     / etc. For anyone doing remote URLs, I think they do need to be using a
>     proper + safe + server-friendly crawler, then passing the result of a
>     successful fetch to the Tika server
>
>     >> My main concern in accessing the Tika libraries via TikaJAXRS is the
>     >> performance overheads associated with going through sockets (and
>     >> possibly the additional memory/file copying of file data if fileUrl
>     >> is not available).
>     >
>     > In my experience, depending on the file types, yes, there's definitely
>     > some overhead, but the bottleneck is in the parsers (esp for complex
>     > document formats -- msoffice, pdf, etc), not data sloshing.
>
>     I agree - for almost all formats, the slow bit isn't byte shuffling
>     it's parsing
>
>     Nick
>

Best regards,
Konstantin Gribov
