Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Tika Wiki" for change 
notification.

The "TikaJAXRS" page has been changed by TimothyAllison:
https://wiki.apache.org/tika/TikaJAXRS?action=diff&rev1=45&rev2=46

  }}}
  List all the available parsers, along with what mimetypes they support
  
+ == Specifying a URL Instead of Putting Bytes ==
+ In Tika 1.10, we removed this capability because it posed a security vulnerability (CVE-2015-3271).  Anyone with
+ access to the service had the server's access rights; someone could request local files via {{{file:///}}} or pages
+ from an intranet that they might not otherwise have access to.
+ 
+ In Tika 1.14, we added the capability back, but the user has to acknowledge the security risk by including two command-line arguments:
+ {{{
+ $ java -jar tika-server-x.x.jar -enableUnsecureFeatures -enableFileUrl
+ }}}
+ 
+ This allows the user to specify a {{{fileUrl}}} in the header:
+ {{{
+ curl -i -H "fileUrl:http://tika.apache.org" -H "Accept:text/plain" -X PUT http://localhost:9998/tika
+ }}}
+ 
+ or
+ 
+ {{{
+ curl -i -H "fileUrl:file:///C:/data/my_test_doc.pdf" -H "Accept:text/plain" -X PUT http://localhost:9998/tika
+ }}}
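The same request can be issued from a script. The following is a minimal sketch using only the Python standard library, not part of Tika itself. It assumes a tika-server started with the two flags above and listening on `localhost:9998`; the function names `build_file_url_request` and `extract_text` are hypothetical, and the response is assumed to be UTF-8 text.

```python
import urllib.request


def build_file_url_request(server_url, file_url, accept="text/plain"):
    """Build (but do not send) a PUT request carrying the fileUrl header.

    Note: urllib normalizes the header name to "Fileurl"; HTTP header
    names are case-insensitive, so this should be equivalent on the wire.
    """
    return urllib.request.Request(
        server_url,
        method="PUT",
        headers={"fileUrl": file_url, "Accept": accept},
    )


def extract_text(server_url, file_url, timeout=30.0):
    """Send the request and return the response body decoded as UTF-8 (assumed)."""
    request = build_file_url_request(server_url, file_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `extract_text("http://localhost:9998/tika", "http://tika.apache.org")` would correspond to the first curl command above.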
+ 
+ Adding this capability back did not remove the security vulnerability.  Rather, if a user is confident that only authorized clients are able to submit a request, the user can choose to operate tika-server with this insecure setting. '''BE CAREFUL!'''
+ 
+ Also, please be polite.  This feature was added as a convenience.  Please consider using a robust crawler (instead of our simple {{{TikaInputStream.get(new URL(fileUrl))}}}) that allows better configuration of redirects, timeouts, cookies, etc.  A robust crawler will also respect robots.txt!
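As a rough illustration of the alternative, a client can fetch the document itself, with its own timeout and robots.txt check, and then PUT the bytes to tika-server in the usual way. This is only a sketch using the Python standard library, not a substitute for a real crawler. The function names, the default timeouts, and the `application/octet-stream` content type are assumptions for illustration.

```python
import urllib.request
import urllib.robotparser


def allowed_by_robots(robots_txt, user_agent, url):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def fetch_bytes(url, timeout=10.0):
    """Download url with an explicit timeout in seconds."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def put_bytes_to_tika(data, tika_url="http://localhost:9998/tika", timeout=30.0):
    """PUT raw bytes to tika-server and return the extracted text.

    The content type is an assumed generic value; the response is
    assumed to be UTF-8 text.
    """
    request = urllib.request.Request(
        tika_url,
        data=data,
        method="PUT",
        headers={
            "Accept": "text/plain",
            "Content-Type": "application/octet-stream",
        },
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With this approach the server does not need `-enableFileUrl` at all, since the client does the fetching and the server only ever sees bytes.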
+ 
