Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Tika Wiki" for change notification.
The "TikaJAXRS" page has been changed by TimothyAllison:
https://wiki.apache.org/tika/TikaJAXRS?action=diff&rev1=45&rev2=46

  }}}
  List all the available parsers, along with what mimetypes they support

+ == Specifying a URL Instead of Putting Bytes ==
+ In Tika 1.10, we removed this capability because it posed a security vulnerability (CVE-2015-3271). Anyone with
+ access to the service had the server's access rights; someone could request local files via {{{file:///}}} or pages
+ from an intranet that they might not otherwise have access to.
+
+ In Tika 1.14, we added the capability back, but the user has to acknowledge the security risk by including two command-line arguments:
+ {{{
+ $ java -jar tika-server-x.x.jar -enableUnsecureFeatures -enableFileUrl
+ }}}
+
+ This allows the user to specify a {{{fileUrl}}} in a request header:
+ {{{
+ curl -i -H "fileUrl:http://tika.apache.org" -H "Accept:text/plain" -X PUT http://localhost:9998/tika
+ }}}
+
+ or
+
+ {{{
+ curl -i -H "fileUrl:file:///C:/data/my_test_doc.pdf" -H "Accept:text/plain" -X PUT http://localhost:9998/tika
+ }}}
+
+ By adding back this capability, we did not remove the security vulnerability. Rather, if a user is confident that only authorized clients are able to submit a request, the user can choose to operate tika-server with this insecure setting. '''BE CAREFUL!'''
+
+ Also, please be polite. This feature was added as a convenience. Please consider using a robust crawler (instead of our simple {{{TikaInputStream.get(new URL(fileUrl))}}}) that will allow for better configuration of redirects, timeouts, cookies, etc.; and a robust crawler will respect robots.txt!
+
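
For readers calling the service from Java rather than curl, the two curl commands above could be sketched roughly as follows. This is an illustration only, not part of Tika: the class name {{{FileUrlRequestSketch}}}, the helper {{{buildRequest}}}, the {{{send}}} argument, and the 30-second timeout are all assumptions made for the example. It uses the JDK's own {{{java.net.http}}} client (Java 11+) and presumes a tika-server started with both flags shown above and listening on localhost:9998.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical client-side sketch; none of these names come from Tika itself.
class FileUrlRequestSketch {

    // Builds the same request as the curl examples: a PUT with an empty body,
    // a fileUrl header naming the resource the server should fetch,
    // and Accept: text/plain to ask for the extracted text.
    static HttpRequest buildRequest(String tikaEndpoint, String fileUrl) {
        return HttpRequest.newBuilder(URI.create(tikaEndpoint))
                .timeout(Duration.ofSeconds(30)) // arbitrary client-side timeout (assumption)
                .header("fileUrl", fileUrl)
                .header("Accept", "text/plain")
                .PUT(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpRequest request =
                buildRequest("http://localhost:9998/tika", "http://tika.apache.org");

        // Dry run by default: show what would be sent without contacting the server.
        System.out.println(request.method() + " " + request.uri());
        System.out.println(request.headers().map());

        // Only contact the server when explicitly asked to with the "send" argument.
        if (args.length > 0 && args[0].equals("send")) {
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode());
            System.out.println(response.body());
        }
    }
}
```

Passing {{{file:///C:/data/my_test_doc.pdf}}} as the second argument to {{{buildRequest}}} would mirror the second curl command. The security warning above applies unchanged: the server, not the client, fetches whatever the header names.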
