Dear Wiki user, You have subscribed to a wiki page or wiki category on "Tika Wiki" for change notification.
The "TikaJAXRS" page has been changed by HaydenYoung:
https://wiki.apache.org/tika/TikaJAXRS?action=diff&rev1=17&rev2=18

  = Extracting A Document From A URL =
- It is possible to use remote files with TikaJAXRS by downloading it via its URL first then piping it to the appropriate service:
+ It is possible to use a remote file with TikaJAXRS by downloading it via its URL first then piping it to the appropriate service:

  {{{
- curl "http://url/to/my.file" | curl -X PUT -T - http://localhost:9998/meta
+ $ curl -s "http://url/to/my.file" | curl -X PUT -T - http://localhost:9998/meta
- curl "http://url/to/my.file" | curl -X PUT -T - http://localhost:9998/tika
+ $ curl -s "http://url/to/my.file" | curl -X PUT -T - http://localhost:9998/tika
  }}}

  The caveat with above is that it fetches the entire file, so large files such as video can take some time to download. With services such as "meta" it may be faster to extract a remote file's header first using cURL:

  {{{
- curl -I http://url/to/my.file
+ $ curl -I http://url/to/my.file
  }}}

- If the file's contents is suitable for extraction (E.g. it is a PDF, word processing document or some other text file), send it on to TikaJAXRS:
+ If the file's content is suitable for extraction (E.g. content type is a PDF, word processing document or some other text file), send it on to TikaJAXRS:

  {{{
- curl "http://url/to/my.file" | curl -X PUT -T - http://localhost:9998/tika
+ $ curl -s "http://url/to/my.file" | curl -X PUT -T - http://localhost:9998/tika
  }}}

+ While the output of cURL's header information is not as cleanly formatted as TikaJAXRS's "meta" service, performance may outweigh this drawback.
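
The header-first check described in the changed text could be scripted roughly as follows. This is only a sketch under stated assumptions: the function names (`content_type_from_headers`, `is_extractable`, `extract_if_suitable`) are hypothetical, and the list of media types treated as "suitable for extraction" is an illustrative guess rather than anything defined by TikaJAXRS. The endpoint `http://localhost:9998/tika` is the one used in the page's own examples.

```shell
#!/bin/sh

# Hypothetical helper: read HTTP response headers on stdin (e.g. from
# `curl -sI`) and print the bare media type in lower case.
content_type_from_headers() {
    # Strip carriage returns, match the header name case-insensitively,
    # drop parameters such as "; charset=...", and keep the last match
    # in case several header blocks were printed (e.g. after redirects).
    tr -d '\r' |
        awk -F': *' 'tolower($1) == "content-type" {
            sub(/;.*/, "", $2)
            sub(/[ \t]+$/, "", $2)
            print tolower($2)
        }' |
        tail -n 1
}

# Hypothetical allow-list (an assumption, not part of TikaJAXRS):
# succeed when the media type looks like a PDF, a word processing
# document or some other text file.
is_extractable() {
    case "$1" in
        application/pdf) return 0 ;;
        application/msword) return 0 ;;
        application/vnd.openxmlformats-officedocument.*) return 0 ;;
        application/vnd.oasis.opendocument.*) return 0 ;;
        text/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Fetch only the headers first; download and forward the body to the
# given service (default: the "tika" service) only if the type is suitable.
extract_if_suitable() {
    url=$1
    endpoint=${2:-http://localhost:9998/tika}
    ctype=$(curl -sI "$url" | content_type_from_headers)
    if is_extractable "$ctype"; then
        curl -s "$url" | curl -X PUT -T - "$endpoint"
    else
        echo "skipping $url (content type: ${ctype:-unknown})" >&2
        return 1
    fi
}
```

With these functions sourced into a shell, `extract_if_suitable "http://url/to/my.file"` would send the file on only when the remote server reports a suitable content type. A server can report a wrong or missing `Content-Type`, so the check is a cheap filter rather than a guarantee.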
