Dear Wiki user, You have subscribed to a wiki page or wiki category on "Tika Wiki" for change notification.
The "TikaJAXRS" page has been changed by HaydenYoung:
https://wiki.apache.org/tika/TikaJAXRS?action=diff&rev1=18&rev2=19

Comment:
Got ahead of myself re: extracting useful information using curl as some metadata is missing.

  Text is stored in {{{__TEXT__}}} file, metadata CSV in {{{__METADATA__}}}. Use "accept" header if you want TAR output.

  = Extracting A Document From A URL =
+ It is possible to use a remote file with TikaJAXRS by downloading it via its URL first then piping it to the appropriate service:
- It is possible to use a remote file with TikaJAXRS by downloading it via its URL first then piping it to the appropriate service:
  {{{
  $ curl -s "http://url/to/my.file" | curl -X PUT -T - http://localhost:9998/meta
  $ curl -s "http://url/to/my.file" | curl -X PUT -T - http://localhost:9998/tika
  }}}
+ The caveat with the above is that it fetches the entire file, so large files such as video can take some time to download. Therefore, you may wish to use curl to get preliminary information (content type, name and size) about the file before you proceed:
- The caveat with the above is that it fetches the entire file, so large files such as video can take some time to download. With services such as "meta" it may be faster to extract a remote file's header first using cURL:
  {{{
  $ curl -I http://url/to/my.file
  }}}
+ If the file should be parsed (e.g. you only want to get information about mp3s, mp4s and PDFs), send it on to TikaJAXRS.
- If the file's content is suitable for extraction (e.g. content type is a PDF, word processing document or some other text file), send it on to TikaJAXRS:
- {{{
- $ curl -s "http://url/to/my.file" | curl -X PUT -T - http://localhost:9998/tika
- }}}
- While the output of cURL's header information is not as cleanly formatted as TikaJAXRS's "meta" service, performance may outweigh this drawback.
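
The "check the headers first, then send the file on" workflow described in the revised page could be scripted roughly as follows. This is only a sketch: the function names, the TIKA_URL variable and the allow-list of media types are illustrative assumptions and do not come from the wiki page. Only the curl flags and the "meta" and "tika" services are taken from the text above.

```shell
#!/bin/sh
# Hypothetical helpers; only the curl flags and the /meta and /tika
# services come from the wiki page text.

TIKA_URL="${TIKA_URL:-http://localhost:9998}"   # assumed server address

# Read HTTP response headers on stdin and print the media type from the
# last Content-Type header seen, lowercased and without parameters.
content_type_from_headers() {
    tr -d '\r' | awk -F': *' '
        tolower($1) == "content-type" { split($2, parts, ";"); ctype = tolower(parts[1]) }
        END { gsub(/^[ \t]+|[ \t]+$/, "", ctype); print ctype }'
}

# Succeed (exit 0) only for media types we want to send to Tika.
# The allow-list is an illustrative assumption (mp3, mp4, PDF).
should_parse() {
    case "$1" in
        audio/mpeg|video/mp4|application/pdf) return 0 ;;
        *) return 1 ;;
    esac
}

# Check the remote file's headers first, then download and forward the
# file only if its type is on the allow-list.
# $1 is the file URL, $2 the service name ("meta" or "tika").
fetch_and_parse() {
    url=$1
    service=${2:-meta}
    ctype=$(curl -sI "$url" | content_type_from_headers)
    if should_parse "$ctype"; then
        curl -s "$url" | curl -s -X PUT -T - "$TIKA_URL/$service"
    else
        echo "skipping $url (content type: ${ctype:-unknown})" >&2
        return 1
    fi
}
```

After sourcing the functions, a call such as fetch_and_parse "http://url/to/my.file" meta would download the file only when the header check passes. Note that some servers report a generic or incorrect Content-Type, so the header check is a cheap pre-filter rather than a guarantee.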
