The "TikaJAXRS" page has been changed by HaydenYoung:
https://wiki.apache.org/tika/TikaJAXRS?action=diff&rev1=37&rev2=38

Comment:
Add documentation about new fileUrl header option for extracting remote files.

  {{{
  java -jar tika-server-x.x.jar --host=intranet.local --port=12345
  }}}
- 
  Once the server is running, you can visit the server's URL in your browser 
(e.g. {{{http://localhost:9998/}}}), and the basic welcome page will confirm that 
the server is running and give links to the various endpoints available.
  
  Below is some basic documentation on how to interact with the services using 
cURL and HTTP.
  
  == Using prebuilt Docker image ==
- Also, you can download and start it with 
+ Also, you can download and start it with
  
  {{{
  docker pull logicalspark/docker-tikaserver # only on initial download/update
@@ -97, +96 @@

  "Content-Encoding","ISO-8859-2"
  "Content-Type","text/plain"
  }}}
- 
  Get metadata as JSON:
+ 
  {{{
  $ curl -T test_recursive_embedded.docx http://localhost:9998/meta --header 
"Accept: application/json"
  }}}
- 
  Or XMP:
  
  {{{
  $ curl -T test_recursive_embedded.docx http://localhost:9998/meta --header 
"Accept: application/rdf+xml"
  }}}
- 
- 
  Get specific metadata key's value as simple text string:
+ 
  {{{
  $ curl -T test_recursive_embedded.docx 
http://localhost:9998/meta/Content-Type --header "Accept: text/plain"
  }}}
- 
  Returns:
+ 
  {{{
  application/vnd.openxmlformats-officedocument.wordprocessingml.document
  }}}
- 
- 
  Get specific metadata key's value(s) as CSV:
+ 
  {{{
  $ curl -T test_recursive_embedded.docx 
http://localhost:9998/meta/Content-Type --header "Accept: text/csv"
  }}}
- 
  Or JSON:
+ 
  {{{
  $ curl -T test_recursive_embedded.docx 
http://localhost:9998/meta/Content-Type --header "Accept: application/json"
  }}}
- 
  Or XMP:
+ 
  {{{
  $ curl -T test_recursive_embedded.docx 
http://localhost:9998/meta/Content-Type --header "Accept: application/rdf+xml"
  }}}
- 
  '''Note: when requesting specific metadata keys value(s) in XMP, make sure to 
request the XMP name, e.g. "dc:creator" vs. "Author" '''
  
  == Tika Resource ==
@@ -179, +174 @@

  {{{
  $ curl -X PUT -H "Content-Disposition: attachment; filename=foo.csv" 
--upload-file foo.csv http://localhost:9998/detect/stream
  }}}
- 
  == Language Resource ==
  {{{
  /language/stream
  }}}
- HTTP PUTs or POSTs a document to the LanguageIdentifier to identify its 
language. 
+ HTTP PUTs or POSTs a document to the LanguageIdentifier to identify its 
language.
  
  Default return is a string of the 2 character identified language.
  
@@ -195, +189 @@

  $ curl -X PUT --data-binary @foo.txt http://localhost:9998/language/stream
  en
  }}}
- 
  == PUT a TXT file with French comme çi comme ça and get back fr ==
  {{{
  curl -X PUT --data-binary @foo.txt http://localhost:9998/language/stream
  fr
  }}}
- 
  {{{
  /language/string
  }}}
@@ -216, +208 @@

  $ curl -X PUT --data "This is English!" http://localhost:9998/language/string
  en
  }}}
- 
  == PUT a string with French comme çi comme ça and get back fr ==
  {{{
  curl -X PUT --data "comme çi comme ça" http://localhost:9998/language/string
  fr
  }}}
- 
  == Translate Resource ==
  {{{
  /translate/all/translator/src/dest
@@ -231, +221 @@

  
  Default return is the translated string if successful, else the original 
string back.
  
+ Note that: * *translator* should be a fully qualified Tika class name (with 
package) e.g., org.apache.tika.language.translate.Lingo24Translator * *src* 
should be the 2 character short code for the source language, e.g., 'en' for 
English * *dest* should be the 2 character short code for the dest language, 
e.g., 'es' for Spanish.
- Note that:
- * *translator* should be a fully qualified Tika class name (with package) 
e.g., org.apache.tika.language.translate.Lingo24Translator
- * *src* should be the 2 character short code for the source language, e.g., 
'en' for English
- * *dest* should be the 2 character short code for the dest language, e.g., 
'es' for Spanish.
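
The path layout described in the note above can be assembled by a small shell helper. This is only a sketch: the function name `translate_url`, its argument order, and the default base URL `http://localhost:9998` are assumptions made for illustration and are not part of Tika Server.

```shell
#!/bin/sh
# Hypothetical helper (not part of Tika): build a /translate/all URL.
# Usage: translate_url TRANSLATOR SRC DEST [BASE_URL]
translate_url() {
  translator=$1
  src=$2
  dest=$3
  base=${4:-http://localhost:9998}   # assumed default server address

  # Both language arguments should be 2 character short codes, e.g. 'en'.
  for code in "$src" "$dest"; do
    case "$code" in
      [a-z][a-z]) ;;
      *) echo "not a 2 character language code: $code" >&2; return 1 ;;
    esac
  done

  printf '%s/translate/all/%s/%s/%s\n' "$base" "$translator" "$src" "$dest"
}
```

The resulting URL can then be passed to curl, for example `curl -X PUT --data-binary @sentences "$(translate_url org.apache.tika.language.translate.GoogleTranslator es en)"`.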
  
  Some Example calls with cURL:
  
@@ -243, +230 @@

  $ curl -X PUT --data-binary @sentences 
http://localhost:9998/translate/all/org.apache.tika.language.translate.Lingo24Translator/es/en
  lack of practice in Spanish
  }}}
- 
  == PUT a TXT file named sentences with Spanish me falta práctica en Español 
and get back the English translation using Microsoft ==
  {{{
  $ curl -X PUT --data-binary @sentences 
http://localhost:9998/translate/all/org.apache.tika.language.translate.MicrosoftTranslator/es/en
  I need practice in Spanish
  }}}
- 
  == PUT a TXT file named sentences with Spanish me falta práctica en Español 
and get back the English translation using Google ==
  {{{
  $ curl -X PUT --data-binary @sentences 
http://localhost:9998/translate/all/org.apache.tika.language.translate.GoogleTranslator/es/en
  I need practice in Spanish
  }}}
- 
  {{{
  /translate/all/translator/dest
  }}}
@@ -263, +247 @@

  
  Default return is the translated string if successful, else the original 
string back.
  
+ Note that: * *translator* should be a fully qualified Tika class name (with 
package) e.g., org.apache.tika.language.translate.Lingo24Translator * *dest* 
should be the 2 character short code for the dest language, e.g., 'es' for 
Spanish.
- Note that:
- * *translator* should be a fully qualified Tika class name (with package) 
e.g., org.apache.tika.language.translate.Lingo24Translator
- * *dest* should be the 2 character short code for the dest language, e.g., 
'es' for Spanish.
  
  == PUT a TXT file named sentences2 with French comme çi comme ça and get back 
the English translation using Google auto-detecting the language ==
  {{{
  $ curl -X PUT --data-binary @sentences2 
http://localhost:9998/translate/all/org.apache.tika.language.translate.GoogleTranslator/en
  so so
  }}}
- 
  == Recursive Metadata and Content ==
  {{{
  /rmeta
  }}}
+ Returns a JSONified list of Metadata objects for the container document and 
all embedded documents. The text that is extracted from each document is stored 
in the metadata object under "X-TIKA:content".
- 
- Returns a JSONified list of Metadata objects for the container document and 
all embedded documents.
- The text that is extracted from each document is stored in the metadata 
object under "X-TIKA:content".
  
  {{{
  $ curl -T test_recursive_embedded.docx http://localhost:9998/rmeta
  }}}
- 
  Returns:
+ 
  {{{
  [
   {"Application-Name":"Microsoft Office Word",
@@ -335, +314 @@

  /
  }}}
  Hitting the root of the server in your web browser will give a basic report 
of all the endpoints defined in the server, what URLs they have, etc.
+ 
  == Defined Mime Types ==
  {{{
  /mime-types
@@ -359, +339 @@

  List all the available parsers, along with what mimetypes they support
  
  = Extracting A Document From A URL =
- It is possible to use a remote file with TikaJAXRS by downloading it via its 
URL first then piping it to the appropriate service:
+ Remote files can be PUT to Tika Server using the header "fileUrl":
  
  {{{
- $ curl -s "http://url/to/my.file" | curl -X PUT -T - 
http://localhost:9998/meta
+ $ curl -i -H "fileUrl:http://url/to/my.file" -H "Accept: application/json" -X 
PUT http://localhost:9998/meta
- $ curl -s "http://url/to/my.file" | curl -X PUT -T - 
http://localhost:9998/tika
+ $ curl -i -H "fileUrl:http://url/to/my.file" -H "Accept: text/plain" -X PUT 
http://localhost:9998/tika
  }}}
- The caveat with above is that it fetches the entire file, so large files such 
as video can take some time to download. Therefore, you may wish to use curl to 
get preliminary information (content type, name and size) about the file before 
you proceed:
+ NOTE: Each PUT will download the entire file from the remote source.
  
- {{{
- $ curl -I http://url/to/my.file
- }}}
- If the file should be parsed (E.g. you only want to get information about 
mp3s, mp4s and PDFs), send it on to TikaJAXRS.
- 
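
Because each PUT downloads the whole remote file, it may be worth checking the file's content type with a HEAD request before asking Tika Server to fetch it. The sketch below shows one possible way to do that. The function names `is_wanted_type` and `extract_remote_meta`, the allow-list of types (MP3, MP4, PDF) and the default server address are assumptions for illustration. Only the `fileUrl` header and the `/meta` endpoint come from the documentation above.

```shell
#!/bin/sh
# Hypothetical helper: succeed only for content types we want parsed.
# The allow-list (MP3, MP4, PDF) is an assumption; adjust it as needed.
is_wanted_type() {
  # Drop parameters such as "; charset=UTF-8", strip spaces and CR, lowercase.
  wanted=$(printf '%s' "$1" | cut -d';' -f1 | tr -d ' \r' | tr 'A-Z' 'a-z')
  case "$wanted" in
    audio/mpeg|video/mp4|application/pdf) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: look at the remote Content-Type first (curl -I sends
# a HEAD request), and only then let Tika Server download and parse the file.
# Usage: extract_remote_meta FILE_URL [TIKA_BASE_URL]
extract_remote_meta() {
  url=$1
  tika=${2:-http://localhost:9998}   # assumed default server address
  ctype=$(curl -sI "$url" |
    awk -F': *' 'tolower($1) == "content-type" { print $2 }' | tail -n 1)
  if is_wanted_type "$ctype"; then
    curl -s -X PUT -H "fileUrl:$url" -H "Accept: application/json" "$tika/meta"
  else
    echo "skipping $url (content type: ${ctype:-unknown})" >&2
    return 1
  fi
}
```

Calling `extract_remote_meta "http://url/to/my.file"` would then issue the PUT only when the remote server reports one of the listed types. Servers that omit or misreport `Content-Type` will be skipped, so this check is a convenience rather than a guarantee.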
