Author: nick
Date: Wed Aug 19 14:56:18 2015
New Revision: 1696608

URL: http://svn.apache.org/r1696608
Log:
Republish site

Added:
    tika/site/publish/1.11/
    tika/site/publish/1.11/configuring.html
    tika/site/publish/1.11/examples.html
    tika/site/publish/1.11/formats.html
Modified:
    tika/site/publish/1.10/configuring.html

Modified: tika/site/publish/1.10/configuring.html
URL: 
http://svn.apache.org/viewvc/tika/site/publish/1.10/configuring.html?rev=1696608&r1=1696607&r2=1696608&view=diff
==============================================================================
--- tika/site/publish/1.10/configuring.html (original)
+++ tika/site/publish/1.10/configuring.html Wed Aug 19 14:56:18 2015
@@ -96,18 +96,23 @@
 <li><a href="#Configuring_Mime_Types">Configuring Mime Types</a></li>
 <li><a href="#Configuring_Language_Identifiers">Configuring Language 
Identifiers</a></li>
 <li><a href="#Configuring_Translators">Configuring Translators</a></li>
-<li><a href="#Using_a_Tika_Configuration_XML_file">Using a Tika Configuration 
XML file</a></li></ul></li></ul>
+<li><a href="#Configuring_the_Service_Loader">Configuring the Service 
Loader</a></li></ul></li></ul>
 <div class="section">
-<h3><a name="Configuring_Parsers">Configuring Parsers</a></h3><!-- TODO Add 
more on in 1.10, which has more support -->
-<p>In Tika 1.9, there is some support for configuring Parsers in the Tika 
Config xml. You can provide a custom list of parser to use, in a custom order, 
and you can also force certain mimetypes to be used or not-used for parsers. 
You can do so with Tika Config something like:</p>
+<h3><a name="Configuring_Parsers">Configuring Parsers</a></h3>
+<p>Through the Tika Config XML, it is possible to have a high degree of 
control over which parsers are or aren't used, and in what order of preference. 
It is also possible to override just certain parts, to (for example) have 
&quot;default except for PDF&quot;.</p>
+<p>Currently, it is only possible to have a single parser run against a 
document. There is on-going discussion around fallback parsers and combining 
the output of multiple parsers running on a document, but none of these are 
available yet.</p>
+<p>To override certain default parser behaviours, include the <a 
href="#DefaultParser">DefaultParser</a> in your configuration with the excludes 
you need, then add your other parser definitions. To prevent the <a 
href="#DefaultParser">DefaultParser</a> (with its auto-discovery) being used, 
simply omit it from your config, and list all the parsers you want instead.</p>
+<p>To override just some default behaviour, you can use a Tika Config 
something like this:</p>
 <div>
 <pre>&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;
 &lt;properties&gt;
   &lt;parsers&gt;
-    &lt;!-- Default Parser for most things, except for 2 mime types --&gt;
+    &lt;!-- Default Parser for most things, except for 2 mime types, and never
+         use the Executable Parser --&gt;
     &lt;parser class=&quot;org.apache.tika.parser.DefaultParser&quot;&gt;
       &lt;mime-exclude&gt;image/jpeg&lt;/mime-exclude&gt;
       &lt;mime-exclude&gt;application/pdf&lt;/mime-exclude&gt;
+      &lt;parser-exclude 
class=&quot;org.apache.tika.parser.executable.ExecutableParser&quot;/&gt;
     &lt;/parser&gt;
     &lt;!-- Use a different parser for PDF --&gt;
     &lt;parser class=&quot;org.apache.tika.parser.EmptyParser&quot;&gt;
@@ -115,10 +120,24 @@
     &lt;/parser&gt;
   &lt;/parsers&gt;
 &lt;/properties&gt;</pre></div>
-<p>In code, the key classes to use to build up your own custom parser 
hierarchy are <a 
href="./api/org/apache/tika/parser/DefaultParser.html">org.apache.tika.parser.DefaultParser</a>,
 <a 
href="./api/org/apache/tika/parser/CompositeParser.html">org.apache.tika.parser.CompositeParser</a>
 and <a 
href="./api/org/apache/tika/parser/ParserDecorator.html">org.apache.tika.parser.ParserDecorator</a>.</p></div>
+<p>To configure things in code, the key classes to use to build up your own 
custom parser hierarchy are <a 
href="./api/org/apache/tika/parser/DefaultParser.html">org.apache.tika.parser.DefaultParser</a>,
 <a 
href="./api/org/apache/tika/parser/CompositeParser.html">org.apache.tika.parser.CompositeParser</a>
 and <a 
href="./api/org/apache/tika/parser/ParserDecorator.html">org.apache.tika.parser.ParserDecorator</a>.</p></div>
 <div class="section">
-<h3><a name="Configuring_Detectors">Configuring Detectors</a></h3><!-- TODO 
Add more on in 1.10, which has more support -->
-<p>In Tika 1.9, there is limited support for configuring Detectors in the Tika 
Config xml. You can provide a custom list of detectors to use, in a custom 
order, with Tika Config something like:</p>
+<h3><a name="Configuring_Detectors">Configuring Detectors</a></h3>
+<p>Through the Tika Config XML, it is possible to have a high degree of 
control over which detectors are or aren't used, and in what order of 
preference. It is also possible to override just certain parts, to (for 
example) have &quot;default except for POIFS Container Detection&quot;.</p>
+<p>To override certain default detector behaviours, include the <a 
href="#DefaultDetector">DefaultDetector</a> in your configuration, with any <a 
href="#detector-exclude">detector-exclude</a> entries you need, then add your 
other detector definitions. To prevent the <a 
href="#DefaultDetector">DefaultDetector</a> (with its auto-discovery) being 
used, simply omit it from your config, and list all the detectors you want 
instead.</p>
+<p>To override just some default behaviour, you can use a Tika Config 
something like this:</p>
+<div>
+<pre>&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;
+&lt;properties&gt;
+  &lt;detectors&gt;
+    &lt;!-- All detectors except built-in container ones --&gt;
+    &lt;detector class=&quot;org.apache.tika.detect.DefaultDetector&quot;&gt;
+      &lt;detector-exclude 
class=&quot;org.apache.tika.parser.pkg.ZipContainerDetector&quot;/&gt;
+      &lt;detector-exclude 
class=&quot;org.apache.tika.parser.microsoft.POIFSContainerDetector&quot;/&gt;
+    &lt;/detector&gt;
+  &lt;/detectors&gt;
+&lt;/properties&gt;</pre></div>
+<p>Or, to use only certain detectors, you can use a Tika Config something 
like this:</p>
 <div>
 <pre>&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;
 &lt;properties&gt;
@@ -139,6 +158,21 @@
 <h3><a name="Configuring_Translators">Configuring Translators</a></h3>
 <p>At this time, there is no unified way to configure Translators. While the 
work on that is ongoing, for now you will need to review the <a 
href="./api/">Tika Javadocs</a> to see how individual Translators are 
configured.</p></div>
 <div class="section">
+<h3><a name="Configuring_the_Service_Loader">Configuring the Service 
Loader</a></h3>
+<p>Tika has a number of service provider types such as parsers, detectors, and 
translators. The <a 
href="./api/org/apache/tika/config/ServiceLoader.html">org.apache.tika.config.ServiceLoader</a>
 class provides a registry of each type of provider. This allows Tika to create 
implementations such as <a 
href="./api/org/apache/tika/parser/DefaultParser.html">org.apache.tika.parser.DefaultParser</a>,
 <a 
href="./api/org/apache/tika/language/translate/DefaultTranslator.html">org.apache.tika.language.translate.DefaultTranslator</a>,
 and <a 
href="./api/org/apache/tika/detect/DefaultDetector.html">org.apache.tika.detect.DefaultDetector</a>
 that can match the appropriate provider to an incoming piece of content.</p>
+<p>The ServiceLoader's registry can be populated either statically or 
dynamically.</p>
+<p>Static - Static loading is the default, and requires no configuration. This 
configuration option is used in Tika deployments where the Tika JAR files 
reside together in the same classloader hierarchy. The service providers are 
loaded from provider configuration files located within the tika-parsers JAR 
file at META-INF/services.</p>
+<p>Dynamic - Dynamic loading may be required if the Tika service providers 
reside in different classloaders, such as in OSGi. To allow a provider created 
in tika-config.xml to use dynamically loaded services, you need to configure 
the ServiceLoader to be dynamic with the following configuration:</p>
+<div>
+<pre>&lt;properties&gt;
+  &lt;service-loader dynamic=&quot;true&quot;/&gt;
+  ....
+&lt;/properties&gt;</pre></div>
+<p>The ServiceLoader contains a handler to deal with errors that occur during 
provider initialization. For example, if a class fails to initialize, the 
LoadErrorHandler deals with the exception that is thrown. This handler can be 
configured to:</p>
+<ul>
+<li>IGNORE - (Default) Do nothing when providers fail to initialize.</li>
+<li>WARN - Log a warning when providers fail to initialize.</li>
+<li>THROW - Throw an exception when providers fail to initialize.</li></ul></div></div>
+<div class="section">
+<p>For example, to set the LoadErrorHandler to WARN, use the following 
configuration:</p>
+<div>
+<pre>&lt;properties&gt;
+  &lt;service-loader loadErrorHandler=&quot;WARN&quot;/&gt;
+  ....
+&lt;/properties&gt;</pre></div><!-- When Translators can have their parameters 
configured, mention here about --><!-- specifying which single one to use in 
the Tika Config XML -->
+<div class="section">
 <h3><a name="Using_a_Tika_Configuration_XML_file">Using a Tika Configuration 
XML file</a></h3>
 <p>However you call Tika, the System Property of <tt> tika.config </tt> is 
checked first, and the Environment Variable of <tt> TIKA_CONFIG </tt> is tried 
next. Setting one of those will cause Tika to use your given Tika Config XML 
file.</p>
 <p>If you are calling Tika from your own code, then you can pass in the 
location of your Tika Config XML file when you construct your 
<tt>TikaConfig</tt> instance. From that, you can fetch your configured parser, 
detectors etc.</p>

Added: tika/site/publish/1.11/configuring.html
URL: 
http://svn.apache.org/viewvc/tika/site/publish/1.11/configuring.html?rev=1696608&view=auto
==============================================================================
--- tika/site/publish/1.11/configuring.html (added)
+++ tika/site/publish/1.11/configuring.html Wed Aug 19 14:56:18 2015
@@ -0,0 +1,442 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
+          "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
+
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+ 
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
+
+
+
+
+
+
+<html xmlns="http://www.w3.org/1999/xhtml">
+  <head>
+    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
+    <title>Apache Tika - Configuring Tika</title>
+    <style type="text/css" media="all">
+      @import url("../css/site.css");
+    </style>
+    <link rel="icon" type="image/png" href="../tikaNoText16.png" />
+    <script type="text/javascript">
+      function selectProvider(form) {
+        provider = form.elements['searchProvider'].value;
+        if (provider == "any") {
+          if (Math.random() > 0.5) {
+            provider = "lucid";
+          } else {
+            provider = "sl";
+          }
+        }
+        if (provider == "lucid") {
+          form.action = "http://find.searchhub.org/p:tika";
+        } else if (provider == "sl") {
+          form.action = "http://search-lucene.com/tika";
+        }
+        days = 90;
+        date = new Date();
+        date.setTime(date.getTime() + (days * 24 * 60 * 60 * 1000));
+        expires = "; expires=" + date.toGMTString();
+        document.cookie = "searchProvider=" + provider + expires + "; path=/";
+      }
+      function initProvider() {
+        if (document.cookie.length>0) {
+          cStart=document.cookie.indexOf("searchProvider=");
+          if (cStart!=-1) {
+            cStart=cStart + "searchProvider=".length;
+            cEnd=document.cookie.indexOf(";", cStart);
+            if (cEnd==-1) {
+              cEnd=document.cookie.length;
+            }
+            provider = unescape(document.cookie.substring(cStart,cEnd));
+            document.forms['searchform'].elements['searchProvider'].value = 
provider;
+          }
+        }
+        document.forms['searchform'].elements['q'].focus();
+      }
+    </script>
+  </head>
+  <body onLoad="initProvider();">
+    <div id="body">
+      <div id="banner">
+        <a href="http://tika.apache.org" id="bannerLeft" title="Apache Tika"
+          ><img src="http://tika.apache.org/tika.png" alt="Apache Tika"
+                width="292" height="100"/></a>
+        <a href="http://www.apache.org/" id="bannerRight"
+           title="The Apache Software Foundation"
+          ><img src="http://tika.apache.org/asf-logo.gif" alt="The Apache 
Software Foundation"
+                width="387" height="100"/></a>
+      </div>
+      <div id="content">
+        <!-- Licensed to the Apache Software Foundation (ASF) under one or 
more --><!-- contributor license agreements.  See the NOTICE file distributed 
with --><!-- this work for additional information regarding copyright 
ownership. --><!-- The ASF licenses this file to You under the Apache License, 
Version 2.0 --><!-- (the "License"); you may not use this file except in 
compliance with --><!-- the License.  You may obtain a copy of the License at 
--><!--  --><!-- http://www.apache.org/licenses/LICENSE-2.0 --><!--  --><!-- 
Unless required by applicable law or agreed to in writing, software --><!-- 
distributed under the License is distributed on an "AS IS" BASIS, --><!-- 
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
--><!-- See the License for the specific language governing permissions and 
--><!-- limitations under the License. --><div class="section">
+<h2>Configuring Tika<a name="Configuring_Tika"></a></h2>
+<p>Out of the box, Apache Tika will attempt to start with all available 
Detectors and Parsers, running with sensible defaults. For most users, this 
default configuration will work well.</p>
+<p>This page gives you information on how to configure the various components 
of Apache Tika, such as Parsers and Detectors, if you need fine-grained control 
over ordering, exclusions and the like.</p>
+<ul>
+<li><a href="#Configuring_Tika">Configuring Tika</a>
+<ul>
+<li><a href="#Configuring_Parsers">Configuring Parsers</a></li>
+<li><a href="#Configuring_Detectors">Configuring Detectors</a></li>
+<li><a href="#Configuring_Mime_Types">Configuring Mime Types</a></li>
+<li><a href="#Configuring_Language_Identifiers">Configuring Language 
Identifiers</a></li>
+<li><a href="#Configuring_Translators">Configuring Translators</a></li>
+<li><a href="#Configuring_the_Service_Loader">Configuring the Service 
Loader</a></li></ul></li></ul>
+<div class="section">
+<h3><a name="Configuring_Parsers">Configuring Parsers</a></h3>
+<p>Through the Tika Config XML, it is possible to have a high degree of 
control over which parsers are or aren't used, and in what order of preference. 
It is also possible to override just certain parts, to (for example) have 
&quot;default except for PDF&quot;.</p>
+<p>Currently, it is only possible to have a single parser run against a 
document. There is on-going discussion around fallback parsers and combining 
the output of multiple parsers running on a document, but none of these are 
available yet.</p>
+<p>To override certain default parser behaviours, include the <a 
href="#DefaultParser">DefaultParser</a> in your configuration with the excludes 
you need, then add your other parser definitions. To prevent the <a 
href="#DefaultParser">DefaultParser</a> (with its auto-discovery) being used, 
simply omit it from your config, and list all the parsers you want instead.</p>
+<p>To override just some default behaviour, you can use a Tika Config 
something like this:</p>
+<div>
+<pre>&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;
+&lt;properties&gt;
+  &lt;parsers&gt;
+    &lt;!-- Default Parser for most things, except for 2 mime types, and never
+         use the Executable Parser --&gt;
+    &lt;parser class=&quot;org.apache.tika.parser.DefaultParser&quot;&gt;
+      &lt;mime-exclude&gt;image/jpeg&lt;/mime-exclude&gt;
+      &lt;mime-exclude&gt;application/pdf&lt;/mime-exclude&gt;
+      &lt;parser-exclude 
class=&quot;org.apache.tika.parser.executable.ExecutableParser&quot;/&gt;
+    &lt;/parser&gt;
+    &lt;!-- Use a different parser for PDF --&gt;
+    &lt;parser class=&quot;org.apache.tika.parser.EmptyParser&quot;&gt;
+      &lt;mime&gt;application/pdf&lt;/mime&gt;
+    &lt;/parser&gt;
+  &lt;/parsers&gt;
+&lt;/properties&gt;</pre></div>
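+<p>To stop the DefaultParser (and its auto-discovery) being used at all, leave 
it out and list only the parsers you want. The following is a minimal sketch, 
assuming the PDF and plain text parsers from tika-parsers are on your 
classpath; the two parser classes are chosen purely for illustration:</p>
+<div>
+<pre>&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;
+&lt;properties&gt;
+  &lt;parsers&gt;
+    &lt;!-- Illustrative only: with no DefaultParser listed, there is no
+         auto-discovery, so only these two parsers are used --&gt;
+    &lt;parser class=&quot;org.apache.tika.parser.pdf.PDFParser&quot;/&gt;
+    &lt;parser class=&quot;org.apache.tika.parser.txt.TXTParser&quot;/&gt;
+  &lt;/parsers&gt;
+&lt;/properties&gt;</pre></div>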
+<p>To configure things in code, the key classes to use to build up your own 
custom parser hierarchy are <a 
href="./api/org/apache/tika/parser/DefaultParser.html">org.apache.tika.parser.DefaultParser</a>,
 <a 
href="./api/org/apache/tika/parser/CompositeParser.html">org.apache.tika.parser.CompositeParser</a>
 and <a 
href="./api/org/apache/tika/parser/ParserDecorator.html">org.apache.tika.parser.ParserDecorator</a>.</p></div>
+<div class="section">
+<h3><a name="Configuring_Detectors">Configuring Detectors</a></h3>
+<p>Through the Tika Config XML, it is possible to have a high degree of 
control over which detectors are or aren't used, and in what order of 
preference. It is also possible to override just certain parts, to (for 
example) have &quot;default except for POIFS Container Detection&quot;.</p>
+<p>To override certain default detector behaviours, include the <a 
href="#DefaultDetector">DefaultDetector</a> in your configuration, with any <a 
href="#detector-exclude">detector-exclude</a> entries you need, then add your 
other detector definitions. To prevent the <a 
href="#DefaultDetector">DefaultDetector</a> (with its auto-discovery) being 
used, simply omit it from your config, and list all the detectors you want 
instead.</p>
+<p>To override just some default behaviour, you can use a Tika Config 
something like this:</p>
+<div>
+<pre>&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;
+&lt;properties&gt;
+  &lt;detectors&gt;
+    &lt;!-- All detectors except built-in container ones --&gt;
+    &lt;detector class=&quot;org.apache.tika.detect.DefaultDetector&quot;&gt;
+      &lt;detector-exclude 
class=&quot;org.apache.tika.parser.pkg.ZipContainerDetector&quot;/&gt;
+      &lt;detector-exclude 
class=&quot;org.apache.tika.parser.microsoft.POIFSContainerDetector&quot;/&gt;
+    &lt;/detector&gt;
+  &lt;/detectors&gt;
+&lt;/properties&gt;</pre></div>
+<p>Or, to use only certain detectors, you can use a Tika Config something 
like this:</p>
+<div>
+<pre>&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;
+&lt;properties&gt;
+  &lt;detectors&gt;
+    &lt;!-- Only use these two detectors, and ignore all others --&gt;
+    &lt;detector 
class=&quot;org.apache.tika.parser.pkg.ZipContainerDetector&quot;/&gt;
+    &lt;detector class=&quot;org.apache.tika.mime.MimeTypes&quot;/&gt;
+  &lt;/detectors&gt;
+&lt;/properties&gt;</pre></div>
+<p>In code, the key classes to use to build up your own custom detector 
hierarchy are <a 
href="./api/org/apache/tika/detect/DefaultDetector.html">org.apache.tika.detect.DefaultDetector</a>
 and <a 
href="./api/org/apache/tika/detect/CompositeDetector.html">org.apache.tika.detect.CompositeDetector</a>.</p></div>
+<div class="section">
+<h3><a name="Configuring_Mime_Types">Configuring Mime Types</a></h3>
+<p>TODO Mention non-standard paths, and custom mime type files</p></div>
+<div class="section">
+<h3><a name="Configuring_Language_Identifiers">Configuring Language 
Identifiers</a></h3>
+<p>At this time, there is no unified way to configure language identifiers. 
While the work on that is ongoing, for now you will need to review the <a 
href="./api/">Tika Javadocs</a> to see how individual identifiers are 
configured.</p></div>
+<div class="section">
+<h3><a name="Configuring_Translators">Configuring Translators</a></h3>
+<p>At this time, there is no unified way to configure Translators. While the 
work on that is ongoing, for now you will need to review the <a 
href="./api/">Tika Javadocs</a> to see how individual Translators are 
configured.</p></div>
+<div class="section">
+<h3><a name="Configuring_the_Service_Loader">Configuring the Service 
Loader</a></h3>
+<p>Tika has a number of service provider types such as parsers, detectors, and 
translators. The <a 
href="./api/org/apache/tika/config/ServiceLoader.html">org.apache.tika.config.ServiceLoader</a>
 class provides a registry of each type of provider. This allows Tika to create 
implementations such as <a 
href="./api/org/apache/tika/parser/DefaultParser.html">org.apache.tika.parser.DefaultParser</a>,
 <a 
href="./api/org/apache/tika/language/translate/DefaultTranslator.html">org.apache.tika.language.translate.DefaultTranslator</a>,
 and <a 
href="./api/org/apache/tika/detect/DefaultDetector.html">org.apache.tika.detect.DefaultDetector</a>
 that can match the appropriate provider to an incoming piece of content.</p>
+<p>The ServiceLoader's registry can be populated either statically or 
dynamically.</p>
+<p>Static - Static loading is the default, and requires no configuration. This 
configuration option is used in Tika deployments where the Tika JAR files 
reside together in the same classloader hierarchy. The service providers are 
loaded from provider configuration files located within the tika-parsers JAR 
file at META-INF/services.</p>
+<p>Dynamic - Dynamic loading may be required if the Tika service providers 
reside in different classloaders, such as in OSGi. To allow a provider created 
in tika-config.xml to use dynamically loaded services, you need to configure 
the ServiceLoader to be dynamic with the following configuration:</p>
+<div>
+<pre>&lt;properties&gt;
+  &lt;service-loader dynamic=&quot;true&quot;/&gt;
+  ....
+&lt;/properties&gt;</pre></div>
+<p>The ServiceLoader contains a handler to deal with errors that occur during 
provider initialization. For example, if a class fails to initialize, the 
LoadErrorHandler deals with the exception that is thrown. This handler can be 
configured to:</p>
+<ul>
+<li>IGNORE - (Default) Do nothing when providers fail to initialize.</li>
+<li>WARN - Log a warning when providers fail to initialize.</li>
+<li>THROW - Throw an exception when providers fail to initialize.</li></ul></div></div>
+<div class="section">
+<p>For example, to set the LoadErrorHandler to WARN, use the following 
configuration:</p>
+<div>
+<pre>&lt;properties&gt;
+  &lt;service-loader loadErrorHandler=&quot;WARN&quot;/&gt;
+  ....
+&lt;/properties&gt;</pre></div><!-- When Translators can have their parameters 
configured, mention here about --><!-- specifying which single one to use in 
the Tika Config XML -->
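+<p>The two <tt>service-loader</tt> attributes described in this section can 
also be set together. The following is a minimal sketch, assuming both 
attributes are accepted on the same element in your Tika version; the choice of 
THROW here is only an illustration of failing fast when a provider cannot be 
initialized:</p>
+<div>
+<pre>&lt;properties&gt;
+  &lt;!-- Sketch: dynamic loading, and raise an exception on provider errors --&gt;
+  &lt;service-loader dynamic=&quot;true&quot; loadErrorHandler=&quot;THROW&quot;/&gt;
+  ....
+&lt;/properties&gt;</pre></div>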
+<div class="section">
+<h3><a name="Using_a_Tika_Configuration_XML_file">Using a Tika Configuration 
XML file</a></h3>
+<p>However you call Tika, the System Property of <tt> tika.config </tt> is 
checked first, and the Environment Variable of <tt> TIKA_CONFIG </tt> is tried 
next. Setting one of those will cause Tika to use your given Tika Config XML 
file.</p>
+<p>If you are calling Tika from your own code, then you can pass in the 
location of your Tika Config XML file when you construct your 
<tt>TikaConfig</tt> instance. From that, you can fetch your configured parser, 
detectors etc.</p>
+<div>
+<pre>TikaConfig config = new TikaConfig(&quot;/path/to/tika-config.xml&quot;);
+Detector detector = config.getDetector();
+Parser autoDetectParser = new AutoDetectParser(config);</pre></div>
+<p>For users of the Tika App, in addition to the system property and the 
environment variable, you can also use the <tt> --config=[tika-config.xml] 
</tt> option to select a different Tika Config XML file to use.</p>
+<p>For users of the Tika Server, in addition to the system property and the 
environment variable, you can also use the <tt> -c [tika-config.xml] </tt> or 
<tt> --config [tika-config.xml] </tt> options to select a different Tika Config 
XML file to use.</p></div></div>
+      </div>
+      <div id="sidebar">
+        <div id="navigation">
+                    <h5>Apache Tika</h5>
+            <ul>
+              
+    <li class="none">
+                    <a href="../index.html">Introduction</a>
+          </li>
+              
+    <li class="none">
+                    <a href="../download.html">Download</a>
+          </li>
+              
+    <li class="none">
+                    <a href="../contribute.html">Contribute</a>
+          </li>
+              
+    <li class="none">
+                    <a href="../mail-lists.html">Mailing Lists</a>
+          </li>
+              
+    <li class="none">
+                    <a href="http://wiki.apache.org/tika/" 
class="externalLink">Tika Wiki</a>
+          </li>
+              
+    <li class="none">
+                    <a href="https://issues.apache.org/jira/browse/TIKA" 
class="externalLink">Issue Tracker</a>
+          </li>
+          </ul>
+              <h5>Documentation</h5>
+            <ul>
+              
+          
+                    
+                  
+                  
+                  
+                  
+                  
+                  
+                  
+              
+        <li class="expanded">
+                    <a href="../1.10/index.html">Apache Tika 1.10</a>
+                  <ul>
+                  
+    <li class="none">
+                    <a href="../1.10/gettingstarted.html">Getting Started</a>
+          </li>
+                  
+    <li class="none">
+                    <a href="../1.10/formats.html">Supported Formats</a>
+          </li>
+                  
+    <li class="none">
+                    <a href="../1.10/parser.html">Parser API</a>
+          </li>
+                  
+    <li class="none">
+                    <a href="../1.10/parser_guide.html">Parser 5min Quick 
Start Guide</a>
+          </li>
+                  
+    <li class="none">
+                    <a href="../1.10/detection.html">Content and Language 
Detection</a>
+          </li>
+                  
+    <li class="none">
+                    <a href="../1.10/configuring.html">Configuring Tika</a>
+          </li>
+                  
+    <li class="none">
+                    <a href="../1.10/examples.html">Usage Examples</a>
+          </li>
+                  
+    <li class="none">
+                    <a href="../1.10/api/">API Documentation</a>
+          </li>
+              </ul>
+        </li>
+              
+                
+                    
+                  
+                  
+                  
+                  
+                  
+                  
+                  
+              
+        <li class="collapsed">
+                    <a href="../1.9/index.html">Apache Tika 1.9</a>
+                </li>
+              
+                
+                    
+                  
+                  
+                  
+                  
+                  
+                  
+              
+        <li class="collapsed">
+                    <a href="../1.8/index.html">Apache Tika 1.8</a>
+                </li>
+              
+                
+                    
+                  
+                  
+                  
+                  
+                  
+                  
+              
+        <li class="collapsed">
+                    <a href="../1.7/index.html">Apache Tika 1.7</a>
+                </li>
+              
+                
+                    
+                  
+                  
+                  
+                  
+                  
+              
+        <li class="collapsed">
+                    <a href="../1.6/index.html">Apache Tika 1.6</a>
+                </li>
+              
+                
+                    
+                  
+                  
+                  
+                  
+                  
+              
+        <li class="collapsed">
+                    <a href="../1.5/index.html">Apache Tika 1.5</a>
+                </li>
+              
+                
+                    
+                  
+                  
+                  
+                  
+                  
+              
+        <li class="collapsed">
+                    <a href="../1.4/index.html">Apache Tika 1.4</a>
+                </li>
+              
+                
+                    
+                  
+                  
+                  
+                  
+                  
+              
+        <li class="collapsed">
+                    <a href="../1.3/index.html">Apache Tika 1.3</a>
+                </li>
+              
+                
+                    
+                  
+                  
+                  
+                  
+                  
+              
+        <li class="collapsed">
+                    <a href="../1.2/index.html">Apache Tika 1.2</a>
+                </li>
+              
+                
+                    
+                  
+                  
+                  
+                  
+                  
+              
+        <li class="collapsed">
+                    <a href="../1.1/index.html">Apache Tika 1.1</a>
+                </li>
+          </ul>
+              <h5>The Apache Software Foundation</h5>
+            <ul>
+              
+    <li class="none">
+                    <a href="http://www.apache.org/foundation/" 
class="externalLink">About</a>
+          </li>
+              
+    <li class="none">
+                    <a href="http://www.apache.org/licenses/" 
class="externalLink">License</a>
+          </li>
+              
+    <li class="none">
+                    <a href="http://www.apache.org/security/" 
class="externalLink">Security</a>
+          </li>
+              
+    <li class="none">
+                    <a 
href="http://www.apache.org/foundation/sponsorship.html" 
class="externalLink">Sponsorship</a>
+          </li>
+              
+    <li class="none">
+                    <a href="http://www.apache.org/foundation/thanks.html" 
class="externalLink">Thanks</a>
+          </li>
+          </ul>
+      
+          <div id="search">
+            <h5>Search with Apache Solr</h5>
+            <form action="http://search.lucidimagination.com/p:tika"
+                  method="get" id="searchform">
+              <input type="text" id="query" name="q"/>
+              <select name="searchProvider" id="searchProvider">
+                <option value="any">provider</option>
+                <option value="lucid">Lucid Find</option>
+                <option value="sl">Search-Lucene</option>
+              </select>
+              <input type="submit" id="submit" value="Search" name="Search"
+                     onclick="selectProvider(this.form)"/>
+            </form>
+          </div>
+
+          <div id="bookpromo">
+            <h5>Books about Tika</h5>
+            <p>
+              <a href="http://manning.com/mattmann/" title="Tika in Action"
+                ><img src="../mattmann_cover150.jpg"
+                      width="150" height="186"/></a>
+            </p>
+          </div>
+        </div>
+      </div>
+      <div id="footer">
+        <p>
+          Copyright &#169; 2015
+          <a href="http://www.apache.org/">The Apache Software Foundation</a>.
+          Site powered by <a href="http://maven.apache.org/">Apache Maven</a>. 
+          Search powered by
+          <a href="http://www.lucidimagination.com">Lucid Imagination</a>
+          and <a href="http://sematext.com">Sematext</a>.
+          <br/>
+          Apache Tika, Tika, Apache, the Apache feather logo, and the Apache
+          Tika project logo are trademarks of The Apache Software Foundation.
+        </p>
+      </div>
+    </div>
+  </body>
+</html>

Added: tika/site/publish/1.11/examples.html
URL: 
http://svn.apache.org/viewvc/tika/site/publish/1.11/examples.html?rev=1696608&view=auto
==============================================================================
--- tika/site/publish/1.11/examples.html (added)
+++ tika/site/publish/1.11/examples.html Wed Aug 19 14:56:18 2015
@@ -0,0 +1,414 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
+          "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
+
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+ 
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
+
+
+
+
+
+
+<html xmlns="http://www.w3.org/1999/xhtml">
+  <head>
+    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
+    <title>Apache Tika - Tika API Usage Examples</title>
+    <style type="text/css" media="all">
+      @import url("../css/site.css");
+    </style>
+    <link rel="icon" type="image/png" href="../tikaNoText16.png" />
+    <script type="text/javascript">
+      function selectProvider(form) {
+        provider = form.elements['searchProvider'].value;
+        if (provider == "any") {
+          if (Math.random() > 0.5) {
+            provider = "lucid";
+          } else {
+            provider = "sl";
+          }
+        }
+        if (provider == "lucid") {
+          form.action = "http://find.searchhub.org/p:tika";
+        } else if (provider == "sl") {
+          form.action = "http://search-lucene.com/tika";
+        }
+        days = 90;
+        date = new Date();
+        date.setTime(date.getTime() + (days * 24 * 60 * 60 * 1000));
+        expires = "; expires=" + date.toGMTString();
+        document.cookie = "searchProvider=" + provider + expires + "; path=/";
+      }
+      function initProvider() {
+        if (document.cookie.length>0) {
+          cStart=document.cookie.indexOf("searchProvider=");
+          if (cStart!=-1) {
+            cStart=cStart + "searchProvider=".length;
+            cEnd=document.cookie.indexOf(";", cStart);
+            if (cEnd==-1) {
+              cEnd=document.cookie.length;
+            }
+            provider = unescape(document.cookie.substring(cStart,cEnd));
+            document.forms['searchform'].elements['searchProvider'].value = 
provider;
+          }
+        }
+        document.forms['searchform'].elements['q'].focus();
+      }
+    </script>
+  </head>
+  <body onLoad="initProvider();">
+    <div id="body">
+      <div id="banner">
+        <a href="http://tika.apache.org" id="bannerLeft" title="Apache Tika"
+          ><img src="http://tika.apache.org/tika.png" alt="Apache Tika"
+                width="292" height="100"/></a>
+        <a href="http://www.apache.org/" id="bannerRight"
+           title="The Apache Software Foundation"
+          ><img src="http://tika.apache.org/asf-logo.gif" alt="The Apache 
Software Foundation"
+                width="387" height="100"/></a>
+      </div>
+      <div id="content">
+        <div class="section">
+<h2>Apache Tika API Usage Examples<a 
name="Apache_Tika_API_Usage_Examples"></a></h2>
+<p>This page provides a number of examples on how to use the various Tika 
APIs. All of the examples shown are also available in the <a 
class="externalLink" 
href="https://svn.apache.org/repos/asf/tika/trunk/tika-example">Tika Example 
module</a> in SVN.</p>
+<ul>
+<li><a href="#Apache_Tika_API_Usage_Examples">Apache Tika API Usage 
Examples</a>
+<ul>
+<li><a href="#Parsing">Parsing</a>
+<ul>
+<li><a href="#Parsing_using_the_Tika_Facade">Parsing using the Tika 
Facade</a></li>
+<li><a href="#Parsing_using_the_Auto-Detect_Parser">Parsing using the 
Auto-Detect Parser</a></li></ul></li>
+<li><a href="#Picking_different_output_formats">Picking different output 
formats</a>
+<ul>
+<li><a href="#Parsing_to_Plain_Text">Parsing to Plain Text</a></li>
+<li><a href="#Parsing_to_XHTML">Parsing to XHTML</a></li>
+<li><a href="#Fetching_just_certain_bits_of_the_XHTML">Fetching just certain 
bits of the XHTML</a></li></ul></li>
+<li><a href="#Custom_Content_Handlers">Custom Content Handlers</a>
+<ul>
+<li><a href="#Extract_Phone_Numbers_from_Content_into_the_Metadata">Extract 
Phone Numbers from Content into the Metadata</a></li>
+<li><a href="#Streaming_the_plain_text_in_chunks">Streaming the plain text in 
chunks</a></li></ul></li>
+<li><a href="#Translation">Translation</a>
+<ul>
+<li><a href="#Translation_using_the_Microsoft_Translation_API">Translation 
using the Microsoft Translation API</a></li></ul></li>
+<li><a href="#Language_Identification">Language Identification</a></li>
+<li><a href="#Additional_Examples">Additional Examples</a></li></ul></li></ul>
+<div class="section">
+<h3><a name="Parsing">Parsing</a></h3>
+<p>Tika provides a number of different ways to parse a file. These provide 
different levels of control, flexibility, and complexity.</p>
+<div class="section">
+<h4><a name="Parsing_using_the_Tika_Facade">Parsing using the Tika 
Facade</a></h4>
+<p>The <a href="./api/org/apache/tika/Tika.html">Tika facade</a> provides a 
number of quick and easy ways to have your content parsed by Tika and to get 
the resulting plain text back.</p><style type="text/css">
+   @import url('attached-includes/css/shCoreDefault.css');
+</style>
+<div id="highlighter_38850" class="syntaxhighlighter nogutter  java"><table 
border="0" cellpadding="0" cellspacing="0"><tbody><tr><td class="code"><div 
class="container"><div class="line number53 index0 alt2"><code class="java 
keyword">public</code> <code class="java plain">String parseToStringExample() 
</code><code class="java keyword">throws</code> <code class="java 
plain">IOException, SAXException, TikaException {</code></div><div class="line 
number54 index1 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">InputStream 
stream = ParsingExample.</code><code class="java keyword">class</code><code 
class="java plain">.getResourceAsStream(</code><code class="java 
string">"test.doc"</code><code class="java plain">);</code></div><div 
class="line number55 index2 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">Tika tika = 
</code><code class="java keyword">new</code> <code class="java 
plain">Tika();</code></div><div 
class="line number56 index3 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">try</code> 
<code class="java plain">{</code></div><div class="line number57 index4 
alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java keyword">return</code> <code class="java 
plain">tika.parseToString(stream);</code></div><div class="line number58 index5 
alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">} </code><code class="java keyword">finally</code> <code 
class="java plain">{</code></div><div class="line number59 index6 alt2"><code 
class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">stream.close();</code></div><div class="line number60 index7 
alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">}</code></div><div class="line number61 index8 alt2"><code 
class="java plain">}</code></div></div></td></tr></tbody></table></div></div>
+<div class="section">
+<h4><a name="Parsing_using_the_Auto-Detect_Parser">Parsing using the 
Auto-Detect Parser</a></h4>
+<p>For more control, you can call the <a 
href="./api/org/apache/tika/parser/Parser.html">Tika Parsers</a> directly. Most 
likely, you'll want to start out using the <a 
href="./api/org/apache/tika/parser/AutoDetectParser.html">Auto-Detect 
Parser</a>, which automatically figures out what kind of content you have, then 
calls the appropriate parser for you.</p><div id="highlighter_728625" 
class="syntaxhighlighter nogutter  java"><table border="0" cellpadding="0" 
cellspacing="0"><tbody><tr><td class="code"><div class="container"><div 
class="line number87 index0 alt2"><code class="java keyword">public</code> 
<code class="java plain">String parseExample() </code><code class="java 
keyword">throws</code> <code class="java plain">IOException, SAXException, 
TikaException {</code></div><div class="line number88 index1 alt1"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">InputStream stream = ParsingExample.</code><code class="java 
keyword">class</code><code 
class="java plain">.getResourceAsStream(</code><code class="java 
string">"test.doc"</code><code class="java plain">);</code></div><div 
class="line number89 index2 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">AutoDetectParser parser = </code><code class="java keyword">new</code> 
<code class="java plain">AutoDetectParser();</code></div><div class="line 
number90 index3 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">BodyContentHandler handler = </code><code class="java 
keyword">new</code> <code class="java 
plain">BodyContentHandler();</code></div><div class="line number91 index4 
alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">Metadata metadata = </code><code class="java 
keyword">new</code> <code class="java plain">Metadata();</code></div><div 
class="line number92 index5 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">try</code>
  <code class="java plain">{</code></div><div class="line number93 index6 
alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">parser.parse(stream, handler, metadata);</code></div><div 
class="line number94 index7 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java keyword">return</code> <code class="java 
plain">handler.toString();</code></div><div class="line number95 index8 
alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">} </code><code class="java keyword">finally</code> <code 
class="java plain">{</code></div><div class="line number96 index9 alt1"><code 
class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">stream.close();</code></div><div class="line number97 
index10 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">}</code></div><div class="line 
number98 index11 alt1"><code class="java 
plain">}</code></div></div></td></tr></tbody></table></div></div></div>
+<div class="section">
+<h3><a name="Picking_different_output_formats">Picking different output 
formats</a></h3>
+<p>With Tika, you can get the textual content of your files returned in a 
number of different formats, such as plain text, HTML, XHTML, or the XHTML of 
just one part of the file. The format is determined by the <a class="externalLink" 
href="http://docs.oracle.com/javase/7/docs/api/org/xml/sax/ContentHandler.html">ContentHandler</a>
 you supply to the Parser.</p>
+<div class="section">
+<h4><a name="Parsing_to_Plain_Text">Parsing to Plain Text</a></h4>
+<p>By using the <a 
href="./api/org/apache/tika/sax/BodyContentHandler.html">BodyContentHandler</a>,
 you can request that Tika return only the content of the document's body as a 
plain-text string.</p><div id="highlighter_341951" class="syntaxhighlighter 
nogutter  java"><table border="0" cellpadding="0" 
cellspacing="0"><tbody><tr><td class="code"><div class="container"><div 
class="line number46 index0 alt1"><code class="java keyword">public</code> 
<code class="java plain">String parseToPlainText() </code><code class="java 
keyword">throws</code> <code class="java plain">IOException, SAXException, 
TikaException {</code></div><div class="line number47 index1 alt2"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">BodyContentHandler handler = </code><code class="java 
keyword">new</code> <code class="java 
plain">BodyContentHandler();</code></div><div class="line number48 index2 
alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code>&nbsp;</div><div 
 class="line number49 index3 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">InputStream 
stream = ContentHandlerExample.</code><code class="java 
keyword">class</code><code class="java plain">.getResourceAsStream(</code><code 
class="java string">"test.doc"</code><code class="java 
plain">);</code></div><div class="line number50 index4 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">AutoDetectParser parser = </code><code class="java keyword">new</code> 
<code class="java plain">AutoDetectParser();</code></div><div class="line 
number51 index5 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">Metadata 
metadata = </code><code class="java keyword">new</code> <code class="java 
plain">Metadata();</code></div><div class="line number52 index6 alt1"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
keyword">try</code> <code class="java plain">{</code></div><div 
class="line number53 index7 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">parser.parse(stream, handler, metadata);</code></div><div 
class="line number54 index8 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java keyword">return</code> <code class="java 
plain">handler.toString();</code></div><div class="line number55 index9 
alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">} </code><code class="java keyword">finally</code> <code 
class="java plain">{</code></div><div class="line number56 index10 
alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">stream.close();</code></div><div class="line number57 
index11 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">}</code></div><div class="line number58 index12 
alt1"><code class="java plain">}</code></div></div></td></tr></tbody></table></div></div>
+<div class="section">
+<h4><a name="Parsing_to_XHTML">Parsing to XHTML</a></h4>
+<p>By using the <a 
href="./api/org/apache/tika/sax/ToXMLContentHandler.html">ToXMLContentHandler</a>,
 you can get the XHTML content of the whole document as a string.</p><div 
id="highlighter_823606" class="syntaxhighlighter nogutter  java"><table 
border="0" cellpadding="0" cellspacing="0"><tbody><tr><td class="code"><div 
class="container"><div class="line number63 index0 alt2"><code class="java 
keyword">public</code> <code class="java plain">String parseToHTML() 
</code><code class="java keyword">throws</code> <code class="java 
plain">IOException, SAXException, TikaException {</code></div><div class="line 
number64 index1 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">ContentHandler 
handler = </code><code class="java keyword">new</code> <code class="java 
plain">ToXMLContentHandler();</code></div><div class="line number65 index2 
alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code>&nbsp;</div><div 
class="line number66 index3 alt1"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">InputStream stream = ContentHandlerExample.</code><code class="java 
keyword">class</code><code class="java plain">.getResourceAsStream(</code><code 
class="java string">"test.doc"</code><code class="java 
plain">);</code></div><div class="line number67 index4 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">AutoDetectParser parser = </code><code class="java keyword">new</code> 
<code class="java plain">AutoDetectParser();</code></div><div class="line 
number68 index5 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">Metadata 
metadata = </code><code class="java keyword">new</code> <code class="java 
plain">Metadata();</code></div><div class="line number69 index6 alt2"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
keyword">try</code> <code class="java plain">{</code></div><div class="line 
number70 index7 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">parser.parse(stream, handler, metadata);</code></div><div 
class="line number71 index8 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java keyword">return</code> <code class="java 
plain">handler.toString();</code></div><div class="line number72 index9 
alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">} </code><code class="java keyword">finally</code> <code 
class="java plain">{</code></div><div class="line number73 index10 alt2"><code 
class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">stream.close();</code></div><div class="line number74 
index11 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">}</code></div><div class="line number75 index12 alt2"><code 
class="java plain">}</code></div></div></td></tr></tbody></table></div>
+<p>If you just want the body of the xhtml document, without the header, you 
can chain together a <a 
href="./api/org/apache/tika/sax/BodyContentHandler.html">BodyContentHandler</a> 
and a <a 
href="./api/org/apache/tika/sax/ToXMLContentHandler.html">ToXMLContentHandler</a>
 as shown:</p><div id="highlighter_274521" class="syntaxhighlighter nogutter  
java"><table border="0" cellpadding="0" cellspacing="0"><tbody><tr><td 
class="code"><div class="container"><div class="line number81 index0 
alt2"><code class="java keyword">public</code> <code class="java plain">String 
parseBodyToHTML() </code><code class="java keyword">throws</code> <code 
class="java plain">IOException, SAXException, TikaException {</code></div><div 
class="line number82 index1 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">ContentHandler 
handler = </code><code class="java keyword">new</code> <code class="java 
plain">BodyContentHandler(</code></div><div class="line number83 index2 
alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code
 class="java keyword">new</code> <code class="java 
plain">ToXMLContentHandler());</code></div><div class="line number84 index3 
alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code>&nbsp;</div><div 
class="line number85 index4 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">InputStream 
stream = ContentHandlerExample.</code><code class="java 
keyword">class</code><code class="java plain">.getResourceAsStream(</code><code 
class="java string">"test.doc"</code><code class="java 
plain">);</code></div><div class="line number86 index5 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">AutoDetectParser parser = </code><code class="java keyword">new</code> 
<code class="java plain">AutoDetectParser();</code></div><div class="line 
number87 index6 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">Metadata metadata = </code><code class="java 
keyword">new</code> <code class="java plain">Metadata();</code></div><div 
class="line number88 index7 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">try</code> 
<code class="java plain">{</code></div><div class="line number89 index8 
alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">parser.parse(stream, handler, metadata);</code></div><div 
class="line number90 index9 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java keyword">return</code> <code class="java 
plain">handler.toString();</code></div><div class="line number91 index10 
alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">} </code><code class="java keyword">finally</code> <code 
class="java plain">{</code></div><div class="line number92 index11 alt1"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">stream.close();</code></div><div class="line number93 
index12 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">}</code></div><div class="line number94 index13 alt1"><code 
class="java plain">}</code></div></div></td></tr></tbody></table></div></div>
+<div class="section">
+<h4><a name="Fetching_just_certain_bits_of_the_XHTML">Fetching just certain 
bits of the XHTML</a></h4>
+<p>It is possible to execute XPath queries on the parse results, to fetch only 
certain bits of the XHTML. </p><div id="highlighter_862878" 
class="syntaxhighlighter nogutter  java"><table border="0" cellpadding="0" 
cellspacing="0"><tbody><tr><td class="code"><div class="container"><div 
class="line number100 index0 alt1"><code class="java keyword">public</code> 
<code class="java plain">String parseOnePartToHTML() </code><code class="java 
keyword">throws</code> <code class="java plain">IOException, SAXException, 
TikaException {</code></div><div class="line number101 index1 alt2"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
comments">// Only get things under html -> body -> div 
(class=header)</code></div><div class="line number102 index2 alt1"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">XPathParser xhtmlParser = </code><code class="java keyword">new</code> 
<code class="java plain">XPathParser(</code><code class="java 
string">"xhtml"</code><code class="java plain">, 
XHTMLContentHandler.XHTML);</code></div><div class="line number103 index3 
alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">Matcher divContentMatcher = 
xhtmlParser.parse(</code></div><div class="line number104 index4 alt1"><code 
class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code
 class="java 
string">"/xhtml:html/xhtml:body/xhtml:div/descendant::node()"</code><code 
class="java plain">);&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 
</code></div><div class="line number105 index5 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">ContentHandler 
handler = </code><code class="java keyword">new</code> <code class="java 
plain">MatchingContentHandler(</code></div><div class="line number106 index6 
alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code
 class="java keyword">new</code> <code class="java plain">ToXMLContentHandler(), 
divContentMatcher);</code></div><div class="line number107 index7 alt2"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code>&nbsp;</div><div class="line 
number108 index8 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">InputStream 
stream = ContentHandlerExample.</code><code class="java 
keyword">class</code><code class="java plain">.getResourceAsStream(</code><code 
class="java string">"test2.doc"</code><code class="java 
plain">);</code></div><div class="line number109 index9 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">AutoDetectParser parser = </code><code class="java keyword">new</code> 
<code class="java plain">AutoDetectParser();</code></div><div class="line 
number110 index10 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">Metadata 
metadata = </code><code class="java keyword">new</code> <code 
class="java plain">Metadata();</code></div><div class="line number111 
index11 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java keyword">try</code> <code class="java plain">{</code></div><div 
class="line number112 index12 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">parser.parse(stream, handler, metadata);</code></div><div 
class="line number113 index13 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java keyword">return</code> <code class="java 
plain">handler.toString();</code></div><div class="line number114 index14 
alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">} </code><code class="java keyword">finally</code> <code 
class="java plain">{</code></div><div class="line number115 index15 alt2"><code 
class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">stream.close();</code></div><div class="line number116 index16 alt1"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">}</code></div><div class="line number117 index17 alt2"><code class="java 
plain">}</code></div></div></td></tr></tbody></table></div></div></div>
+<div class="section">
+<h3><a name="Custom_Content_Handlers">Custom Content Handlers</a></h3>
+<p>The textual output of parsing a file with Tika is returned via the SAX <a 
class="externalLink" 
href="http://docs.oracle.com/javase/7/docs/api/org/xml/sax/ContentHandler.html">ContentHandler</a>
 you pass to the parse method. You can customise your parsing by supplying 
your own ContentHandler that does extra processing on the content as it is parsed.</p>
+<div class="section">
+<h4><a name="Extract_Phone_Numbers_from_Content_into_the_Metadata">Extract 
Phone Numbers from Content into the Metadata</a></h4>
+<p>By using the <a 
href="./api/org/apache/tika/sax/PhoneExtractingContentHandler.html">PhoneExtractingContentHandler</a>,
 you can have any phone numbers found in the textual content of the document 
extracted and placed into the Metadata object for you.</p><div 
id="highlighter_608010" class="syntaxhighlighter nogutter  java"><table 
border="0" cellpadding="0" cellspacing="0"><tbody><tr><td class="code"><div 
class="container"><div class="line number69 index0 alt2"><code class="java 
keyword">public</code> <code class="java keyword">static</code> <code 
class="java keyword">void</code> <code class="java plain">process(File file) 
</code><code class="java keyword">throws</code> <code class="java 
plain">Exception {</code></div><div class="line number70 index1 alt1"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">Parser parser = </code><code class="java keyword">new</code> <code 
class="java plain">AutoDetectParser();</code></div><div class="line number71 
index2 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">Metadata metadata = </code><code class="java 
keyword">new</code> <code class="java plain">Metadata();</code></div><div 
class="line number72 index3 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java comments">// The 
PhoneExtractingContentHandler will examine any characters for phone numbers 
before passing them</code></div><div class="line number73 index4 alt2"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
comments">// to the underlying Handler.</code></div><div class="line number74 
index5 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">PhoneExtractingContentHandler handler = </code><code 
class="java keyword">new</code> <code class="java 
plain">PhoneExtractingContentHandler(</code><code class="java 
keyword">new</code> <code class="java plain">BodyContentHandler(), 
metadata);</code></div><div 
class="line number75 index6 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">InputStream 
stream = </code><code class="java keyword">new</code> <code class="java 
plain">FileInputStream(file);</code></div><div class="line number76 index7 
alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java keyword">try</code> <code class="java plain">{</code></div><div 
class="line number77 index8 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">parser.parse(stream, handler, metadata, </code><code 
class="java keyword">new</code> <code class="java 
plain">ParseContext());</code></div><div class="line number78 index9 
alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">}</code></div><div class="line number79 index10 alt2"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
keyword">finally</code> <code class="java plain">{</code></div><div 
class="line number80 index11 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">stream.close();</code></div><div class="line number81 
index12 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">}</code></div><div class="line number82 index13 alt1"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">String[] numbers = metadata.getValues(</code><code class="java 
string">"phonenumbers"</code><code class="java plain">);</code></div><div 
class="line number83 index14 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">for</code> 
<code class="java plain">(String number : numbers) {</code></div><div 
class="line number84 index15 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">phoneNumbers.add(number);</code></div><div class="line 
number85 index16 alt2">
 <code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">}</code></div><div class="line number86 index17 alt1"><code class="java 
plain">}</code></div></div></td></tr></tbody></table></div></div>
+<div class="section">
+<h4><a name="Streaming_the_plain_text_in_chunks">Streaming the plain text in 
chunks</a></h4>
+<p>Sometimes you want to split the resulting text into chunks, perhaps to 
write output as you go and minimise memory use, or to write it to HDFS 
files. With a small custom content handler, you can do that.</p><div 
id="highlighter_722747" class="syntaxhighlighter nogutter  java"><table 
border="0" cellpadding="0" cellspacing="0"><tbody><tr><td class="code"><div 
class="container"><div class="line number124 index0 alt1"><code class="java 
keyword">public</code> <code class="java plain">List&lt;String> 
parseToPlainTextChunks() </code><code class="java keyword">throws</code> <code 
class="java plain">IOException, SAXException, TikaException {</code></div><div 
class="line number125 index1 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">final</code> 
<code class="java plain">List&lt;String> chunks = </code><code class="java 
keyword">new</code> <code class="java 
plain">ArrayList&lt;String>();</code></div><div class="line number126 index2 
alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">chunks.add(</code><code class="java string">""</code><code class="java 
plain">);</code></div><div class="line number127 index3 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">ContentHandlerDecorator handler = </code><code class="java 
keyword">new</code> <code class="java plain">ContentHandlerDecorator() 
{</code></div><div class="line number128 index4 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java color1">@Override</code></div><div class="line number129 index5 
alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java keyword">public</code> <code class="java keyword">void</code> <code 
class="java plain">characters(</code><code class="java 
keyword">char</code><code class="java plain">[] ch, </code><code class="java 
keyword">int</code> <code class="java plain">start, </code><code 
class="java keyword">int</code> <code class="java 
plain">length) {</code></div><div class="line number130 index6 alt1"><code 
class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code
 class="java plain">String lastChunk = chunks.get(chunks.size()-</code><code 
class="java value">1</code><code class="java plain">);</code></div><div 
class="line number131 index7 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code
 class="java plain">String thisStr = </code><code class="java 
keyword">new</code> <code class="java plain">String(ch, start, 
length);</code></div><div class="line number132 index8 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code>&nbsp;</div><div
 class="line number133 index9 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code
 class="java keyword">if</code> <code class="java plain">(lastChunk.length()+length 
> MAXIMUM_TEXT_CHUNK_SIZE) {</code></div><div class="line number134 index10 
alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code
 class="java plain">chunks.add(thisStr);</code></div><div class="line number135 
index11 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code
 class="java plain">} </code><code class="java keyword">else</code> <code 
class="java plain">{</code></div><div class="line number136 index12 alt1"><code 
class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code
 class="java plain">chunks.set(chunks.size()-</code><code class="java 
value">1</code><code class="java plain">, lastChunk+thisStr);</code></div><div 
class="line number137 index13 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">}</code></div><div class="line number138 index14 alt1"><code 
class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">}</code></div><div class="line number139 index15 alt2"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">};</code></div><div class="line number140 index16 alt1"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code>&nbsp;</div><div class="line 
number141 index17 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">InputStream 
stream = ContentHandlerExample.</code><code class="java 
keyword">class</code><code class="java plain">.getResourceAsStream(</code><code 
class="java string">"test2.doc"</code><code class="java 
plain">);</code></div><div class="line number142 index18 alt1"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">Aut
 oDetectParser parser = </code><code class="java keyword">new</code> <code 
class="java plain">AutoDetectParser();</code></div><div class="line number143 
index19 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">Metadata metadata = </code><code class="java 
keyword">new</code> <code class="java plain">Metadata();</code></div><div 
class="line number144 index20 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">try</code> 
<code class="java plain">{</code></div><div class="line number145 index21 
alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">parser.parse(stream, handler, metadata);</code></div><div 
class="line number146 index22 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java keyword">return</code> <code class="java 
plain">chunks;</code></div><div class="line number147 index23 alt2"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">} 
</code><code class="java keyword">finally</code> <code class="java 
plain">{</code></div><div class="line number148 index24 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">stream.close();</code></div><div class="line number149 
index25 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">}</code></div><div class="line number150 index26 alt1"><code 
class="java 
plain">}</code></div></div></td></tr></tbody></table></div></div></div>
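<p>The chunking rule in the handler above can be tried without Tika on the classpath. The following is a minimal sketch, not part of the Tika example module: <code>TextChunker</code> and its methods are hypothetical names, and the size limit is passed in rather than read from <code>MAXIMUM_TEXT_CHUNK_SIZE</code>. It applies the same test (start a new chunk when the next piece would overflow the current one), but it buffers in a <code>StringBuilder</code> and never emits an empty chunk, which is a design choice of this sketch rather than the behaviour of the original.</p>

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: collects character data into chunks of bounded size. */
final class TextChunker {
    private final int maxChunkSize;
    private final List<String> chunks = new ArrayList<>();
    private final StringBuilder current = new StringBuilder();

    TextChunker(int maxChunkSize) {
        if (maxChunkSize <= 0) {
            throw new IllegalArgumentException("maxChunkSize must be positive");
        }
        this.maxChunkSize = maxChunkSize;
    }

    /** Same parameter shape as the characters(char[], int, int) callback above. */
    void characters(char[] ch, int start, int length) {
        // Close the current chunk if appending this piece would exceed the limit.
        if (current.length() > 0 && current.length() + length > maxChunkSize) {
            chunks.add(current.toString());
            current.setLength(0);
        }
        current.append(ch, start, length);
    }

    /** Flushes any buffered text and returns a copy of all chunks collected so far. */
    List<String> finish() {
        if (current.length() > 0) {
            chunks.add(current.toString());
            current.setLength(0);
        }
        return new ArrayList<>(chunks);
    }
}
```

<p>As in the original handler, a single <code>characters</code> call longer than the limit still produces one oversized chunk; splitting inside a call would need extra logic.</p>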
+<div class="section">
+<h3><a name="Translation">Translation</a></h3>
+<p>Tika provides a pluggable Translation system, which allows you to send the 
results of parsing off to an external system or program to have the text 
translated into another language.</p>
+<div class="section">
+<h4><a name="Translation_using_the_Microsoft_Translation_API">Translation 
using the Microsoft Translation API</a></h4>
+<p>To use the Microsoft Translation API, you need to sign up for a 
Microsoft account, obtain API credentials (an id and a secret), then pass 
them to Tika before translating.</p><div id="highlighter_163565" class="syntaxhighlighter nogutter  
java"><table border="0" cellpadding="0" cellspacing="0"><tbody><tr><td 
class="code"><div class="container"><div class="line number23 index0 
alt2"><code class="java keyword">public</code> <code class="java plain">String 
microsoftTranslateToFrench(String text) {</code></div><div class="line number24 
index1 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">MicrosoftTranslator translator = </code><code class="java 
keyword">new</code> <code class="java 
plain">MicrosoftTranslator();</code></div><div class="line number25 index2 
alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java comments">// Change the id and secret! See <a 
href="http://msdn.microsoft.com/en-us/library/hh454950.aspx.">http://msdn.microsoft.com/en-us/library/hh454950.aspx.</a></code></div><div 
class="line number26 
index3 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">translator.setId(</code><code class="java 
string">"dummy-id"</code><code class="java plain">);</code></div><div 
class="line number27 index4 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">translator.setSecret(</code><code class="java 
string">"dummy-secret"</code><code class="java plain">);</code></div><div 
class="line number28 index5 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">try</code> 
<code class="java plain">{</code></div><div class="line number29 index6 
alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java keyword">return</code> <code class="java 
plain">translator.translate(text, </code><code class="java 
string">"fr"</code><code class="java plain">);</code></div><div class=
 "line number30 index7 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">} </code><code 
class="java keyword">catch</code> <code class="java plain">(Exception e) 
{</code></div><div class="line number31 index8 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java keyword">return</code> <code class="java string">"Error while 
translating."</code><code class="java plain">;</code></div><div class="line 
number32 index9 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">}</code></div><div class="line number33 index10 alt2"><code class="java 
plain">}</code></div></div></td></tr></tbody></table></div></div></div>
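<p>The example above hard-codes placeholder credentials. One possible alternative, sketched here under assumptions, is to read them from the environment so they stay out of source control. The class name <code>TranslatorCredentials</code> and the variable names <code>TIKA_MS_TRANSLATOR_ID</code> and <code>TIKA_MS_TRANSLATOR_SECRET</code> are hypothetical; Tika does not define them.</p>

```java
import java.util.Map;

/** Hypothetical holder for the id and secret used in the example above. */
final class TranslatorCredentials {
    final String id;
    final String secret;

    private TranslatorCredentials(String id, String secret) {
        this.id = id;
        this.secret = secret;
    }

    /**
     * Reads both values from the given environment map (for example System.getenv()).
     * The variable names are assumptions made for this sketch.
     */
    static TranslatorCredentials fromEnvironment(Map<String, String> env) {
        String id = env.get("TIKA_MS_TRANSLATOR_ID");
        String secret = env.get("TIKA_MS_TRANSLATOR_SECRET");
        // Fail early with a clear message instead of sending empty credentials.
        if (id == null || id.isEmpty() || secret == null || secret.isEmpty()) {
            throw new IllegalStateException(
                "Set TIKA_MS_TRANSLATOR_ID and TIKA_MS_TRANSLATOR_SECRET");
        }
        return new TranslatorCredentials(id, secret);
    }
}
```

<p>The resulting values could then be passed to <code>translator.setId(...)</code> and <code>translator.setSecret(...)</code> exactly as in the example above.</p>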
+<div class="section">
+<h3><a name="Language_Identification">Language Identification</a></h3>
+<p>Tika provides support for identifying the language of text, through the <a 
href="./api/org/apache/tika/language/LanguageIdentifier.html">LanguageIdentifier</a>
 class.</p><div id="highlighter_75347" class="syntaxhighlighter nogutter  
java"><table border="0" cellpadding="0" cellspacing="0"><tbody><tr><td 
class="code"><div class="container"><div class="line number23 index0 
alt2"><code class="java keyword">public</code> <code class="java plain">String 
identifyLanguage(String text) {</code></div><div class="line number24 index1 
alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">LanguageIdentifier identifier = </code><code class="java 
keyword">new</code> <code class="java 
plain">LanguageIdentifier(text);</code></div><div class="line number25 index2 
alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java keyword">return</code> <code class="java 
plain">identifier.getLanguage();</code></div><div class="line number26 index3 
alt1
 "><code class="java 
plain">}</code></div></div></td></tr></tbody></table></div></div>
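<p>Language identifiers commonly work by comparing character n-gram frequencies in the input against a profile for each known language. The toy sketch below illustrates that general idea only: it is not Tika's implementation and does not use Tika's profile data. <code>ToyLanguageGuesser</code>, its methods, and the sample sentences used to build profiles are all assumptions made for illustration.</p>

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical toy classifier based on character trigram counts. */
final class ToyLanguageGuesser {
    private final Map<String, Map<String, Integer>> profiles = new LinkedHashMap<>();

    /** Builds a trigram profile for a language from a sample text. */
    void addProfile(String language, String sampleText) {
        profiles.put(language, trigrams(sampleText));
    }

    /** Returns the language whose profile is most similar, or "unknown" if none overlaps. */
    String guess(String text) {
        Map<String, Integer> input = trigrams(text);
        String best = "unknown";
        double bestScore = 0.0;
        for (Map.Entry<String, Map<String, Integer>> entry : profiles.entrySet()) {
            double score = cosine(input, entry.getValue());
            if (score > bestScore) {
                bestScore = score;
                best = entry.getKey();
            }
        }
        return best;
    }

    /** Counts every run of three consecutive characters in the lower-cased text. */
    static Map<String, Integer> trigrams(String text) {
        String s = text.toLowerCase(Locale.ROOT);
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + 3 <= s.length(); i++) {
            counts.merge(s.substring(i, i + 3), 1, Integer::sum);
        }
        return counts;
    }

    /** Cosine similarity between two count vectors; 0.0 if either is empty. */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (Map.Entry<String, Integer> entry : a.entrySet()) {
            int count = entry.getValue();
            normA += (double) count * count;
            Integer other = b.get(entry.getKey());
            if (other != null) {
                dot += (double) count * other;
            }
        }
        for (int count : b.values()) {
            normB += (double) count * count;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / Math.sqrt(normA * normB);
    }
}
```

<p>Profiles built from a single sentence are far too small for real use; in practice the <code>LanguageIdentifier</code> class shown above is the better choice.</p>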
+<div class="section">
+<h3><a name="Additional_Examples">Additional Examples</a></h3>
+<p>A number of other examples are also available, including all of the 
examples from the <a class="externalLink" 
href="http://manning.com/mattmann/">Tika In Action book</a>. These can all be 
found in the <a class="externalLink" 
href="https://svn.apache.org/repos/asf/tika/trunk/tika-example">Tika Example 
module</a> in SVN.</p></div></div>
+      </div>
+      <div id="sidebar">
+        <div id="navigation">
+                    <h5>Apache Tika</h5>
+            <ul>
+              
+    <li class="none">
+                    <a href="../index.html">Introduction</a>
+          </li>
+              
+    <li class="none">
+                    <a href="../download.html">Download</a>
+          </li>
+              
+    <li class="none">
+                    <a href="../contribute.html">Contribute</a>
+          </li>
+              
+    <li class="none">
+                    <a href="../mail-lists.html">Mailing Lists</a>
+          </li>
+              
+    <li class="none">
+                    <a href="http://wiki.apache.org/tika/" 
class="externalLink">Tika Wiki</a>
+          </li>
+              
+    <li class="none">
+                    <a href="https://issues.apache.org/jira/browse/TIKA" 
class="externalLink">Issue Tracker</a>
+          </li>
+          </ul>
+              <h5>Documentation</h5>
+            <ul>
+              
+          
+                    
+                  
+                  
+                  
+                  
+                  
+                  
+                  
+              
+        <li class="expanded">
+                    <a href="../1.10/index.html">Apache Tika 1.10</a>
+                  <ul>
+                  
+    <li class="none">
+                    <a href="../1.10/gettingstarted.html">Getting Started</a>
+          </li>
+                  
+    <li class="none">
+                    <a href="../1.10/formats.html">Supported Formats</a>
+          </li>
+                  
+    <li class="none">
+                    <a href="../1.10/parser.html">Parser API</a>
+          </li>
+                  
+    <li class="none">
+                    <a href="../1.10/parser_guide.html">Parser 5min Quick 
Start Guide</a>
+          </li>
+                  
+    <li class="none">
+                    <a href="../1.10/detection.html">Content and Language 
Detection</a>
+          </li>
+                  
+    <li class="none">
+                    <a href="../1.10/configuring.html">Configuring Tika</a>
+          </li>
+                  
+    <li class="none">
+                    <a href="../1.10/examples.html">Usage Examples</a>
+          </li>
+                  
+    <li class="none">
+                    <a href="../1.10/api/">API Documentation</a>
+          </li>
+              </ul>
+        </li>
+              
+                
+                    
+                  
+                  
+                  
+                  
+                  
+                  
+                  
+              
+        <li class="collapsed">
+                    <a href="../1.9/index.html">Apache Tika 1.9</a>
+                </li>
+              
+                
+                    
+                  
+                  
+                  
+                  
+                  
+                  
+              
+        <li class="collapsed">
+                    <a href="../1.8/index.html">Apache Tika 1.8</a>
+                </li>
+              
+                
+                    
+                  
+                  
+                  
+                  
+                  
+                  
+              
+        <li class="collapsed">
+                    <a href="../1.7/index.html">Apache Tika 1.7</a>
+                </li>
+              
+                
+                    
+                  
+                  
+                  
+                  
+                  
+              
+        <li class="collapsed">
+                    <a href="../1.6/index.html">Apache Tika 1.6</a>
+                </li>
+              
+                
+                    
+                  
+                  
+                  
+                  
+                  
+              
+        <li class="collapsed">
+                    <a href="../1.5/index.html">Apache Tika 1.5</a>
+                </li>
+              
+                
+                    
+                  
+                  
+                  
+                  
+                  
+              
+        <li class="collapsed">
+                    <a href="../1.4/index.html">Apache Tika 1.4</a>
+                </li>
+              
+                
+                    
+                  
+                  
+                  
+                  
+                  
+              
+        <li class="collapsed">
+                    <a href="../1.3/index.html">Apache Tika 1.3</a>
+                </li>
+              
+                
+                    
+                  
+                  
+                  
+                  
+                  
+              
+        <li class="collapsed">
+                    <a href="../1.2/index.html">Apache Tika 1.2</a>
+                </li>
+              
+                
+                    
+                  
+                  
+                  
+                  
+                  
+              
+        <li class="collapsed">
+                    <a href="../1.1/index.html">Apache Tika 1.1</a>
+                </li>
+          </ul>
+              <h5>The Apache Software Foundation</h5>
+            <ul>
+              
+    <li class="none">
+                    <a href="http://www.apache.org/foundation/" 
class="externalLink">About</a>
+          </li>
+              
+    <li class="none">
+                    <a href="http://www.apache.org/licenses/" 
class="externalLink">License</a>
+          </li>
+              
+    <li class="none">
+                    <a href="http://www.apache.org/security/" 
class="externalLink">Security</a>
+          </li>
+              
+    <li class="none">
+                    <a 
href="http://www.apache.org/foundation/sponsorship.html" 
class="externalLink">Sponsorship</a>
+          </li>
+              
+    <li class="none">
+                    <a href="http://www.apache.org/foundation/thanks.html" 
class="externalLink">Thanks</a>
+          </li>
+          </ul>
+      
+          <div id="search">
+            <h5>Search with Apache Solr</h5>
+            <form action="http://search.lucidimagination.com/p:tika"
+                  method="get" id="searchform">
+              <input type="text" id="query" name="q"/>
+              <select name="searchProvider" id="searchProvider">
+                <option value="any">provider</option>
+                <option value="lucid">Lucid Find</option>
+                <option value="sl">Search-Lucene</option>
+              </select>
+              <input type="submit" id="submit" value="Search" name="Search"
+                     onclick="selectProvider(this.form)"/>
+            </form>
+          </div>
+
+          <div id="bookpromo">
+            <h5>Books about Tika</h5>
+            <p>
+              <a href="http://manning.com/mattmann/" title="Tika in Action"
+                ><img src="../mattmann_cover150.jpg"
+                      width="150" height="186"/></a>
+            </p>
+          </div>
+        </div>
+      </div>
+      <div id="footer">
+        <p>
+          Copyright &#169; 2015
+          <a href="http://www.apache.org/">The Apache Software Foundation</a>.
+          Site powered by <a href="http://maven.apache.org/">Apache Maven</a>. 
+          Search powered by
+          <a href="http://www.lucidimagination.com">Lucid Imagination</a>
+          and <a href="http://sematext.com">Sematext</a>.
+          <br/>
+          Apache Tika, Tika, Apache, the Apache feather logo, and the Apache
+          Tika project logo are trademarks of The Apache Software Foundation.
+        </p>
+      </div>
+    </div>
+  </body>
+</html>

