1.6...

mattmann Sun, 25 Oct 2015 21:41:40 -0700

Modified: tika/site/publish/1.1/parser_guide.html
URL: 
http://svn.apache.org/viewvc/tika/site/publish/1.1/parser_guide.html?rev=1710509&r1=1710508&r2=1710509&view=diff
==============================================================================
--- tika/site/publish/1.1/parser_guide.html (original)
+++ tika/site/publish/1.1/parser_guide.html Mon Oct 26 04:40:33 2015
@@ -29,7 +29,7 @@
 <html xmlns="http://www.w3.org/1999/xhtml">
   <head>
     <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
-    <title>Apache Tika - Get Tika parsing up and running in 5 minutes</title>
+    <title>Apache Tika &#x2013; Get Tika parsing up and running in 5 
minutes</title>
     <style type="text/css" media="all">
       @import url("../css/site.css");
     </style>
@@ -85,7 +85,7 @@
       </div>
       <div id="content">
         <!-- Licensed to the Apache Software Foundation (ASF) under one or 
more --><!-- contributor license agreements.  See the NOTICE file distributed 
with --><!-- this work for additional information regarding copyright 
ownership. --><!-- The ASF licenses this file to You under the Apache License, 
Version 2.0 --><!-- (the "License"); you may not use this file except in 
compliance with --><!-- the License.  You may obtain a copy of the License at 
--><!--  --><!-- http://www.apache.org/licenses/LICENSE-2.0 --><!--  --><!-- 
Unless required by applicable law or agreed to in writing, software --><!-- 
distributed under the License is distributed on an "AS IS" BASIS, --><!-- 
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
--><!-- See the License for the specific language governing permissions and 
--><!-- limitations under the License. --><div class="section">
-<h2>Get Tika parsing up and running in 5 minutes<a 
name="Get_Tika_parsing_up_and_running_in_5_minutes"></a></h2>
+<h2><a name="Get_Tika_parsing_up_and_running_in_5_minutes"></a>Get Tika 
parsing up and running in 5 minutes</h2>
 <p>This page is a quick start guide showing how to add a new parser to Apache 
Tika. Following the simple steps listed below your new parser can be running in 
only 5 minutes.</p>
 <ul>
 <li><a href="#Get_Tika_parsing_up_and_running_in_5_minutes">Get Tika parsing 
up and running in 5 minutes</a>


Modified: tika/site/publish/1.10/configuring.html
URL: 
http://svn.apache.org/viewvc/tika/site/publish/1.10/configuring.html?rev=1710509&r1=1710508&r2=1710509&view=diff
==============================================================================
--- tika/site/publish/1.10/configuring.html (original)
+++ tika/site/publish/1.10/configuring.html Mon Oct 26 04:40:33 2015
@@ -29,7 +29,7 @@
 <html xmlns="http://www.w3.org/1999/xhtml">
   <head>
     <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
-    <title>Apache Tika - Configuring Tika</title>
+    <title>Apache Tika &#x2013; Configuring Tika</title>
     <style type="text/css" media="all">
       @import url("../css/site.css");
     </style>
@@ -85,7 +85,7 @@
       </div>
       <div id="content">
         <!-- Licensed to the Apache Software Foundation (ASF) under one or 
more --><!-- contributor license agreements.  See the NOTICE file distributed 
with --><!-- this work for additional information regarding copyright 
ownership. --><!-- The ASF licenses this file to You under the Apache License, 
Version 2.0 --><!-- (the "License"); you may not use this file except in 
compliance with --><!-- the License.  You may obtain a copy of the License at 
--><!--  --><!-- http://www.apache.org/licenses/LICENSE-2.0 --><!--  --><!-- 
Unless required by applicable law or agreed to in writing, software --><!-- 
distributed under the License is distributed on an "AS IS" BASIS, --><!-- 
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
--><!-- See the License for the specific language governing permissions and 
--><!-- limitations under the License. --><div class="section">
-<h2>Configuring Tika<a name="Configuring_Tika"></a></h2>
+<h2><a name="Configuring_Tika"></a>Configuring Tika</h2>
 <p>Out of the box, Apache Tika will attempt to start with all available 
Detectors and Parsers, running with sensible defaults. For most users, this 
default configuration will work well.</p>
 <p>This page gives you information on how to configure the various components 
of Apache Tika, such as Parsers and Detectors, if you need fine-grained control 
over ordering, exclusions and the like.</p>
 <ul>
@@ -167,10 +167,10 @@
 <p>Tika has a number of service provider types such as parsers, detectors, and 
translators. The <a 
href="./api/org/apache/tika/config/ServiceLoader.html">org.apache.tika.config.ServiceLoader</a>
 class provides a registry of each type of provider. This allows Tika to create 
implementations such as <a 
href="./api/org/apache/tika/parser/DefaultParser.html">org.apache.tika.parser.DefaultParser</a>,
 <a 
href="./api/org/apache/tika/language/translate/DefaultTranslator.html">org.apache.tika.language.translate.DefaultTranslator</a>,
 and <a 
href="./api/org/apache/tika/detect/DefaultDetector.html">org.apache.tika.detect.DefaultDetector</a>
 that can match the appropriate provider to an incoming piece of content.</p>
 <p>The ServiceLoader's registry can be populated either statically or 
dynamically.</p>
 <div class="section">
-<h4>Static<a name="Static"></a></h4>
+<h4><a name="Static"></a>Static</h4>
 <p>Static loading is the default which requires no configuration. This 
configuration options is used in Tika deployments where the Tika JAR files 
reside together in the same classloader hierarchy. The services provides are 
loaded from provider configuration files located within the tika-parsers JAR 
file at META-INF/services.</p></div>
 <div class="section">
-<h4>Dynamic<a name="Dynamic"></a></h4>
+<h4><a name="Dynamic"></a>Dynamic</h4>
 <p>Dynamic loading may be required if the tika service providers will reside 
in different classloaders such as in OSGi. To allow a provider created in 
tika-config.xml to utilize dynamically loaded services you need to configure 
the ServiceLoader to be dynamic with the following configuration:</p>
 <div>
 <pre>&lt;properties&gt;
@@ -178,7 +178,7 @@
   ....
 &lt;/properties&gt;</pre></div></div>
 <div class="section">
-<h4>Load Error Handling<a name="Load_Error_Handling"></a></h4>
+<h4><a name="Load_Error_Handling"></a>Load Error Handling</h4>
 <p>The ServiceLoader can contains a handler to deal with errors that occur 
during provider initialization. For example if a class fails to initialize 
LoadErrorHandler deals with the exception that is thrown. This handler can be 
configured to:</p>
 <ul>
 <li><tt> IGNORE </tt> - (Default) Do nothing when providers fail to 
initialize.</li>

Modified: tika/site/publish/1.10/detection.html
URL: 
http://svn.apache.org/viewvc/tika/site/publish/1.10/detection.html?rev=1710509&r1=1710508&r2=1710509&view=diff
==============================================================================
--- tika/site/publish/1.10/detection.html (original)
+++ tika/site/publish/1.10/detection.html Mon Oct 26 04:40:33 2015
@@ -29,7 +29,7 @@
 <html xmlns="http://www.w3.org/1999/xhtml">
   <head>
     <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
-    <title>Apache Tika - Content Detection</title>
+    <title>Apache Tika &#x2013; Content Detection</title>
     <style type="text/css" media="all">
       @import url("../css/site.css");
     </style>
@@ -85,7 +85,7 @@
       </div>
       <div id="content">
         <!-- Licensed to the Apache Software Foundation (ASF) under one or 
more --><!-- contributor license agreements.  See the NOTICE file distributed 
with --><!-- this work for additional information regarding copyright 
ownership. --><!-- The ASF licenses this file to You under the Apache License, 
Version 2.0 --><!-- (the "License"); you may not use this file except in 
compliance with --><!-- the License.  You may obtain a copy of the License at 
--><!--  --><!-- http://www.apache.org/licenses/LICENSE-2.0 --><!--  --><!-- 
Unless required by applicable law or agreed to in writing, software --><!-- 
distributed under the License is distributed on an "AS IS" BASIS, --><!-- 
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
--><!-- See the License for the specific language governing permissions and 
--><!-- limitations under the License. --><div class="section">
-<h2>Content Detection<a name="Content_Detection"></a></h2>
+<h2><a name="Content_Detection"></a>Content Detection</h2>
 <p>This page gives you information on how content and language detection works 
with Apache Tika, and how to tune the behaviour of Tika.</p>
 <ul>
 <li><a href="#Content_Detection">Content Detection</a>

Modified: tika/site/publish/1.10/examples.html
URL: 
http://svn.apache.org/viewvc/tika/site/publish/1.10/examples.html?rev=1710509&r1=1710508&r2=1710509&view=diff
==============================================================================
--- tika/site/publish/1.10/examples.html (original)
+++ tika/site/publish/1.10/examples.html Mon Oct 26 04:40:33 2015
@@ -29,7 +29,7 @@
 <html xmlns="http://www.w3.org/1999/xhtml">
   <head>
     <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
-    <title>Apache Tika - Tika API Usage Examples</title>
+    <title>Apache Tika &#x2013; Tika API Usage Examples</title>
     <style type="text/css" media="all">
       @import url("../css/site.css");
     </style>
@@ -85,7 +85,7 @@
       </div>
       <div id="content">
         <!-- Licensed to the Apache Software Foundation (ASF) under one or 
more --><!-- contributor license agreements.  See the NOTICE file distributed 
with --><!-- this work for additional information regarding copyright 
ownership. --><!-- The ASF licenses this file to You under the Apache License, 
Version 2.0 --><!-- (the "License"); you may not use this file except in 
compliance with --><!-- the License.  You may obtain a copy of the License at 
--><!--  --><!-- http://www.apache.org/licenses/LICENSE-2.0 --><!--  --><!-- 
Unless required by applicable law or agreed to in writing, software --><!-- 
distributed under the License is distributed on an "AS IS" BASIS, --><!-- 
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
--><!-- See the License for the specific language governing permissions and 
--><!-- limitations under the License. --><div class="section">
-<h2>Apache Tika API Usage Examples<a 
name="Apache_Tika_API_Usage_Examples"></a></h2>
+<h2><a name="Apache_Tika_API_Usage_Examples"></a>Apache Tika API Usage 
Examples</h2>
 <p>This page provides a number of examples on how to use the various Tika 
APIs. All of the examples shown are also available in the <a 
class="externalLink" 
href="https://svn.apache.org/repos/asf/tika/trunk/tika-example">Tika Example 
module</a> in SVN.</p>
 <ul>
 <li><a href="#Apache_Tika_API_Usage_Examples">Apache Tika API Usage 
Examples</a>
@@ -116,23 +116,23 @@
 <p>The <a href="./api/org/apache/tika/Tika.html">Tika facade</a>, provides a 
number of very quick and easy ways to have your content parsed by Tika, and 
return the resulting plain text</p><style type="text/css">
    @import url('attached-includes/css/shCoreDefault.css');
 </style>
-<div id="highlighter_565683" class="syntaxhighlighter nogutter  java"><table 
border="0" cellpadding="0" cellspacing="0"><tbody><tr><td class="code"><div 
class="container"><div class="line number54 index0 alt1"><code class="java 
keyword">public</code> <code class="java plain">String parseToStringExample() 
</code><code class="java keyword">throws</code> <code class="java 
plain">IOException, SAXException, TikaException {</code></div><div class="line 
number55 index1 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">Tika tika = 
</code><code class="java keyword">new</code> <code class="java 
plain">Tika();</code></div><div class="line number56 index2 alt1"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
keyword">try</code> <code class="java plain">(InputStream stream = 
ParsingExample.</code><code class="java keyword">class</code><code class="java 
plain">.getResourceAsStream(</code><code class="java string">"test.doc"</code><code class="java plain">)) {</code></div><div class="line number57 index3 
alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java keyword">return</code> <code class="java 
plain">tika.parseToString(stream);</code></div><div class="line number58 index4 
alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">}</code></div><div class="line number59 index5 alt2"><code 
class="java plain">}</code></div></div></td></tr></tbody></table></div></div>
+<div id="highlighter_54641" class="syntaxhighlighter nogutter  java"><table 
border="0" cellpadding="0" cellspacing="0"><tbody><tr><td class="code"><div 
class="container"><div class="line number54 index0 alt1"><code class="java 
keyword">public</code> <code class="java plain">String parseToStringExample() 
</code><code class="java keyword">throws</code> <code class="java 
plain">IOException, SAXException, TikaException {</code></div><div class="line 
number55 index1 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">Tika tika = 
</code><code class="java keyword">new</code> <code class="java 
plain">Tika();</code></div><div class="line number56 index2 alt1"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
keyword">try</code> <code class="java plain">(InputStream stream = 
ParsingExample.</code><code class="java keyword">class</code><code class="java 
plain">.getResourceAsStream(</code><code class="java 
string">"test.doc"</code><code class="java plain">)) {</code></div><div class="line number57 index3 
alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java keyword">return</code> <code class="java 
plain">tika.parseToString(stream);</code></div><div class="line number58 index4 
alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">}</code></div><div class="line number59 index5 alt2"><code 
class="java plain">}</code></div></div></td></tr></tbody></table></div></div>
 <div class="section">
 <h4><a name="Parsing_using_the_Auto-Detect_Parser">Parsing using the 
Auto-Detect Parser</a></h4>
-<p>For more control, you can call the <a 
href="./api/org/apache/tika/parser/Parser.html">Tika Parsers</a> directly. Most 
likely, you'll want to start out using the <a 
href="./api/org/apache/tika/parser/AutoDetectParser.html">Auto-Detect 
Parser</a>, which automatically figures out what kind of content you have, then 
calls the appropriate parser for you.</p><div id="highlighter_370328" 
class="syntaxhighlighter nogutter  java"><table border="0" cellpadding="0" 
cellspacing="0"><tbody><tr><td class="code"><div class="container"><div 
class="line number85 index0 alt2"><code class="java keyword">public</code> 
<code class="java plain">String parseExample() </code><code class="java 
keyword">throws</code> <code class="java plain">IOException, SAXException, 
TikaException {</code></div><div class="line number86 index1 alt1"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">AutoDetectParser parser = </code><code class="java keyword">new</code> 
<code class="java plain">AutoDetectParser();</code></div><div class="line number87 index2 
alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">BodyContentHandler handler = </code><code class="java 
keyword">new</code> <code class="java 
plain">BodyContentHandler();</code></div><div class="line number88 index3 
alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">Metadata metadata = </code><code class="java 
keyword">new</code> <code class="java plain">Metadata();</code></div><div 
class="line number89 index4 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">try</code> 
<code class="java plain">(InputStream stream = ParsingExample.</code><code 
class="java keyword">class</code><code class="java 
plain">.getResourceAsStream(</code><code class="java 
string">"test.doc"</code><code class="java plain">)) {</code></div><div 
class="line number90 index5 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">parser.parse(stream, 
handler, metadata);</code></div><div class="line number91 index6 alt2"><code 
class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java keyword">return</code> <code class="java 
plain">handler.toString();</code></div><div class="line number92 index7 
alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">}</code></div><div class="line number93 index8 alt2"><code 
class="java 
plain">}</code></div></div></td></tr></tbody></table></div></div></div>
+<p>For more control, you can call the <a 
href="./api/org/apache/tika/parser/Parser.html">Tika Parsers</a> directly. Most 
likely, you'll want to start out using the <a 
href="./api/org/apache/tika/parser/AutoDetectParser.html">Auto-Detect 
Parser</a>, which automatically figures out what kind of content you have, then 
calls the appropriate parser for you.</p><div id="highlighter_86950" 
class="syntaxhighlighter nogutter  java"><table border="0" cellpadding="0" 
cellspacing="0"><tbody><tr><td class="code"><div class="container"><div 
class="line number85 index0 alt2"><code class="java keyword">public</code> 
<code class="java plain">String parseExample() </code><code class="java 
keyword">throws</code> <code class="java plain">IOException, SAXException, 
TikaException {</code></div><div class="line number86 index1 alt1"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">AutoDetectParser parser = </code><code class="java keyword">new</code> 
<code class="java plain">AutoDetectParser();</code></div><div class="line number87 index2 
alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">BodyContentHandler handler = </code><code class="java 
keyword">new</code> <code class="java 
plain">BodyContentHandler();</code></div><div class="line number88 index3 
alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">Metadata metadata = </code><code class="java 
keyword">new</code> <code class="java plain">Metadata();</code></div><div 
class="line number89 index4 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">try</code> 
<code class="java plain">(InputStream stream = ParsingExample.</code><code 
class="java keyword">class</code><code class="java 
plain">.getResourceAsStream(</code><code class="java 
string">"test.doc"</code><code class="java plain">)) {</code></div><div 
class="line number90 index5 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">parser.parse(stream, 
handler, metadata);</code></div><div class="line number91 index6 alt2"><code 
class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java keyword">return</code> <code class="java 
plain">handler.toString();</code></div><div class="line number92 index7 
alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">}</code></div><div class="line number93 index8 alt2"><code 
class="java 
plain">}</code></div></div></td></tr></tbody></table></div></div></div>
 <div class="section">
 <h3><a name="Picking_different_output_formats">Picking different output 
formats</a></h3>
 <p>With Tika, you can get the textual content of your files returned in a 
number of different formats. These can be plain text, html, xhtml, xhtml of one 
part of the file etc. This is controlled based on the <a class="externalLink" 
href="http://docs.oracle.com/javase/7/docs/api/org/xml/sax/ContentHandler.html">ContentHandler</a>
 you supply to the Parser.</p>
 <div class="section">
 <h4><a name="Parsing_to_Plain_Text">Parsing to Plain Text</a></h4>
-<p>By using the <a 
href="./api/org/apache/tika/sax/BodyContentHandler.html">BodyContentHandler</a>,
 you can request that Tika return only the content of the document's body as a 
plain-text string.</p><div id="highlighter_944301" class="syntaxhighlighter 
nogutter  java"><table border="0" cellpadding="0" 
cellspacing="0"><tbody><tr><td class="code"><div class="container"><div 
class="line number47 index0 alt2"><code class="java keyword">public</code> 
<code class="java plain">String parseToPlainText() </code><code class="java 
keyword">throws</code> <code class="java plain">IOException, SAXException, 
TikaException {</code></div><div class="line number48 index1 alt1"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">BodyContentHandler handler = </code><code class="java 
keyword">new</code> <code class="java 
plain">BodyContentHandler();</code></div><div class="line number49 index2 
alt2">&nbsp;</div><div class="line number50 index3 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">AutoDetectParser 
parser = </code><code class="java keyword">new</code> <code class="java 
plain">AutoDetectParser();</code></div><div class="line number51 index4 
alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">Metadata metadata = </code><code class="java 
keyword">new</code> <code class="java plain">Metadata();</code></div><div 
class="line number52 index5 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">try</code> 
<code class="java plain">(InputStream stream = 
ContentHandlerExample.</code><code class="java keyword">class</code><code 
class="java plain">.getResourceAsStream(</code><code class="java 
string">"test.doc"</code><code class="java plain">)) {</code></div><div 
class="line number53 index6 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">parser.parse(stream, handler, metadata);</code></div><div class="line number54 index7 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java keyword">return</code> <code class="java 
plain">handler.toString();</code></div><div class="line number55 index8 
alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">}</code></div><div class="line number56 index9 alt1"><code 
class="java plain">}</code></div></div></td></tr></tbody></table></div></div>
+<p>By using the <a 
href="./api/org/apache/tika/sax/BodyContentHandler.html">BodyContentHandler</a>,
 you can request that Tika return only the content of the document's body as a 
plain-text string.</p><div id="highlighter_660886" class="syntaxhighlighter 
nogutter  java"><table border="0" cellpadding="0" 
cellspacing="0"><tbody><tr><td class="code"><div class="container"><div 
class="line number47 index0 alt2"><code class="java keyword">public</code> 
<code class="java plain">String parseToPlainText() </code><code class="java 
keyword">throws</code> <code class="java plain">IOException, SAXException, 
TikaException {</code></div><div class="line number48 index1 alt1"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">BodyContentHandler handler = </code><code class="java 
keyword">new</code> <code class="java 
plain">BodyContentHandler();</code></div><div class="line number49 index2 
alt2">&nbsp;</div><div class="line number50 index3 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">AutoDetectParser 
parser = </code><code class="java keyword">new</code> <code class="java 
plain">AutoDetectParser();</code></div><div class="line number51 index4 
alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">Metadata metadata = </code><code class="java 
keyword">new</code> <code class="java plain">Metadata();</code></div><div 
class="line number52 index5 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">try</code> 
<code class="java plain">(InputStream stream = 
ContentHandlerExample.</code><code class="java keyword">class</code><code 
class="java plain">.getResourceAsStream(</code><code class="java 
string">"test.doc"</code><code class="java plain">)) {</code></div><div 
class="line number53 index6 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">parser.parse(stream, handler, metadata);</code></div><div class="line number54 index7 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java keyword">return</code> <code class="java 
plain">handler.toString();</code></div><div class="line number55 index8 
alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">}</code></div><div class="line number56 index9 alt1"><code 
class="java plain">}</code></div></div></td></tr></tbody></table></div></div>
 <div class="section">
 <h4><a name="Parsing_to_XHTML">Parsing to XHTML</a></h4>
-<p>By using the <a 
href="./api/org/apache/tika/sax/ToXMLContentHandler.html">ToXMLContentHandler</a>,
 you can get the XHTML content of the whole document as a string.</p><div 
id="highlighter_594636" class="syntaxhighlighter nogutter  java"><table 
border="0" cellpadding="0" cellspacing="0"><tbody><tr><td class="code"><div 
class="container"><div class="line number61 index0 alt2"><code class="java 
keyword">public</code> <code class="java plain">String parseToHTML() 
</code><code class="java keyword">throws</code> <code class="java 
plain">IOException, SAXException, TikaException {</code></div><div class="line 
number62 index1 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">ContentHandler 
handler = </code><code class="java keyword">new</code> <code class="java 
plain">ToXMLContentHandler();</code></div><div class="line number63 index2 
alt2">&nbsp;</div><div class="line number64 index3 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">AutoDetectParser parser = </code><code class="java 
keyword">new</code> <code class="java 
plain">AutoDetectParser();</code></div><div class="line number65 index4 
alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">Metadata metadata = </code><code class="java 
keyword">new</code> <code class="java plain">Metadata();</code></div><div 
class="line number66 index5 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">try</code> 
<code class="java plain">(InputStream stream = 
ContentHandlerExample.</code><code class="java keyword">class</code><code 
class="java plain">.getResourceAsStream(</code><code class="java 
string">"test.doc"</code><code class="java plain">)) {</code></div><div 
class="line number67 index6 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">parser.parse(stream, handler, metadata);</code></div><div 
class="line number68 index7 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java keyword">return</code> <code class="java 
plain">handler.toString();</code></div><div class="line number69 index8 
alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">}</code></div><div class="line number70 index9 alt1"><code 
class="java plain">}</code></div></div></td></tr></tbody></table></div>
-<p>If you just want the body of the xhtml document, without the header, you 
can chain together a <a 
href="./api/org/apache/tika/sax/BodyContentHandler.html">BodyContentHandler</a> 
and a <a 
href="./api/org/apache/tika/sax/ToXMLContentHandler.html">ToXMLContentHandler</a>
 as shown:</p><div id="highlighter_362151" class="syntaxhighlighter nogutter  
java"><table border="0" cellpadding="0" cellspacing="0"><tbody><tr><td 
class="code"><div class="container"><div class="line number76 index0 
alt1"><code class="java keyword">public</code> <code class="java plain">String 
parseBodyToHTML() </code><code class="java keyword">throws</code> <code 
class="java plain">IOException, SAXException, TikaException {</code></div><div 
class="line number77 index1 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">ContentHandler 
handler = </code><code class="java keyword">new</code> <code class="java 
plain">BodyContentHandler(</code></div><div class="line number78 index2 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code
 class="java keyword">new</code> <code class="java 
plain">ToXMLContentHandler());</code></div><div class="line number79 index3 
alt2">&nbsp;</div><div class="line number80 index4 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">AutoDetectParser parser = </code><code class="java keyword">new</code> 
<code class="java plain">AutoDetectParser();</code></div><div class="line 
number81 index5 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">Metadata 
metadata = </code><code class="java keyword">new</code> <code class="java 
plain">Metadata();</code></div><div class="line number82 index6 alt1"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
keyword">try</code> <code class="java plain">(InputStream stream = 
ContentHandlerExample.</code><code class="java keyword">class</code><code 
 class="java plain">.getResourceAsStream(</code><code class="java 
string">"test.doc"</code><code class="java plain">)) {</code></div><div 
class="line number83 index7 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">parser.parse(stream, handler, metadata);</code></div><div 
class="line number84 index8 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java keyword">return</code> <code class="java 
plain">handler.toString();</code></div><div class="line number85 index9 
alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">}</code></div><div class="line number86 index10 alt1"><code 
class="java plain">}</code></div></div></td></tr></tbody></table></div></div>
+<p>By using the <a 
href="./api/org/apache/tika/sax/ToXMLContentHandler.html">ToXMLContentHandler</a>,
 you can get the XHTML content of the whole document as a string.</p><div 
id="highlighter_127483" class="syntaxhighlighter nogutter  java"><table 
border="0" cellpadding="0" cellspacing="0"><tbody><tr><td class="code"><div 
class="container"><div class="line number61 index0 alt2"><code class="java 
keyword">public</code> <code class="java plain">String parseToHTML() 
</code><code class="java keyword">throws</code> <code class="java 
plain">IOException, SAXException, TikaException {</code></div><div class="line 
number62 index1 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">ContentHandler 
handler = </code><code class="java keyword">new</code> <code class="java 
plain">ToXMLContentHandler();</code></div><div class="line number63 index2 
alt2">&nbsp;</div><div class="line number64 index3 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">AutoDetectParser parser = </code><code class="java 
keyword">new</code> <code class="java 
plain">AutoDetectParser();</code></div><div class="line number65 index4 
alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">Metadata metadata = </code><code class="java 
keyword">new</code> <code class="java plain">Metadata();</code></div><div 
class="line number66 index5 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">try</code> 
<code class="java plain">(InputStream stream = 
ContentHandlerExample.</code><code class="java keyword">class</code><code 
class="java plain">.getResourceAsStream(</code><code class="java 
string">"test.doc"</code><code class="java plain">)) {</code></div><div 
class="line number67 index6 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">parser.parse(stream, handler, metadata);</code></div><div 
class="line number68 index7
 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java keyword">return</code> <code class="java 
plain">handler.toString();</code></div><div class="line number69 index8 
alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">}</code></div><div class="line number70 index9 alt1"><code 
class="java plain">}</code></div></div></td></tr></tbody></table></div>
+<p>If you just want the body of the xhtml document, without the header, you 
can chain together a <a 
href="./api/org/apache/tika/sax/BodyContentHandler.html">BodyContentHandler</a> 
and a <a 
href="./api/org/apache/tika/sax/ToXMLContentHandler.html">ToXMLContentHandler</a>
 as shown:</p><div id="highlighter_996530" class="syntaxhighlighter nogutter  
java"><table border="0" cellpadding="0" cellspacing="0"><tbody><tr><td 
class="code"><div class="container"><div class="line number76 index0 
alt1"><code class="java keyword">public</code> <code class="java plain">String 
parseBodyToHTML() </code><code class="java keyword">throws</code> <code 
class="java plain">IOException, SAXException, TikaException {</code></div><div 
class="line number77 index1 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">ContentHandler 
handler = </code><code class="java keyword">new</code> <code class="java 
plain">BodyContentHandler(</code></div><div class="line number78 index2 
alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code
 class="java keyword">new</code> <code class="java 
plain">ToXMLContentHandler());</code></div><div class="line number79 index3 
alt2">&nbsp;</div><div class="line number80 index4 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">AutoDetectParser parser = </code><code class="java keyword">new</code> 
<code class="java plain">AutoDetectParser();</code></div><div class="line 
number81 index5 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">Metadata 
metadata = </code><code class="java keyword">new</code> <code class="java 
plain">Metadata();</code></div><div class="line number82 index6 alt1"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
keyword">try</code> <code class="java plain">(InputStream stream = 
ContentHandlerExample.</code><code class="java keyword">class</code><code 
 class="java plain">.getResourceAsStream(</code><code class="java 
string">"test.doc"</code><code class="java plain">)) {</code></div><div 
class="line number83 index7 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">parser.parse(stream, handler, metadata);</code></div><div 
class="line number84 index8 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java keyword">return</code> <code class="java 
plain">handler.toString();</code></div><div class="line number85 index9 
alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">}</code></div><div class="line number86 index10 alt1"><code 
class="java plain">}</code></div></div></td></tr></tbody></table></div></div>
 <div class="section">
 <h4><a name="Fetching_just_certain_bits_of_the_XHTML">Fetching just certain 
bits of the XHTML</a></h4>
-<p>It is possible to execute XPath queries on the parse results, to fetch only 
certain bits of the XHTML. </p><div id="highlighter_667177" 
class="syntaxhighlighter nogutter  java"><table border="0" cellpadding="0" 
cellspacing="0"><tbody><tr><td class="code"><div class="container"><div 
class="line number92 index0 alt1"><code class="java keyword">public</code> 
<code class="java plain">String parseOnePartToHTML() </code><code class="java 
keyword">throws</code> <code class="java plain">IOException, SAXException, 
TikaException {</code></div><div class="line number93 index1 alt2"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
comments">// Only get things under html -> body -> div 
(class=header)</code></div><div class="line number94 index2 alt1"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">XPathParser xhtmlParser = </code><code class="java keyword">new</code> 
<code class="java plain">XPathParser(</code><code class="java 
string">"xhtml"</code><code class="java plain">, 
XHTMLContentHandler.XHTML);</code></div><div class="line number95 index3 
alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">Matcher divContentMatcher = xhtmlParser.parse(</code><code 
class="java 
string">"/xhtml:html/xhtml:body/xhtml:div/descendant::node()"</code><code 
class="java plain">);</code></div><div class="line number96 index4 alt1"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">ContentHandler handler = </code><code class="java keyword">new</code> 
<code class="java plain">MatchingContentHandler(</code></div><div class="line 
number97 index5 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code
 class="java keyword">new</code> <code class="java 
plain">ToXMLContentHandler(), divContentMatcher);</code></div><div class="line 
number98 index6 alt1">&nbsp;</div><div class="line number99 index7 alt2"><code 
class=
 "java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">AutoDetectParser parser = </code><code class="java keyword">new</code> 
<code class="java plain">AutoDetectParser();</code></div><div class="line 
number100 index8 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">Metadata 
metadata = </code><code class="java keyword">new</code> <code class="java 
plain">Metadata();</code></div><div class="line number101 index9 alt2"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
keyword">try</code> <code class="java plain">(InputStream stream = 
ContentHandlerExample.</code><code class="java keyword">class</code><code 
class="java plain">.getResourceAsStream(</code><code class="java 
string">"test2.doc"</code><code class="java plain">)) {</code></div><div 
class="line number102 index10 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">parser.parse(stream, handler,
 metadata);</code></div><div class="line number103 index11 alt2"><code 
class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java keyword">return</code> <code class="java 
plain">handler.toString();</code></div><div class="line number104 index12 
alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">}</code></div><div class="line number105 index13 alt2"><code 
class="java 
plain">}</code></div></div></td></tr></tbody></table></div></div></div>
+<p>It is possible to execute XPath queries on the parse results, to fetch only 
certain bits of the XHTML. </p><div id="highlighter_733596" 
class="syntaxhighlighter nogutter  java"><table border="0" cellpadding="0" 
cellspacing="0"><tbody><tr><td class="code"><div class="container"><div 
class="line number92 index0 alt1"><code class="java keyword">public</code> 
<code class="java plain">String parseOnePartToHTML() </code><code class="java 
keyword">throws</code> <code class="java plain">IOException, SAXException, 
TikaException {</code></div><div class="line number93 index1 alt2"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
comments">// Only get things under html -> body -> div 
(class=header)</code></div><div class="line number94 index2 alt1"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">XPathParser xhtmlParser = </code><code class="java keyword">new</code> 
<code class="java plain">XPathParser(</code><code class="java 
string">"xhtml"</code><code class="java plain">, 
XHTMLContentHandler.XHTML);</code></div><div class="line number95 index3 
alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">Matcher divContentMatcher = xhtmlParser.parse(</code><code 
class="java 
string">"/xhtml:html/xhtml:body/xhtml:div/descendant::node()"</code><code 
class="java plain">);</code></div><div class="line number96 index4 alt1"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">ContentHandler handler = </code><code class="java keyword">new</code> 
<code class="java plain">MatchingContentHandler(</code></div><div class="line 
number97 index5 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code
 class="java keyword">new</code> <code class="java 
plain">ToXMLContentHandler(), divContentMatcher);</code></div><div class="line 
number98 index6 alt1">&nbsp;</div><div class="line number99 index7 alt2"><code 
class=
 "java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">AutoDetectParser parser = </code><code class="java keyword">new</code> 
<code class="java plain">AutoDetectParser();</code></div><div class="line 
number100 index8 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">Metadata 
metadata = </code><code class="java keyword">new</code> <code class="java 
plain">Metadata();</code></div><div class="line number101 index9 alt2"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
keyword">try</code> <code class="java plain">(InputStream stream = 
ContentHandlerExample.</code><code class="java keyword">class</code><code 
class="java plain">.getResourceAsStream(</code><code class="java 
string">"test2.doc"</code><code class="java plain">)) {</code></div><div 
class="line number102 index10 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">parser.parse(stream, handler,
 metadata);</code></div><div class="line number103 index11 alt2"><code 
class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java keyword">return</code> <code class="java 
plain">handler.toString();</code></div><div class="line number104 index12 
alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">}</code></div><div class="line number105 index13 alt2"><code 
class="java 
plain">}</code></div></div></td></tr></tbody></table></div></div></div>
 <div class="section">
 <h3><a name="Custom_Content_Handlers">Custom Content Handlers</a></h3>
 <p>The textual output of parsing a file with Tika is returned via the SAX <a 
class="externalLink" 
href="http://docs.oracle.com/javase/7/docs/api/org/xml/sax/ContentHandler.html">ContentHandler</a>
 you pass to the parse method. It is possible to customise your parsing by 
supplying your own ContentHandler which does special things.</p>
@@ -141,16 +141,16 @@
 <p>By using the <a 
href="./api/org/apache/tika/sax/PhoneExtractingContentHandler.html">PhoneExtractingContentHandler</a>,
 you can have any phone numbers found in the textual content of the document 
extracted and placed into the Metadata object for you.</p></div>
 <div class="section">
 <h4><a name="Streaming_the_plain_text_in_chunks">Streaming the plain text in 
chunks</a></h4>
-<p>Sometimes, you want to chunk the resulting text up, perhaps to output as 
you go minimising memory use, perhaps to output to HDFS files, or any other 
reason! With a small custom content handler, you can do that.</p><div 
id="highlighter_574435" class="syntaxhighlighter nogutter  java"><table 
border="0" cellpadding="0" cellspacing="0"><tbody><tr><td class="code"><div 
class="container"><div class="line number113 index0 alt2"><code class="java 
keyword">public</code> <code class="java plain">List&lt;String> 
parseToPlainTextChunks() </code><code class="java keyword">throws</code> <code 
class="java plain">IOException, SAXException, TikaException {</code></div><div 
class="line number114 index1 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">final</code> 
<code class="java plain">List&lt;String> chunks = </code><code class="java 
keyword">new</code> <code class="java plain">ArrayList&lt;>();</code></div><div 
class="line number115 index2 alt2"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">chunks.add(</code><code class="java string">""</code><code class="java 
plain">);</code></div><div class="line number116 index3 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">ContentHandlerDecorator handler = </code><code class="java 
keyword">new</code> <code class="java plain">ContentHandlerDecorator() 
{</code></div><div class="line number117 index4 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java color1">@Override</code></div><div class="line number118 index5 
alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java keyword">public</code> <code class="java keyword">void</code> <code 
class="java plain">characters(</code><code class="java 
keyword">char</code><code class="java plain">[] ch, </code><code class="java 
keyword">int</code> <code class="java plain">start, </code>
<code class="java keyword">int</code> <code class="java plain">length) 
{</code></div><div class="line number119 index6 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code
 class="java plain">String lastChunk = chunks.get(chunks.size() - </code><code 
class="java value">1</code><code class="java plain">);</code></div><div 
class="line number120 index7 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code
 class="java plain">String thisStr = </code><code class="java 
keyword">new</code> <code class="java plain">String(ch, start, 
length);</code></div><div class="line number121 index8 alt2">&nbsp;</div><div 
class="line number122 index9 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code
 class="java keyword">if</code> <code class="java plain">(lastChunk.length() + 
length > MAXIMUM_TEXT_CHUNK_SIZE) {
 </code></div><div class="line number123 index10 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code
 class="java plain">chunks.add(thisStr);</code></div><div class="line number124 
index11 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code
 class="java plain">} </code><code class="java keyword">else</code> <code 
class="java plain">{</code></div><div class="line number125 index12 alt2"><code 
class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code
 class="java plain">chunks.set(chunks.size() - </code><code class="java 
value">1</code><code class="java plain">, lastChunk + 
thisStr);</code></div><div class="line number126 index13 alt1"><code 
class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code
 class="java plain">}</code>
</div><div class="line number127 index14 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">}</code></div><div class="line number128 index15 alt1"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">};</code></div><div class="line number129 index16 alt2">&nbsp;</div><div 
class="line number130 index17 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">AutoDetectParser parser = </code><code class="java keyword">new</code> 
<code class="java plain">AutoDetectParser();</code></div><div class="line 
number131 index18 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">Metadata 
metadata = </code><code class="java keyword">new</code> <code class="java 
plain">Metadata();</code></div><div class="line number132 index19 alt1"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
keyword">try</code> <code class
 ="java plain">(InputStream stream = ContentHandlerExample.</code><code 
class="java keyword">class</code><code class="java 
plain">.getResourceAsStream(</code><code class="java 
string">"test2.doc"</code><code class="java plain">)) {</code></div><div 
class="line number133 index20 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">parser.parse(stream, handler, metadata);</code></div><div 
class="line number134 index21 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java keyword">return</code> <code class="java 
plain">chunks;</code></div><div class="line number135 index22 alt2"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">}</code></div><div class="line number136 index23 alt1"><code class="java 
plain">}</code></div></div></td></tr></tbody></table></div></div></div>
+<p>Sometimes, you want to chunk the resulting text up, perhaps to output as 
you go minimising memory use, perhaps to output to HDFS files, or any other 
reason! With a small custom content handler, you can do that.</p><div 
id="highlighter_412463" class="syntaxhighlighter nogutter  java"><table 
border="0" cellpadding="0" cellspacing="0"><tbody><tr><td class="code"><div 
class="container"><div class="line number113 index0 alt2"><code class="java 
keyword">public</code> <code class="java plain">List&lt;String> 
parseToPlainTextChunks() </code><code class="java keyword">throws</code> <code 
class="java plain">IOException, SAXException, TikaException {</code></div><div 
class="line number114 index1 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">final</code> 
<code class="java plain">List&lt;String> chunks = </code><code class="java 
keyword">new</code> <code class="java plain">ArrayList&lt;>();</code></div><div 
class="line number115 index2 alt2"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">chunks.add(</code><code class="java string">""</code><code class="java 
plain">);</code></div><div class="line number116 index3 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">ContentHandlerDecorator handler = </code><code class="java 
keyword">new</code> <code class="java plain">ContentHandlerDecorator() 
{</code></div><div class="line number117 index4 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java color1">@Override</code></div><div class="line number118 index5 
alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java keyword">public</code> <code class="java keyword">void</code> <code 
class="java plain">characters(</code><code class="java 
keyword">char</code><code class="java plain">[] ch, </code><code class="java 
keyword">int</code> <code class="java plain">start, </code>
<code class="java keyword">int</code> <code class="java plain">length) 
{</code></div><div class="line number119 index6 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code
 class="java plain">String lastChunk = chunks.get(chunks.size() - </code><code 
class="java value">1</code><code class="java plain">);</code></div><div 
class="line number120 index7 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code
 class="java plain">String thisStr = </code><code class="java 
keyword">new</code> <code class="java plain">String(ch, start, 
length);</code></div><div class="line number121 index8 alt2">&nbsp;</div><div 
class="line number122 index9 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code
 class="java keyword">if</code> <code class="java plain">(lastChunk.length() + 
length > MAXIMUM_TEXT_CHUNK_SIZE) {
 </code></div><div class="line number123 index10 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code
 class="java plain">chunks.add(thisStr);</code></div><div class="line number124 
index11 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code
 class="java plain">} </code><code class="java keyword">else</code> <code 
class="java plain">{</code></div><div class="line number125 index12 alt2"><code 
class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code
 class="java plain">chunks.set(chunks.size() - </code><code class="java 
value">1</code><code class="java plain">, lastChunk + 
thisStr);</code></div><div class="line number126 index13 alt1"><code 
class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code
 class="java plain">}</code>
</div><div class="line number127 index14 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">}</code></div><div class="line number128 index15 alt1"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">};</code></div><div class="line number129 index16 alt2">&nbsp;</div><div 
class="line number130 index17 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">AutoDetectParser parser = </code><code class="java keyword">new</code> 
<code class="java plain">AutoDetectParser();</code></div><div class="line 
number131 index18 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">Metadata 
metadata = </code><code class="java keyword">new</code> <code class="java 
plain">Metadata();</code></div><div class="line number132 index19 alt1"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
keyword">try</code> <code class
 ="java plain">(InputStream stream = ContentHandlerExample.</code><code 
class="java keyword">class</code><code class="java 
plain">.getResourceAsStream(</code><code class="java 
string">"test2.doc"</code><code class="java plain">)) {</code></div><div 
class="line number133 index20 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">parser.parse(stream, handler, metadata);</code></div><div 
class="line number134 index21 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java keyword">return</code> <code class="java 
plain">chunks;</code></div><div class="line number135 index22 alt2"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">}</code></div><div class="line number136 index23 alt1"><code class="java 
plain">}</code></div></div></td></tr></tbody></table></div></div></div>
 <div class="section">
 <h3><a name="Translation">Translation</a></h3>
 <p>Tika provides a pluggable Translation system, which allows you to send the 
results of parsing off to an external system or program to have the text 
translated into another language.</p>
 <div class="section">
 <h4><a name="Translation_using_the_Microsoft_Translation_API">Translation 
using the Microsoft Translation API</a></h4>
-<p>In order to use the Microsoft Translation API, you need to sign up for a 
Microsoft account, get an API key, then pass the key to Tika before 
translating.</p><div id="highlighter_281058" class="syntaxhighlighter nogutter  
java"><table border="0" cellpadding="0" cellspacing="0"><tbody><tr><td 
class="code"><div class="container"><div class="line number23 index0 
alt2"><code class="java keyword">public</code> <code class="java plain">String 
microsoftTranslateToFrench(String text) {</code></div><div class="line number24 
index1 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">MicrosoftTranslator translator = </code><code class="java 
keyword">new</code> <code class="java 
plain">MicrosoftTranslator();</code></div><div class="line number25 index2 
alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java comments">// Change the id and secret! See <a 
href="http://msdn.microsoft.com/en-us/library/hh454950.aspx.">http://msdn.microsoft.com/en-us/library/hh454950.aspx.</a></code></div><div 
class="line number26 
index3 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">translator.setId(</code><code class="java 
string">"dummy-id"</code><code class="java plain">);</code></div><div 
class="line number27 index4 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">translator.setSecret(</code><code class="java 
string">"dummy-secret"</code><code class="java plain">);</code></div><div 
class="line number28 index5 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">try</code> 
<code class="java plain">{</code></div><div class="line number29 index6 
alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java keyword">return</code> <code class="java 
plain">translator.translate(text, </code><code class="java 
string">"fr"</code><code class="java plain">);</code></div><div class=
 "line number30 index7 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">} </code><code 
class="java keyword">catch</code> <code class="java plain">(Exception e) 
{</code></div><div class="line number31 index8 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java keyword">return</code> <code class="java string">"Error while 
translating."</code><code class="java plain">;</code></div><div class="line 
number32 index9 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">}</code></div><div class="line number33 index10 alt2"><code class="java 
plain">}</code></div></div></td></tr></tbody></table></div></div></div>
+<p>In order to use the Microsoft Translation API, you need to sign up for a 
Microsoft account, get an API key, then pass the key to Tika before 
translating.</p><div id="highlighter_364857" class="syntaxhighlighter nogutter  
java"><table border="0" cellpadding="0" cellspacing="0"><tbody><tr><td 
class="code"><div class="container"><div class="line number23 index0 
alt2"><code class="java keyword">public</code> <code class="java plain">String 
microsoftTranslateToFrench(String text) {</code></div><div class="line number24 
index1 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">MicrosoftTranslator translator = </code><code class="java 
keyword">new</code> <code class="java 
plain">MicrosoftTranslator();</code></div><div class="line number25 index2 
alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java comments">// Change the id and secret! See <a 
href="http://msdn.microsoft.com/en-us/library/hh454950.aspx.">http://msdn.microsoft.com/en-us/library/hh454950.aspx.</a></code></div><div 
class="line number26 
index3 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">translator.setId(</code><code class="java 
string">"dummy-id"</code><code class="java plain">);</code></div><div 
class="line number27 index4 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">translator.setSecret(</code><code class="java 
string">"dummy-secret"</code><code class="java plain">);</code></div><div 
class="line number28 index5 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">try</code> 
<code class="java plain">{</code></div><div class="line number29 index6 
alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java keyword">return</code> <code class="java 
plain">translator.translate(text, </code><code class="java 
string">"fr"</code><code class="java plain">);</code></div><div class=
 "line number30 index7 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">} </code><code 
class="java keyword">catch</code> <code class="java plain">(Exception e) 
{</code></div><div class="line number31 index8 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java keyword">return</code> <code class="java string">"Error while 
translating."</code><code class="java plain">;</code></div><div class="line 
number32 index9 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">}</code></div><div class="line number33 index10 alt2"><code class="java 
plain">}</code></div></div></td></tr></tbody></table></div></div></div>
 <div class="section">
 <h3><a name="Language_Identification">Language Identification</a></h3>
-<p>Tika provides support for identifying the language of text, through the <a 
href="./api/org/apache/tika/language/LanguageIdentifier.html">LanguageIdentifier</a>
 class.</p><div id="highlighter_464973" class="syntaxhighlighter nogutter  
java"><table border="0" cellpadding="0" cellspacing="0"><tbody><tr><td 
class="code"><div class="container"><div class="line number23 index0 
alt2"><code class="java keyword">public</code> <code class="java plain">String 
identifyLanguage(String text) {</code></div><div class="line number24 index1 
alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">LanguageIdentifier identifier = </code><code class="java 
keyword">new</code> <code class="java 
plain">LanguageIdentifier(text);</code></div><div class="line number25 index2 
alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java keyword">return</code> <code class="java 
plain">identifier.getLanguage();</code></div><div class="line number26 index3 
alt1"><code class="java 
plain">}</code></div></div></td></tr></tbody></table></div></div>
+<p>Tika provides support for identifying the language of text, through the <a 
href="./api/org/apache/tika/language/LanguageIdentifier.html">LanguageIdentifier</a>
 class.</p><div id="highlighter_245393" class="syntaxhighlighter nogutter  
java"><table border="0" cellpadding="0" cellspacing="0"><tbody><tr><td 
class="code"><div class="container"><div class="line number23 index0 
alt2"><code class="java keyword">public</code> <code class="java plain">String 
identifyLanguage(String text) {</code></div><div class="line number24 index1 
alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">LanguageIdentifier identifier = </code><code class="java 
keyword">new</code> <code class="java 
plain">LanguageIdentifier(text);</code></div><div class="line number25 index2 
alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java keyword">return</code> <code class="java 
plain">identifier.getLanguage();</code></div><div class="line number26 index3 
alt1"><code class="java 
plain">}</code></div></div></td></tr></tbody></table></div></div>
 <div class="section">
 <h3><a name="Additional_Examples">Additional Examples</a></h3>
 <p>A number of other examples are also available, including all of the 
examples from the <a class="externalLink" 
href="http://manning.com/mattmann/">Tika In Action book</a>. These can all be 
found in the <a class="externalLink" 
href="https://svn.apache.org/repos/asf/tika/trunk/tika-example">Tika Example 
module</a> in SVN.</p></div></div>

Modified: tika/site/publish/1.10/formats.html
URL: 
http://svn.apache.org/viewvc/tika/site/publish/1.10/formats.html?rev=1710509&r1=1710508&r2=1710509&view=diff
==============================================================================
--- tika/site/publish/1.10/formats.html (original)
+++ tika/site/publish/1.10/formats.html Mon Oct 26 04:40:33 2015
@@ -29,7 +29,7 @@
 <html xmlns="http://www.w3.org/1999/xhtml">
   <head>
     <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
-    <title>Apache Tika - Supported Document Formats</title>
+    <title>Apache Tika &#x2013; Supported Document Formats</title>
     <style type="text/css" media="all">
       @import url("../css/site.css");
     </style>
@@ -85,7 +85,7 @@
       </div>
       <div id="content">
         <!-- Licensed to the Apache Software Foundation (ASF) under one or 
more --><!-- contributor license agreements.  See the NOTICE file distributed 
with --><!-- this work for additional information regarding copyright 
ownership. --><!-- The ASF licenses this file to You under the Apache License, 
Version 2.0 --><!-- (the "License"); you may not use this file except in 
compliance with --><!-- the License.  You may obtain a copy of the License at 
--><!--  --><!-- http://www.apache.org/licenses/LICENSE-2.0 --><!--  --><!-- 
Unless required by applicable law or agreed to in writing, software --><!-- 
distributed under the License is distributed on an "AS IS" BASIS, --><!-- 
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
--><!-- See the License for the specific language governing permissions and 
--><!-- limitations under the License. --><div class="section">
-<h2>Supported Document Formats<a name="Supported_Document_Formats"></a></h2>
+<h2><a name="Supported_Document_Formats"></a>Supported Document Formats</h2>
 <p>This page lists all the document formats supported by the parsers in Apache 
Tika 1.10. Follow the links to the various parser class javadocs for more 
detailed information about each document format and how it is parsed by 
Tika.</p>
 <p><b>Please note</b> that Apache Tika is able to detect a much wider range of 
formats than those listed below, this page only documents those formats from 
which Tika is able to extract metadata and/or textual content.</p>
 <ul>
@@ -207,7 +207,7 @@
 <p>The <a 
href="./api/org/apache/tika/parser/jdbc/SQLite3Parser.html">SQLite3Parser</a> 
is able to extract content from SQLite3 files, in a tabular form. However, it 
requires that the <a href="#org.xerial_sqlite-jdbc_jar"></a> is manually added 
to the classpath first, as that binary jar isn't shipped as standard.</p>
 <p>The <a 
href="./api/org/apache/tika/parser/microsoft/JackcessParser.html">JackcessParser</a>
 is able to extract metadata and content in a tabular form, from Microsoft 
Access database files.</p></div></div>
 <div class="section">
-<h2>Full list of supported formats:<a 
name="Full_list_of_supported_formats:"></a></h2>
+<h2><a name="Full_list_of_supported_formats:"></a>Full list of supported 
formats:</h2>
 <ul>
 <li>org.apache.tika.parser.asm.<a 
href="./api/org/apache/tika/parser/asm/ClassParser">ClassParser</a>
 <ul>

Modified: tika/site/publish/1.10/gettingstarted.html
URL: 
http://svn.apache.org/viewvc/tika/site/publish/1.10/gettingstarted.html?rev=1710509&r1=1710508&r2=1710509&view=diff
==============================================================================
--- tika/site/publish/1.10/gettingstarted.html (original)
+++ tika/site/publish/1.10/gettingstarted.html Mon Oct 26 04:40:33 2015
@@ -29,7 +29,7 @@
 <html xmlns="http://www.w3.org/1999/xhtml">
   <head>
     <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
-    <title>Apache Tika - Getting Started with Apache Tika</title>
+    <title>Apache Tika &#x2013; Getting Started with Apache Tika</title>
     <style type="text/css" media="all">
       @import url("../css/site.css");
     </style>
@@ -85,10 +85,10 @@
       </div>
       <div id="content">
         <!-- Licensed to the Apache Software Foundation (ASF) under one or 
more --><!-- contributor license agreements.  See the NOTICE file distributed 
with --><!-- this work for additional information regarding copyright 
ownership. --><!-- The ASF licenses this file to You under the Apache License, 
Version 2.0 --><!-- (the "License"); you may not use this file except in 
compliance with --><!-- the License.  You may obtain a copy of the License at 
--><!--  --><!-- http://www.apache.org/licenses/LICENSE-2.0 --><!--  --><!-- 
Unless required by applicable law or agreed to in writing, software --><!-- 
distributed under the License is distributed on an "AS IS" BASIS, --><!-- 
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
--><!-- See the License for the specific language governing permissions and 
--><!-- limitations under the License. --><div class="section">
-<h2>Getting Started with Apache Tika<a 
name="Getting_Started_with_Apache_Tika"></a></h2>
+<h2><a name="Getting_Started_with_Apache_Tika"></a>Getting Started with Apache 
Tika</h2>
 <p>This document describes how to build Apache Tika from sources and how to 
start using Tika in an application.</p></div>
 <div class="section">
-<h2>Getting and building the sources<a 
name="Getting_and_building_the_sources"></a></h2>
+<h2><a name="Getting_and_building_the_sources"></a>Getting and building the 
sources</h2>
 <p>To build Tika from sources you first need to either <a 
href="../download.html">download</a> a source release or <a 
href="../source-repository.html">checkout</a> the latest sources from version 
control.</p>
 <p>Once you have the sources, you can build them using the <a 
class="externalLink" href="http://maven.apache.org/">Maven 2</a> build system. 
Executing the following command in the base directory will build the sources 
and install the resulting artifacts in your local Maven repository.</p>
 <div>
@@ -96,7 +96,7 @@
 <p>See the Maven documentation for more information about the available build 
options.</p>
 <p>Note that you need Java 7 or higher to build Tika.</p></div>
 <div class="section">
-<h2>Build artifacts<a name="Build_artifacts"></a></h2>
+<h2><a name="Build_artifacts"></a>Build artifacts</h2>
 <p>The Tika build consists of a number of components and produces the 
following main binaries:</p>
 <dl>
 <dt>tika-core/target/tika-core-*.jar</dt>
@@ -110,7 +110,7 @@
 <dt>tika-bundle/target/tika-bundle-*.jar</dt>
 <dd> Tika bundle. An OSGi bundle that combines tika-parsers with non-OSGified 
parser libraries to make them easy to deploy in an OSGi 
environment.</dd></dl></div>
 <div class="section">
-<h2>Using Tika as a Maven dependency<a 
name="Using_Tika_as_a_Maven_dependency"></a></h2>
+<h2><a name="Using_Tika_as_a_Maven_dependency"></a>Using Tika as a Maven 
dependency</h2>
 <p>The core library, tika-core, contains the key interfaces and classes of 
Tika and can be used by itself if you don't need the full set of parsers from 
the tika-parsers component. The tika-core dependency looks like this:</p>
 <div>
 <pre>  &lt;dependency&gt;
@@ -129,7 +129,7 @@
 <div>
 <pre>$ mvn dependency:tree | grep :compile</pre></div></div>
 <div class="section">
-<h2>Using Tika in an Ant project<a 
name="Using_Tika_in_an_Ant_project"></a></h2>
+<h2><a name="Using_Tika_in_an_Ant_project"></a>Using Tika in an Ant 
project</h2>
 <p>Unless you use a dependency manager tool like <a class="externalLink" 
href="http://ant.apache.org/ivy/">Apache Ivy</a>, the easiest way to use Tika 
is to include either the tika-core or the tika-app jar in your classpath, 
depending on whether you want just the core functionality or also all the 
parser implementations.</p>
 <div>
 <pre>&lt;classpath&gt;
@@ -142,7 +142,7 @@
 
 &lt;/classpath&gt;</pre></div></div>
 <div class="section">
-<h2>Using Tika as a command line utility<a 
name="Using_Tika_as_a_command_line_utility"></a></h2>
+<h2><a name="Using_Tika_as_a_command_line_utility"></a>Using Tika as a command 
line utility</h2>
 <p>The Tika application jar (tika-app-*.jar) can be used as a command line 
utility for extracting text content and metadata from all sorts of files. This 
runnable jar contains all the dependencies it needs, so you don't need to worry 
about classpath settings to run it.</p>
 <p>The usage instructions are shown below.</p>
 <div>
@@ -218,7 +218,7 @@ curl http://.../document.doc \
   | java -jar tika-app.jar --text \
   | grep -q keyword</pre></div></div>
 <div class="section">
-<h2>Wrappers<a name="Wrappers"></a></h2>
+<h2><a name="Wrappers"></a>Wrappers</h2>
 <p>Several wrappers are available to use Tika in another programming language, 
such as <a class="externalLink" 
href="https://github.com/aviks/Taro.jl">Julia</a> or <a class="externalLink" 
href="https://github.com/chrismattmann/tika-python">Python</a>.</p></div>
       </div>
       <div id="sidebar">

Modified: tika/site/publish/1.10/index.html
URL: 
http://svn.apache.org/viewvc/tika/site/publish/1.10/index.html?rev=1710509&r1=1710508&r2=1710509&view=diff
==============================================================================
--- tika/site/publish/1.10/index.html (original)
+++ tika/site/publish/1.10/index.html Mon Oct 26 04:40:33 2015
@@ -29,7 +29,7 @@
 <html xmlns="http://www.w3.org/1999/xhtml">
   <head>
     <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
-    <title>Apache Tika - Apache Tika 1.10</title>
+    <title>Apache Tika &#x2013; Apache Tika 1.10</title>
     <style type="text/css" media="all">
       @import url("../css/site.css");
     </style>
@@ -85,7 +85,7 @@
       </div>
       <div id="content">
         <!-- Licensed to the Apache Software Foundation (ASF) under one or 
more --><!-- contributor license agreements.  See the NOTICE file distributed 
with --><!-- this work for additional information regarding copyright 
ownership. --><!-- The ASF licenses this file to You under the Apache License, 
Version 2.0 --><!-- (the "License"); you may not use this file except in 
compliance with --><!-- the License.  You may obtain a copy of the License at 
--><!--  --><!-- http://www.apache.org/licenses/LICENSE-2.0 --><!--  --><!-- 
Unless required by applicable law or agreed to in writing, software --><!-- 
distributed under the License is distributed on an "AS IS" BASIS, --><!-- 
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
--><!-- See the License for the specific language governing permissions and 
--><!-- limitations under the License. --><div class="section">
-<h2>Apache Tika 1.10<a name="Apache_Tika_1.10"></a></h2>
+<h2><a name="Apache_Tika_1.10"></a>Apache Tika 1.10</h2>
 <p>The most notable changes in Tika 1.10 over the previous release are:</p>
 <ul>
 <li>Tika Config XML can now be used to create composite detectors, and exclude 
detectors that DefaultDetector would otherwise have used. This brings support 
in-line with Parsers. (<a class="externalLink" 
href="http://issues.apache.org/jira/browse/TIKA-1702">TIKA-1702</a>).</li>

Modified: tika/site/publish/1.10/parser.html
URL: 
http://svn.apache.org/viewvc/tika/site/publish/1.10/parser.html?rev=1710509&r1=1710508&r2=1710509&view=diff
==============================================================================
--- tika/site/publish/1.10/parser.html (original)
+++ tika/site/publish/1.10/parser.html Mon Oct 26 04:40:33 2015
@@ -29,7 +29,7 @@
 <html xmlns="http://www.w3.org/1999/xhtml">
   <head>
     <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
-    <title>Apache Tika - The Parser interface</title>
+    <title>Apache Tika &#x2013; The Parser interface</title>
     <style type="text/css" media="all">
       @import url("../css/site.css");
     </style>
@@ -85,7 +85,7 @@
       </div>
       <div id="content">
         <!-- Licensed to the Apache Software Foundation (ASF) under one or 
more --><!-- contributor license agreements.  See the NOTICE file distributed 
with --><!-- this work for additional information regarding copyright 
ownership. --><!-- The ASF licenses this file to You under the Apache License, 
Version 2.0 --><!-- (the "License"); you may not use this file except in 
compliance with --><!-- the License.  You may obtain a copy of the License at 
--><!--  --><!-- http://www.apache.org/licenses/LICENSE-2.0 --><!--  --><!-- 
Unless required by applicable law or agreed to in writing, software --><!-- 
distributed under the License is distributed on an "AS IS" BASIS, --><!-- 
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
--><!-- See the License for the specific language governing permissions and 
--><!-- limitations under the License. --><div class="section">
-<h2>The Parser interface<a name="The_Parser_interface"></a></h2>
+<h2><a name="The_Parser_interface"></a>The Parser interface</h2>
 <p>The <a 
href="./api/org/apache/tika/parser/Parser.html">org.apache.tika.parser.Parser</a>
 interface is the key concept of Apache Tika. It hides the complexity of 
different file formats and parsing libraries while providing a simple and 
powerful mechanism for client applications to extract structured text content 
and metadata from all sorts of documents. All this is achieved with a single 
method:</p>
 <div>
 <pre>void parse(
@@ -105,7 +105,7 @@
 <dd>While the default settings and behaviour of Tika parsers should work well 
for most use cases, there are still situations where more fine-grained control 
over the parsing process is desirable. It should be easy to inject such 
context-specific information to the parsing process without breaking the layers 
of abstraction.</dd></dl>
 <p>These criteria are reflected in the arguments of the <tt>parse</tt> 
method.</p>
 <div class="section">
-<h3>Document input stream<a name="Document_input_stream"></a></h3>
+<h3><a name="Document_input_stream"></a>Document input stream</h3>
 <p>The first argument is an <a class="externalLink" 
href="http://docs.oracle.com/javase/6/docs/api/java/io/InputStream.html">InputStream</a>
 for reading the document to be parsed.</p>
 <p>If this document stream can not be read, then parsing stops and the thrown 
<a class="externalLink" 
href="http://docs.oracle.com/javase/6/docs/api/java/io/IOException.html">IOException</a>
 is passed up to the client application. If the stream can be read but not 
parsed (for example if the document is corrupted), then the parser throws a <a 
href="./api/org/apache/tika/exception/TikaException.html">TikaException</a>.</p>
 <p>The parser implementation will consume this stream but <i>will not close 
it</i>. Closing the stream is the responsibility of the client application that 
opened it in the first place. The recommended pattern for using streams with 
the <tt>parse</tt> method is:</p>
@@ -118,7 +118,7 @@ try {
 }</pre></div>
 <p>Some document formats like the OLE2 Compound Document Format used by 
Microsoft Office are best parsed as random access files. In such cases the 
content of the input stream is automatically spooled to a temporary file that 
gets removed once parsed. A future version of Tika may make it possible to 
avoid this extra file if the input document is already a file in the local file 
system. See <a class="externalLink" 
href="https://issues.apache.org/jira/browse/TIKA-153">TIKA-153</a> for the 
status of this feature request.</p></div>
 <div class="section">
-<h3>XHTML SAX events<a name="XHTML_SAX_events"></a></h3>
+<h3><a name="XHTML_SAX_events"></a>XHTML SAX events</h3>
 <p>The parsed content of the document stream is returned to the client 
application as a sequence of XHTML SAX events. XHTML is used to express 
structured content of the document and SAX events enable streamed processing. 
Note that the XHTML format is used here only to convey structural information, 
not to render the documents for browsing!</p>
 <p>The XHTML SAX events produced by the parser implementation are sent to a <a 
class="externalLink" 
href="http://docs.oracle.com/javase/6/docs/api/org/xml/sax/ContentHandler.html">ContentHandler</a>
 instance given to the <tt>parse</tt> method. If this the content handler fails 
to process an event, then parsing stops and the thrown <a class="externalLink" 
href="http://docs.oracle.com/javase/6/docs/api/org/xml/sax/SAXException.html">SAXException</a>
 is passed up to the client application.</p>
 <p>The overall structure of the generated event stream is (with indenting 
added for clarity):</p>
@@ -147,7 +147,7 @@ try {
     reader.close();       // the document stream is closed automatically
 }</pre></div></div>
 <div class="section">
-<h3>Document metadata<a name="Document_metadata"></a></h3>
+<h3><a name="Document_metadata"></a>Document metadata</h3>
 <p>The third argument to the <tt>parse</tt> method is used to pass document 
metadata both in and out of the parser. Document metadata is expressed as an <a 
href="./api/org/apache/tika/metadata/Metadata.html">Metadata</a> object.</p>
 <p>The following are some of the more interesting metadata properties:</p>
 <dl>
@@ -167,10 +167,10 @@ try {
 <p>The parser implementation sets this property if the document format 
contains an explicit author field.</p></dd></dl>
 <p>Note that metadata handling is still being discussed by the Tika 
development team, and it is likely that there will be some (backwards 
incompatible) changes in metadata handling before Tika 1.0.</p></div>
 <div class="section">
-<h3>Parse context<a name="Parse_context"></a></h3>
+<h3><a name="Parse_context"></a>Parse context</h3>
 <p>The final argument to the <tt>parse</tt> method is used to inject 
context-specific information to the parsing process. This is useful for example 
when dealing with locale-specific date and number formats in Microsoft Excel 
spreadsheets. Another important use of the parse context is passing in the 
delegate parser instance to be used by two-phase parsers like the <a 
href="./api/org/apache/parser/pkg/PackageParser.html">PackageParser</a> 
subclasses. Some parser classes allow customization of the parsing process 
through strategy objects in the parse context.</p></div>
 <div class="section">
-<h3>Parser implementations<a name="Parser_implementations"></a></h3>
+<h3><a name="Parser_implementations"></a>Parser implementations</h3>
 <p>Apache Tika comes with a number of parser classes for parsing <a 
href="./formats.html">various document formats</a>. You can also extend Tika 
with your own parsers, and of course any contributions to Tika are warmly 
welcome.</p>
 <p>The goal of Tika is to reuse existing parser libraries like <a 
class="externalLink" href="http://pdfbox.apache.org/">PDFBox</a> or <a 
class="externalLink" href="http://poi.apache.org/">Apache POI</a> as much as 
possible, and so most of the parser classes in Tika are adapters to such 
external libraries.</p>
 <p>Tika also contains some general purpose parser implementations that are not 
targeted at any specific document formats. The most notable of these is the <a 
href="./api/org/apache/tika/parser/AutoDetectParser.html">AutoDetectParser</a> 
class that encapsulates all Tika functionality into a single parser that can 
handle any types of documents. This parser will automatically determine the 
type of the incoming document based on various heuristics and will then parse 
the document accordingly.</p></div>

Modified: tika/site/publish/1.10/parser_guide.html
URL: 
http://svn.apache.org/viewvc/tika/site/publish/1.10/parser_guide.html?rev=1710509&r1=1710508&r2=1710509&view=diff
==============================================================================
--- tika/site/publish/1.10/parser_guide.html (original)
+++ tika/site/publish/1.10/parser_guide.html Mon Oct 26 04:40:33 2015
@@ -29,7 +29,7 @@
 <html xmlns="http://www.w3.org/1999/xhtml">
   <head>
     <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
-    <title>Apache Tika - Get Tika parsing up and running in 5 minutes</title>
+    <title>Apache Tika &#x2013; Get Tika parsing up and running in 5 
minutes</title>
     <style type="text/css" media="all">
       @import url("../css/site.css");
     </style>
@@ -85,7 +85,7 @@
       </div>
       <div id="content">
         <!-- Licensed to the Apache Software Foundation (ASF) under one or 
more --><!-- contributor license agreements.  See the NOTICE file distributed 
with --><!-- this work for additional information regarding copyright 
ownership. --><!-- The ASF licenses this file to You under the Apache License, 
Version 2.0 --><!-- (the "License"); you may not use this file except in 
compliance with --><!-- the License.  You may obtain a copy of the License at 
--><!--  --><!-- http://www.apache.org/licenses/LICENSE-2.0 --><!--  --><!-- 
Unless required by applicable law or agreed to in writing, software --><!-- 
distributed under the License is distributed on an "AS IS" BASIS, --><!-- 
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
--><!-- See the License for the specific language governing permissions and 
--><!-- limitations under the License. --><div class="section">
-<h2>Get Tika parsing up and running in 5 minutes<a 
name="Get_Tika_parsing_up_and_running_in_5_minutes"></a></h2>
+<h2><a name="Get_Tika_parsing_up_and_running_in_5_minutes"></a>Get Tika 
parsing up and running in 5 minutes</h2>
 <p>This page is a quick start guide showing how to add a new parser to Apache 
Tika. Following the simple steps listed below your new parser can be running in 
only 5 minutes.</p>
 <ul>
 <li><a href="#Get_Tika_parsing_up_and_running_in_5_minutes">Get Tika parsing 
up and running in 5 minutes</a>

Modified: tika/site/publish/1.11/configuring.html
URL: 
http://svn.apache.org/viewvc/tika/site/publish/1.11/configuring.html?rev=1710509&r1=1710508&r2=1710509&view=diff
==============================================================================
--- tika/site/publish/1.11/configuring.html (original)
+++ tika/site/publish/1.11/configuring.html Mon Oct 26 04:40:33 2015
@@ -29,7 +29,7 @@
 <html xmlns="http://www.w3.org/1999/xhtml">
   <head>
     <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
-    <title>Apache Tika - Configuring Tika</title>
+    <title>Apache Tika &#x2013; Configuring Tika</title>
     <style type="text/css" media="all">
       @import url("../css/site.css");
     </style>
@@ -85,7 +85,7 @@
       </div>
       <div id="content">
         <!-- Licensed to the Apache Software Foundation (ASF) under one or 
more --><!-- contributor license agreements.  See the NOTICE file distributed 
with --><!-- this work for additional information regarding copyright 
ownership. --><!-- The ASF licenses this file to You under the Apache License, 
Version 2.0 --><!-- (the "License"); you may not use this file except in 
compliance with --><!-- the License.  You may obtain a copy of the License at 
--><!--  --><!-- http://www.apache.org/licenses/LICENSE-2.0 --><!--  --><!-- 
Unless required by applicable law or agreed to in writing, software --><!-- 
distributed under the License is distributed on an "AS IS" BASIS, --><!-- 
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
--><!-- See the License for the specific language governing permissions and 
--><!-- limitations under the License. --><div class="section">
-<h2>Configuring Tika<a name="Configuring_Tika"></a></h2>
+<h2><a name="Configuring_Tika"></a>Configuring Tika</h2>
 <p>Out of the box, Apache Tika will attempt to start with all available 
Detectors and Parsers, running with sensible defaults. For most users, this 
default configuration will work well.</p>
 <p>This page gives you information on how to configure the various components 
of Apache Tika, such as Parsers and Detectors, if you need fine-grained control 
over ordering, exclusions and the like.</p>
 <ul>
@@ -167,10 +167,10 @@
 <p>Tika has a number of service provider types such as parsers, detectors, and 
translators. The <a 
href="./api/org/apache/tika/config/ServiceLoader.html">org.apache.tika.config.ServiceLoader</a>
 class provides a registry of each type of provider. This allows Tika to create 
implementations such as <a 
href="./api/org/apache/tika/parser/DefaultParser.html">org.apache.tika.parser.DefaultParser</a>,
 <a 
href="./api/org/apache/tika/language/translate/DefaultTranslator.html">org.apache.tika.language.translate.DefaultTranslator</a>,
 and <a 
href="./api/org/apache/tika/detect/DefaultDetector.html">org.apache.tika.detect.DefaultDetector</a>
 that can match the appropriate provider to an incoming piece of content.</p>
 <p>The ServiceLoader's registry can be populated either statically or 
dynamically.</p>
 <div class="section">
-<h4>Static<a name="Static"></a></h4>
+<h4><a name="Static"></a>Static</h4>
 <p>Static loading is the default which requires no configuration. This 
configuration options is used in Tika deployments where the Tika JAR files 
reside together in the same classloader hierarchy. The services provides are 
loaded from provider configuration files located within the tika-parsers JAR 
file at META-INF/services.</p></div>
 <div class="section">
-<h4>Dynamic<a name="Dynamic"></a></h4>
+<h4><a name="Dynamic"></a>Dynamic</h4>
 <p>Dynamic loading may be required if the tika service providers will reside 
in different classloaders such as in OSGi. To allow a provider created in 
tika-config.xml to utilize dynamically loaded services you need to configure 
the ServiceLoader to be dynamic with the following configuration:</p>
 <div>
 <pre>&lt;properties&gt;
@@ -178,7 +178,7 @@
   ....
 &lt;/properties&gt;</pre></div></div>
 <div class="section">
-<h4>Load Error Handling<a name="Load_Error_Handling"></a></h4>
+<h4><a name="Load_Error_Handling"></a>Load Error Handling</h4>
 <p>The ServiceLoader can contains a handler to deal with errors that occur 
during provider initialization. For example if a class fails to initialize 
LoadErrorHandler deals with the exception that is thrown. This handler can be 
configured to:</p>
 <ul>
 <li><tt> IGNORE </tt> - (Default) Do nothing when providers fail to 
initialize.</li>
