Author: tpalsulich
Date: Mon Apr 13 16:57:57 2015
New Revision: 1673240

URL: http://svn.apache.org/r1673240
Log:
TIKA-1593. Fix doco in Parser Quick Start Guide.

Added:
    tika/site/src/site/apt/contribute.apt.vm
Removed:
    tika/site/src/site/apt/contribute.apt
Modified:
    tika/site/publish/1.7/examples.html
    tika/site/publish/contribute.html
    tika/site/publish/plugin-management.html

Modified: tika/site/publish/1.7/examples.html
URL: http://svn.apache.org/viewvc/tika/site/publish/1.7/examples.html?rev=1673240&r1=1673239&r2=1673240&view=diff
==============================================================================
--- tika/site/publish/1.7/examples.html (original)
+++ tika/site/publish/1.7/examples.html Mon Apr 13 16:57:57 2015
@@ -115,41 +115,41 @@
 <p>The <a href="./apidocs/org/apache/tika/Tika.html">Tika facade</a>, provides 
a number of very quick and easy ways to have your content parsed by Tika, and 
return the resulting plain text</p><style type="text/css">
    @import url('attached-includes/css/shCoreDefault.css');
 </style>
-<div id="highlighter_896487" class="syntaxhighlighter nogutter  java"><table 
border="0" cellpadding="0" cellspacing="0"><tbody><tr><td class="code"><div 
class="container"><div class="line number37 index0 alt2"><code class="java 
keyword">public</code> <code class="java plain">String parseToStringExample() 
</code><code class="java keyword">throws</code> <code class="java 
plain">IOException, SAXException, TikaException {</code></div><div class="line 
number38 index1 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">InputStream 
stream = ParsingExample.</code><code class="java keyword">class</code><code 
class="java plain">.getResourceAsStream(</code><code class="java 
string">"test.doc"</code><code class="java plain">);</code></div><div 
class="line number39 index2 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">Tika tika = 
</code><code class="java keyword">new</code> <code class="java 
plain">Tika();</code></div><div class="line number40 index3 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">try</code> 
<code class="java plain">{</code></div><div class="line number41 index4 
alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java keyword">return</code> <code class="java 
plain">tika.parseToString(stream);</code></div><div class="line number42 index5 
alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">} </code><code class="java keyword">finally</code> <code 
class="java plain">{</code></div><div class="line number43 index6 alt2"><code 
class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">stream.close();</code></div><div class="line number44 index7 
alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">}</code></div><div class="line number45 index8 alt2"><code 
class="java plain">}</code></div></div></td></tr></tbody></table></div></div>
+<div id="highlighter_311815" class="syntaxhighlighter nogutter  java"><table 
border="0" cellpadding="0" cellspacing="0"><tbody><tr><td class="code"><div 
class="container"><div class="line number37 index0 alt2"><code class="java 
keyword">public</code> <code class="java plain">String parseToStringExample() 
</code><code class="java keyword">throws</code> <code class="java 
plain">IOException, SAXException, TikaException {</code></div><div class="line 
number38 index1 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">InputStream 
stream = ParsingExample.</code><code class="java keyword">class</code><code 
class="java plain">.getResourceAsStream(</code><code class="java 
string">"test.doc"</code><code class="java plain">);</code></div><div 
class="line number39 index2 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">Tika tika = 
</code><code class="java keyword">new</code> <code class="java 
plain">Tika();</code></div><div class="line number40 index3 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">try</code> 
<code class="java plain">{</code></div><div class="line number41 index4 
alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java keyword">return</code> <code class="java 
plain">tika.parseToString(stream);</code></div><div class="line number42 index5 
alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">} </code><code class="java keyword">finally</code> <code 
class="java plain">{</code></div><div class="line number43 index6 alt2"><code 
class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">stream.close();</code></div><div class="line number44 index7 
alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">}</code></div><div class="line number45 index8 alt2"><code 
class="java plain">}</code></div></div></td></tr></tbody></table></div></div>
 <div class="section">
 <h4><a name="Parsing_using_the_Auto-Detect_Parser">Parsing using the 
Auto-Detect Parser</a></h4>
-<p>For more control, you can call the <a 
href="./apidocs/org/apache/tika/parser/Parser.html">Tika Parsers</a> directly. 
Most likely, you'll want to start out using the <a 
href="./apidocs/org/apache/tika/parser/AutoDetectParser.html">Auto-Detect 
Parser</a>, which automatically figures out what kind of content you have, then 
calls the appropriate parser for you.</p><div id="highlighter_14996" 
class="syntaxhighlighter nogutter  java"><table border="0" cellpadding="0" 
cellspacing="0"><tbody><tr><td class="code"><div class="container"><div 
class="line number66 index0 alt1"><code class="java keyword">public</code> 
<code class="java plain">String parseExample() </code><code class="java 
keyword">throws</code> <code class="java plain">IOException, SAXException, 
TikaException {</code></div><div class="line number67 index1 alt2"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">InputStream stream = ParsingExample.</code><code class="java 
keyword">class</code><code class="java plain">.getResourceAsStream(</code><code class="java 
string">"test.doc"</code><code class="java plain">);</code></div><div 
class="line number68 index2 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">AutoDetectParser parser = </code><code class="java keyword">new</code> 
<code class="java plain">AutoDetectParser();</code></div><div class="line 
number69 index3 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">BodyContentHandler handler = </code><code class="java 
keyword">new</code> <code class="java 
plain">BodyContentHandler();</code></div><div class="line number70 index4 
alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">Metadata metadata = </code><code class="java 
keyword">new</code> <code class="java plain">Metadata();</code></div><div 
class="line number71 index5 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">try</code> 
<code class="java plain">{</code></div><div class="line number72 
index6 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">parser.parse(stream, handler, metadata);</code></div><div 
class="line number73 index7 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java keyword">return</code> <code class="java 
plain">handler.toString();</code></div><div class="line number74 index8 
alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">} </code><code class="java keyword">finally</code> <code 
class="java plain">{</code></div><div class="line number75 index9 alt2"><code 
class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">stream.close();</code></div><div class="line number76 
index10 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">}</code></div><div class="line number77 index11 alt2"><code class="java 
plain">}</code></div></div></td></tr></tbody></table></div></div></div>
+<p>For more control, you can call the <a 
href="./apidocs/org/apache/tika/parser/Parser.html">Tika Parsers</a> directly. 
Most likely, you'll want to start out using the <a 
href="./apidocs/org/apache/tika/parser/AutoDetectParser.html">Auto-Detect 
Parser</a>, which automatically figures out what kind of content you have, then 
calls the appropriate parser for you.</p><div id="highlighter_795099" 
class="syntaxhighlighter nogutter  java"><table border="0" cellpadding="0" 
cellspacing="0"><tbody><tr><td class="code"><div class="container"><div 
class="line number66 index0 alt1"><code class="java keyword">public</code> 
<code class="java plain">String parseExample() </code><code class="java 
keyword">throws</code> <code class="java plain">IOException, SAXException, 
TikaException {</code></div><div class="line number67 index1 alt2"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">InputStream stream = ParsingExample.</code><code class="java 
keyword">class</code><code class="java plain">.getResourceAsStream(</code><code class="java 
string">"test.doc"</code><code class="java plain">);</code></div><div 
class="line number68 index2 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">AutoDetectParser parser = </code><code class="java keyword">new</code> 
<code class="java plain">AutoDetectParser();</code></div><div class="line 
number69 index3 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">BodyContentHandler handler = </code><code class="java 
keyword">new</code> <code class="java 
plain">BodyContentHandler();</code></div><div class="line number70 index4 
alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">Metadata metadata = </code><code class="java 
keyword">new</code> <code class="java plain">Metadata();</code></div><div 
class="line number71 index5 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">try</code> 
<code class="java plain">{</code></div><div class="line number72 
index6 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">parser.parse(stream, handler, metadata);</code></div><div 
class="line number73 index7 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java keyword">return</code> <code class="java 
plain">handler.toString();</code></div><div class="line number74 index8 
alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">} </code><code class="java keyword">finally</code> <code 
class="java plain">{</code></div><div class="line number75 index9 alt2"><code 
class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">stream.close();</code></div><div class="line number76 
index10 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">}</code></div><div class="line number77 index11 alt2"><code class="java 
plain">}</code></div></div></td></tr></tbody></table></div></div></div>
 <div class="section">
 <h3><a name="Picking_different_output_formats">Picking different output 
formats</a></h3>
 <p>With Tika, you can get the textual content of your files returned in a 
number of different formats. These can be plain text, html, xhtml, xhtml of one 
part of the file etc. This is controlled based on the <a class="externalLink" 
href="http://docs.oracle.com/javase/7/docs/api/org/xml/sax/ContentHandler.html">ContentHandler</a>
 you supply to the Parser.</p>
 <div class="section">
 <h4><a name="Parsing_to_Plain_Text">Parsing to Plain Text</a></h4>
-<p>By using the <a 
href="./apidocs/org/apache/tika/sax/BodyContentHandler.html">BodyContentHandler</a>,
 you can request that Tika return only the content of the document's body as a 
plain-text string.</p><div id="highlighter_545111" class="syntaxhighlighter 
nogutter  java"><table border="0" cellpadding="0" 
cellspacing="0"><tbody><tr><td class="code"><div class="container"><div 
class="line number46 index0 alt1"><code class="java keyword">public</code> 
<code class="java plain">String parseToPlainText() </code><code class="java 
keyword">throws</code> <code class="java plain">IOException, SAXException, 
TikaException {</code></div><div class="line number47 index1 alt2"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">BodyContentHandler handler = </code><code class="java 
keyword">new</code> <code class="java 
plain">BodyContentHandler();</code></div><div class="line number48 index2 
alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code>&nbsp;</div><div 
class="line number49 index3 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">InputStream 
stream = ContentHandlerExample.</code><code class="java 
keyword">class</code><code class="java plain">.getResourceAsStream(</code><code 
class="java string">"test.doc"</code><code class="java 
plain">);</code></div><div class="line number50 index4 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">AutoDetectParser parser = </code><code class="java keyword">new</code> 
<code class="java plain">AutoDetectParser();</code></div><div class="line 
number51 index5 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">Metadata 
metadata = </code><code class="java keyword">new</code> <code class="java 
plain">Metadata();</code></div><div class="line number52 index6 alt1"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
keyword">try</code> <code class="java plain">{</code></div><div 
class="line number53 index7 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">parser.parse(stream, handler, metadata);</code></div><div 
class="line number54 index8 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java keyword">return</code> <code class="java 
plain">handler.toString();</code></div><div class="line number55 index9 
alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">} </code><code class="java keyword">finally</code> <code 
class="java plain">{</code></div><div class="line number56 index10 alt1"><code 
class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">stream.close();</code></div><div class="line number57 
index11 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">}</code></div><div class="line number58 index12 alt1"><code 
class="java plain">}</code></div></div></td></tr></tbody></table></div></div>
+<p>By using the <a 
href="./apidocs/org/apache/tika/sax/BodyContentHandler.html">BodyContentHandler</a>,
 you can request that Tika return only the content of the document's body as a 
plain-text string.</p><div id="highlighter_175358" class="syntaxhighlighter 
nogutter  java"><table border="0" cellpadding="0" 
cellspacing="0"><tbody><tr><td class="code"><div class="container"><div 
class="line number46 index0 alt1"><code class="java keyword">public</code> 
<code class="java plain">String parseToPlainText() </code><code class="java 
keyword">throws</code> <code class="java plain">IOException, SAXException, 
TikaException {</code></div><div class="line number47 index1 alt2"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">BodyContentHandler handler = </code><code class="java 
keyword">new</code> <code class="java 
plain">BodyContentHandler();</code></div><div class="line number48 index2 
alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code>&nbsp;</div><div 
class="line number49 index3 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">InputStream 
stream = ContentHandlerExample.</code><code class="java 
keyword">class</code><code class="java plain">.getResourceAsStream(</code><code 
class="java string">"test.doc"</code><code class="java 
plain">);</code></div><div class="line number50 index4 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">AutoDetectParser parser = </code><code class="java keyword">new</code> 
<code class="java plain">AutoDetectParser();</code></div><div class="line 
number51 index5 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">Metadata 
metadata = </code><code class="java keyword">new</code> <code class="java 
plain">Metadata();</code></div><div class="line number52 index6 alt1"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
keyword">try</code> <code class="java plain">{</code></div><div 
class="line number53 index7 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">parser.parse(stream, handler, metadata);</code></div><div 
class="line number54 index8 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java keyword">return</code> <code class="java 
plain">handler.toString();</code></div><div class="line number55 index9 
alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">} </code><code class="java keyword">finally</code> <code 
class="java plain">{</code></div><div class="line number56 index10 alt1"><code 
class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">stream.close();</code></div><div class="line number57 
index11 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">}</code></div><div class="line number58 index12 alt1"><code 
class="java plain">}</code></div></div></td></tr></tbody></table></div></div>
 <div class="section">
 <h4><a name="Parsing_to_XHTML">Parsing to XHTML</a></h4>
-<p>By using the <a 
href="./apidocs/org/apache/tika/sax/ToXMLContentHandler.html">ToXMLContentHandler</a>,
 you can get the XHTML content of the whole document as a string.</p><div 
id="highlighter_832055" class="syntaxhighlighter nogutter  java"><table 
border="0" cellpadding="0" cellspacing="0"><tbody><tr><td class="code"><div 
class="container"><div class="line number63 index0 alt2"><code class="java 
keyword">public</code> <code class="java plain">String parseToHTML() 
</code><code class="java keyword">throws</code> <code class="java 
plain">IOException, SAXException, TikaException {</code></div><div class="line 
number64 index1 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">ContentHandler 
handler = </code><code class="java keyword">new</code> <code class="java 
plain">ToXMLContentHandler();</code></div><div class="line number65 index2 
alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code>&nbsp;</div><div 
class="line number66 index3 alt1">
 <code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">InputStream stream = ContentHandlerExample.</code><code class="java 
keyword">class</code><code class="java plain">.getResourceAsStream(</code><code 
class="java string">"test.doc"</code><code class="java 
plain">);</code></div><div class="line number67 index4 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">AutoDetectParser parser = </code><code class="java keyword">new</code> 
<code class="java plain">AutoDetectParser();</code></div><div class="line 
number68 index5 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">Metadata 
metadata = </code><code class="java keyword">new</code> <code class="java 
plain">Metadata();</code></div><div class="line number69 index6 alt2"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
keyword">try</code> <code class="java plain">{</code></div><div class="line 
number70 index7 
 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">parser.parse(stream, handler, metadata);</code></div><div 
class="line number71 index8 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java keyword">return</code> <code class="java 
plain">handler.toString();</code></div><div class="line number72 index9 
alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">} </code><code class="java keyword">finally</code> <code 
class="java plain">{</code></div><div class="line number73 index10 alt2"><code 
class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">stream.close();</code></div><div class="line number74 
index11 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">}</code></div><div class="line number75 index12 alt2"><code 
class="java plain">}</code></div></div></td></tr></tbody></table></div>
-<p>If you just want the body of the xhtml document, without the header, you 
can chain together a <a 
href="./apidocs/org/apache/tika/sax/BodyContentHandler.html">BodyContentHandler</a>
 and a <a 
href="./apidocs/org/apache/tika/sax/ToXMLContentHandler.html">ToXMLContentHandler</a>
 as shown:</p><div id="highlighter_583333" class="syntaxhighlighter nogutter  
java"><table border="0" cellpadding="0" cellspacing="0"><tbody><tr><td 
class="code"><div class="container"><div class="line number81 index0 
alt2"><code class="java keyword">public</code> <code class="java plain">String 
parseBodyToHTML() </code><code class="java keyword">throws</code> <code 
class="java plain">IOException, SAXException, TikaException {</code></div><div 
class="line number82 index1 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">ContentHandler 
handler = </code><code class="java keyword">new</code> <code class="java 
plain">BodyContentHandler(</code></div><div class="line number83 index2 
alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code
 class="java keyword">new</code> <code class="java 
plain">ToXMLContentHandler());</code></div><div class="line number84 index3 
alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code>&nbsp;</div><div 
class="line number85 index4 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">InputStream 
stream = ContentHandlerExample.</code><code class="java 
keyword">class</code><code class="java plain">.getResourceAsStream(</code><code 
class="java string">"test.doc"</code><code class="java 
plain">);</code></div><div class="line number86 index5 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">AutoDetectParser parser = </code><code class="java keyword">new</code> 
<code class="java plain">AutoDetectParser();</code></div><div class="line 
number87 index6 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">Metadata metadata = </code><code 
class="java keyword">new</code> <code class="java 
plain">Metadata();</code></div><div class="line number88 index7 alt1"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
keyword">try</code> <code class="java plain">{</code></div><div class="line 
number89 index8 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">parser.parse(stream, handler, metadata);</code></div><div 
class="line number90 index9 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java keyword">return</code> <code class="java 
plain">handler.toString();</code></div><div class="line number91 index10 
alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">} </code><code class="java keyword">finally</code> <code 
class="java plain">{</code></div><div class="line number92 index11 alt1">
 <code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">stream.close();</code></div><div class="line number93 
index12 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">}</code></div><div class="line number94 index13 alt1"><code 
class="java plain">}</code></div></div></td></tr></tbody></table></div></div>
+<p>By using the <a 
href="./apidocs/org/apache/tika/sax/ToXMLContentHandler.html">ToXMLContentHandler</a>,
 you can get the XHTML content of the whole document as a string.</p><div 
id="highlighter_961517" class="syntaxhighlighter nogutter  java"><table 
border="0" cellpadding="0" cellspacing="0"><tbody><tr><td class="code"><div 
class="container"><div class="line number63 index0 alt2"><code class="java 
keyword">public</code> <code class="java plain">String parseToHTML() 
</code><code class="java keyword">throws</code> <code class="java 
plain">IOException, SAXException, TikaException {</code></div><div class="line 
number64 index1 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">ContentHandler 
handler = </code><code class="java keyword">new</code> <code class="java 
plain">ToXMLContentHandler();</code></div><div class="line number65 index2 
alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code>&nbsp;</div><div 
class="line number66 index3 alt1">
 <code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">InputStream stream = ContentHandlerExample.</code><code class="java 
keyword">class</code><code class="java plain">.getResourceAsStream(</code><code 
class="java string">"test.doc"</code><code class="java 
plain">);</code></div><div class="line number67 index4 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">AutoDetectParser parser = </code><code class="java keyword">new</code> 
<code class="java plain">AutoDetectParser();</code></div><div class="line 
number68 index5 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">Metadata 
metadata = </code><code class="java keyword">new</code> <code class="java 
plain">Metadata();</code></div><div class="line number69 index6 alt2"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
keyword">try</code> <code class="java plain">{</code></div><div class="line 
number70 index7 
 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">parser.parse(stream, handler, metadata);</code></div><div 
class="line number71 index8 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java keyword">return</code> <code class="java 
plain">handler.toString();</code></div><div class="line number72 index9 
alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">} </code><code class="java keyword">finally</code> <code 
class="java plain">{</code></div><div class="line number73 index10 alt2"><code 
class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">stream.close();</code></div><div class="line number74 
index11 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">}</code></div><div class="line number75 index12 alt2"><code 
class="java plain">}</code></div></div></td></tr></tbody></table></div>
+<p>If you just want the body of the xhtml document, without the header, you 
can chain together a <a 
href="./apidocs/org/apache/tika/sax/BodyContentHandler.html">BodyContentHandler</a>
 and a <a 
href="./apidocs/org/apache/tika/sax/ToXMLContentHandler.html">ToXMLContentHandler</a>
 as shown:</p><div id="highlighter_431023" class="syntaxhighlighter nogutter  
java"><table border="0" cellpadding="0" cellspacing="0"><tbody><tr><td 
class="code"><div class="container"><div class="line number81 index0 
alt2"><code class="java keyword">public</code> <code class="java plain">String 
parseBodyToHTML() </code><code class="java keyword">throws</code> <code 
class="java plain">IOException, SAXException, TikaException {</code></div><div 
class="line number82 index1 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">ContentHandler 
handler = </code><code class="java keyword">new</code> <code class="java 
plain">BodyContentHandler(</code></div><div class="line number83 index2 
alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code
 class="java keyword">new</code> <code class="java 
plain">ToXMLContentHandler());</code></div><div class="line number84 index3 
alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code>&nbsp;</div><div 
class="line number85 index4 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">InputStream 
stream = ContentHandlerExample.</code><code class="java 
keyword">class</code><code class="java plain">.getResourceAsStream(</code><code 
class="java string">"test.doc"</code><code class="java 
plain">);</code></div><div class="line number86 index5 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">AutoDetectParser parser = </code><code class="java keyword">new</code> 
<code class="java plain">AutoDetectParser();</code></div><div class="line 
number87 index6 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">Metadata metadata = </code><code 
class="java keyword">new</code> <code class="java 
plain">Metadata();</code></div><div class="line number88 index7 alt1"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
keyword">try</code> <code class="java plain">{</code></div><div class="line 
number89 index8 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">parser.parse(stream, handler, metadata);</code></div><div 
class="line number90 index9 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java keyword">return</code> <code class="java 
plain">handler.toString();</code></div><div class="line number91 index10 
alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">} </code><code class="java keyword">finally</code> <code 
class="java plain">{</code></div><div class="line number92 index11 alt1">
 <code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">stream.close();</code></div><div class="line number93 
index12 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">}</code></div><div class="line number94 index13 alt1"><code 
class="java plain">}</code></div></div></td></tr></tbody></table></div></div>
 <div class="section">
 <h4><a name="Fetching_just_certain_bits_of_the_XHTML">Fetching just certain 
bits of the XHTML</a></h4>
-<p>It possible to execute XPath queries on the parse results, to fetch only 
certain bits of the XHTML. </p><div id="highlighter_233027" 
class="syntaxhighlighter nogutter  java"><table border="0" cellpadding="0" 
cellspacing="0"><tbody><tr><td class="code"><div class="container"><div 
class="line number100 index0 alt1"><code class="java keyword">public</code> 
<code class="java plain">String parseOnePartToHTML() </code><code class="java 
keyword">throws</code> <code class="java plain">IOException, SAXException, 
TikaException {</code></div><div class="line number101 index1 alt2"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
comments">// Only get things under html -> body -> div 
(class=header)</code></div><div class="line number102 index2 alt1"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">XPathParser xhtmlParser = </code><code class="java keyword">new</code> 
<code class="java plain">XPathParser(</code><code class="java 
string">"xhtml"</code><code class="java plain">, 
XHTMLContentHandler.XHTML);</code></div><div class="line number103 index3 
alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">Matcher divContentMatcher = 
xhtmlParser.parse(</code></div><div class="line number104 index4 alt1"><code 
class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code
 class="java 
string">"/xhtml:html/xhtml:body/xhtml:div/descendant::node()"</code><code 
class="java plain">);&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 
</code></div><div class="line number105 index5 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">ContentHandler 
handler = </code><code class="java keyword">new</code> <code class="java 
plain">MatchingContentHandler(</code></div><div class="line number106 index6 
alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code
 class="java keyword">new</code> <code class="java plain">ToXMLContentHandler(), 
divContentMatcher);</code></div><div class="line number107 index7 alt2"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code>&nbsp;</div><div class="line 
number108 index8 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">InputStream 
stream = ContentHandlerExample.</code><code class="java 
keyword">class</code><code class="java plain">.getResourceAsStream(</code><code 
class="java string">"test2.doc"</code><code class="java 
plain">);</code></div><div class="line number109 index9 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">AutoDetectParser parser = </code><code class="java keyword">new</code> 
<code class="java plain">AutoDetectParser();</code></div><div class="line 
number110 index10 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">Metadata 
metadata = </code><code class="java keyword">new</code> <code 
class="java plain">Metadata();</code></div><div class="line number111 
index11 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java keyword">try</code> <code class="java plain">{</code></div><div 
class="line number112 index12 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">parser.parse(stream, handler, metadata);</code></div><div 
class="line number113 index13 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java keyword">return</code> <code class="java 
plain">handler.toString();</code></div><div class="line number114 index14 
alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">} </code><code class="java keyword">finally</code> <code 
class="java plain">{</code></div><div class="line number115 index15 alt2"><code 
class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">stream.close();</code></div><div class="line number116 index16 alt1"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">}</code></div><div class="line number117 index17 alt2"><code class="java 
plain">}</code></div></div></td></tr></tbody></table></div></div></div>
+<p>It is possible to execute XPath queries on the parse results, to fetch only 
certain bits of the XHTML. </p><div id="highlighter_692563" 
class="syntaxhighlighter nogutter  java"><table border="0" cellpadding="0" 
cellspacing="0"><tbody><tr><td class="code"><div class="container"><div 
class="line number100 index0 alt1"><code class="java keyword">public</code> 
<code class="java plain">String parseOnePartToHTML() </code><code class="java 
keyword">throws</code> <code class="java plain">IOException, SAXException, 
TikaException {</code></div><div class="line number101 index1 alt2"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
comments">// Only get things under html -> body -> div 
(class=header)</code></div><div class="line number102 index2 alt1"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">XPathParser xhtmlParser = </code><code class="java keyword">new</code> 
<code class="java plain">XPathParser(</code><code class="java 
string">"xhtml"</code><code class="java plain">, 
XHTMLContentHandler.XHTML);</code></div><div class="line number103 index3 
alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">Matcher divContentMatcher = 
xhtmlParser.parse(</code></div><div class="line number104 index4 alt1"><code 
class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code
 class="java 
string">"/xhtml:html/xhtml:body/xhtml:div/descendant::node()"</code><code 
class="java plain">);&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 
</code></div><div class="line number105 index5 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">ContentHandler 
handler = </code><code class="java keyword">new</code> <code class="java 
plain">MatchingContentHandler(</code></div><div class="line number106 index6 
alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code
 class="java keyword">new</code> <code class="java plain">ToXMLContentHandler(), 
divContentMatcher);</code></div><div class="line number107 index7 alt2"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code>&nbsp;</div><div class="line 
number108 index8 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">InputStream 
stream = ContentHandlerExample.</code><code class="java 
keyword">class</code><code class="java plain">.getResourceAsStream(</code><code 
class="java string">"test2.doc"</code><code class="java 
plain">);</code></div><div class="line number109 index9 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">AutoDetectParser parser = </code><code class="java keyword">new</code> 
<code class="java plain">AutoDetectParser();</code></div><div class="line 
number110 index10 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">Metadata 
metadata = </code><code class="java keyword">new</code> <code 
class="java plain">Metadata();</code></div><div class="line number111 
index11 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java keyword">try</code> <code class="java plain">{</code></div><div 
class="line number112 index12 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">parser.parse(stream, handler, metadata);</code></div><div 
class="line number113 index13 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java keyword">return</code> <code class="java 
plain">handler.toString();</code></div><div class="line number114 index14 
alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">} </code><code class="java keyword">finally</code> <code 
class="java plain">{</code></div><div class="line number115 index15 alt2"><code 
class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">stream.close();</code></div><div class="line number116 index16 alt1"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">}</code></div><div class="line number117 index17 alt2"><code class="java 
plain">}</code></div></div></td></tr></tbody></table></div></div></div>
 <div class="section">
 <h3><a name="Custom_Content_Handlers">Custom Content Handlers</a></h3>
 <p>The textual output of parsing a file with Tika is returned via the SAX <a 
class="externalLink" 
href="http://docs.oracle.com/javase/7/docs/api/org/xml/sax/ContentHandler.html">ContentHandler</a>
 you pass to the parse method. It is possible to customise your parsing by 
supplying your own ContentHandler which does special things.</p>
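<p>As a rough illustration of the idea, the sketch below defines a small custom handler that counts the words it sees in SAX character events. The class names WordCountingHandlerSketch and WordCountingHandler and the helper countWords are hypothetical and not part of Tika. To stay self-contained, the sketch drives the handler with the JDK's own SAX parser on an inline XHTML string; with Tika, the same handler object would be passed as the ContentHandler argument of the parse method instead.</p>

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

public class WordCountingHandlerSketch {

    /** Hypothetical custom handler: collects text and counts whitespace-separated words. */
    static class WordCountingHandler extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            // A parser may deliver one run of text in several calls, so accumulate it.
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            // Separate the text of adjacent elements so their words do not merge.
            text.append(' ');
        }

        public int getWordCount() {
            String trimmed = text.toString().trim();
            return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
        }
    }

    /** Feeds an XHTML string through the JDK SAX parser (standing in for a Tika parser). */
    static int countWords(String xhtml) throws Exception {
        WordCountingHandler handler = new WordCountingHandler();
        SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.getWordCount();
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<html><body><p>Hello Tika</p><p>three more words</p></body></html>";
        System.out.println(countWords(xhtml));
    }
}
```

<p>Extending DefaultHandler keeps the sketch short because only the callbacks of interest need overriding; the examples in the following sections take the same approach by wrapping or decorating an existing handler.</p>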
 <div class="section">
 <h4><a name="Extract_Phone_Numbers_from_Content_into_the_Metadata">Extract 
Phone Numbers from Content into the Metadata</a></h4>
-<p>By using the <a 
href="./apidocs/org/apache/tika/sax/PhoneExtractingContentHandler.html">PhoneExtractingContentHandler</a>,
 you can have any phone numbers found in the textual content of the document 
extracted and placed into the Metadata object for you.</p><div 
id="highlighter_721339" class="syntaxhighlighter nogutter  java"><table 
border="0" cellpadding="0" cellspacing="0"><tbody><tr><td class="code"><div 
class="container"><div class="line number69 index0 alt2"><code class="java 
keyword">public</code> <code class="java keyword">static</code> <code 
class="java keyword">void</code> <code class="java plain">process(File file) 
</code><code class="java keyword">throws</code> <code class="java 
plain">Exception {</code></div><div class="line number70 index1 alt1"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">Parser parser = </code><code class="java keyword">new</code> <code 
class="java plain">AutoDetectParser();</code></div><div class="line number71 
 index2 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">Metadata metadata = </code><code class="java 
keyword">new</code> <code class="java plain">Metadata();</code></div><div 
class="line number72 index3 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java comments">// The 
PhoneExtractingContentHandler will examine any characters for phone numbers 
before passing them</code></div><div class="line number73 index4 alt2"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
comments">// to the underlying Handler.</code></div><div class="line number74 
index5 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">PhoneExtractingContentHandler handler = </code><code 
class="java keyword">new</code> <code class="java 
plain">PhoneExtractingContentHandler(</code><code class="java 
keyword">new</code> <code class="java plain">BodyContentHandler(), 
metadata);</code></div><div 
class="line number75 index6 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">InputStream 
stream = </code><code class="java keyword">new</code> <code class="java 
plain">FileInputStream(file);</code></div><div class="line number76 index7 
alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java keyword">try</code> <code class="java plain">{</code></div><div 
class="line number77 index8 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">parser.parse(stream, handler, metadata, </code><code 
class="java keyword">new</code> <code class="java 
plain">ParseContext());</code></div><div class="line number78 index9 
alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">}</code></div><div class="line number79 index10 alt2"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
keyword">finally</code> <code class="java plain">{
 </code></div><div class="line number80 index11 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">stream.close();</code></div><div class="line number81 
index12 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">}</code></div><div class="line number82 index13 alt1"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">String[] numbers = metadata.getValues(</code><code class="java 
string">"phonenumbers"</code><code class="java plain">);</code></div><div 
class="line number83 index14 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">for</code> 
<code class="java plain">(String number : numbers) {</code></div><div 
class="line number84 index15 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">phoneNumbers.add(number);</code></div><div class="line 
number85 index16 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">}</code></div><div class="line number86 index17 alt1"><code class="java 
plain">}</code></div></div></td></tr></tbody></table></div></div>
+<p>By using the <a 
href="./apidocs/org/apache/tika/sax/PhoneExtractingContentHandler.html">PhoneExtractingContentHandler</a>,
 you can have any phone numbers found in the textual content of the document 
extracted and placed into the Metadata object for you.</p><div 
id="highlighter_737250" class="syntaxhighlighter nogutter  java"><table 
border="0" cellpadding="0" cellspacing="0"><tbody><tr><td class="code"><div 
class="container"><div class="line number69 index0 alt2"><code class="java 
keyword">public</code> <code class="java keyword">static</code> <code 
class="java keyword">void</code> <code class="java plain">process(File file) 
</code><code class="java keyword">throws</code> <code class="java 
plain">Exception {</code></div><div class="line number70 index1 alt1"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">Parser parser = </code><code class="java keyword">new</code> <code 
class="java plain">AutoDetectParser();</code></div><div class="line number71 
 index2 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">Metadata metadata = </code><code class="java 
keyword">new</code> <code class="java plain">Metadata();</code></div><div 
class="line number72 index3 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java comments">// The 
PhoneExtractingContentHandler will examine any characters for phone numbers 
before passing them</code></div><div class="line number73 index4 alt2"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
comments">// to the underlying Handler.</code></div><div class="line number74 
index5 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">PhoneExtractingContentHandler handler = </code><code 
class="java keyword">new</code> <code class="java 
plain">PhoneExtractingContentHandler(</code><code class="java 
keyword">new</code> <code class="java plain">BodyContentHandler(), 
metadata);</code></div><div 
class="line number75 index6 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">InputStream 
stream = </code><code class="java keyword">new</code> <code class="java 
plain">FileInputStream(file);</code></div><div class="line number76 index7 
alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java keyword">try</code> <code class="java plain">{</code></div><div 
class="line number77 index8 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">parser.parse(stream, handler, metadata, </code><code 
class="java keyword">new</code> <code class="java 
plain">ParseContext());</code></div><div class="line number78 index9 
alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">}</code></div><div class="line number79 index10 alt2"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
keyword">finally</code> <code class="java plain">{
 </code></div><div class="line number80 index11 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">stream.close();</code></div><div class="line number81 
index12 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">}</code></div><div class="line number82 index13 alt1"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">String[] numbers = metadata.getValues(</code><code class="java 
string">"phonenumbers"</code><code class="java plain">);</code></div><div 
class="line number83 index14 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">for</code> 
<code class="java plain">(String number : numbers) {</code></div><div 
class="line number84 index15 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">phoneNumbers.add(number);</code></div><div class="line 
number85 index16 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">}</code></div><div class="line number86 index17 alt1"><code class="java 
plain">}</code></div></div></td></tr></tbody></table></div></div>
 <div class="section">
 <h4><a name="Streaming_the_plain_text_in_chunks">Streaming the plain text in 
chunks</a></h4>
-<p>Sometimes, you want to chunk the resulting text up, perhaps to output as 
you go minimising memory use, perhaps to output to HDFS files, or any other 
reason! With a small custom content handler, you can do that.</p><div 
id="highlighter_197992" class="syntaxhighlighter nogutter  java"><table 
border="0" cellpadding="0" cellspacing="0"><tbody><tr><td class="code"><div 
class="container"><div class="line number124 index0 alt1"><code class="java 
keyword">public</code> <code class="java plain">List&lt;String> 
parseToPlainTextChunks() </code><code class="java keyword">throws</code> <code 
class="java plain">IOException, SAXException, TikaException {</code></div><div 
class="line number125 index1 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">final</code> 
<code class="java plain">List&lt;String> chunks = </code><code class="java 
keyword">new</code> <code class="java 
plain">ArrayList&lt;String>();</code></div><div class="line number126 index2 
alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">chunks.add(</code><code class="java string">""</code><code class="java 
plain">);</code></div><div class="line number127 index3 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">ContentHandlerDecorator handler = </code><code class="java 
keyword">new</code> <code class="java plain">ContentHandlerDecorator() 
{</code></div><div class="line number128 index4 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java color1">@Override</code></div><div class="line number129 index5 
alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java keyword">public</code> <code class="java keyword">void</code> <code 
class="java plain">characters(</code><code class="java 
keyword">char</code><code class="java plain">[] ch, </code><code class="java 
keyword">int</code> <code class="java plain">start, </code><code class="java keyword">int</code> <code class="java 
plain">length) {</code></div><div class="line number130 index6 alt1"><code 
class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code
 class="java plain">String lastChunk = chunks.get(chunks.size()-</code><code 
class="java value">1</code><code class="java plain">);</code></div><div 
class="line number131 index7 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code
 class="java plain">String thisStr = </code><code class="java 
keyword">new</code> <code class="java plain">String(ch, start, 
length);</code></div><div class="line number132 index8 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code>&nbsp;</div><div
 class="line number133 index9 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code
 class="java keyword">if</code> <code class="java plain">(lastChunk.length()+length 
> MAXIMUM_TEXT_CHUNK_SIZE) {</code></div><div class="line number134 index10 
alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code
 class="java plain">chunks.add(thisStr);</code></div><div class="line number135 
index11 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code
 class="java plain">} </code><code class="java keyword">else</code> <code 
class="java plain">{</code></div><div class="line number136 index12 alt1"><code 
class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code
 class="java plain">chunks.set(chunks.size()-</code><code class="java 
value">1</code><code class="java plain">, lastChunk+thisStr);</code></div><div 
class="line number137 index13 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">}</code></div><div class="line number138 index14 alt1"><code 
class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">}</code></div><div class="line number139 index15 alt2"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">};</code></div><div class="line number140 index16 alt1"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code>&nbsp;</div><div class="line 
number141 index17 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">InputStream 
stream = ContentHandlerExample.</code><code class="java 
keyword">class</code><code class="java plain">.getResourceAsStream(</code><code 
class="java string">"test2.doc"</code><code class="java 
plain">);</code></div><div class="line number142 index18 alt1"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">AutoDetectParser parser = </code><code class="java keyword">new</code> <code 
class="java plain">AutoDetectParser();</code></div><div class="line number143 
index19 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">Metadata metadata = </code><code class="java 
keyword">new</code> <code class="java plain">Metadata();</code></div><div 
class="line number144 index20 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">try</code> 
<code class="java plain">{</code></div><div class="line number145 index21 
alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">parser.parse(stream, handler, metadata);</code></div><div 
class="line number146 index22 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java keyword">return</code> <code class="java 
plain">chunks;</code></div><div class="line number147 index23 alt2"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">} 
</code><code class="java keyword">finally</code> <code class="java 
plain">{</code></div><div class="line number148 index24 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">stream.close();</code></div><div class="line number149 
index25 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">}</code></div><div class="line number150 index26 alt1"><code 
class="java 
plain">}</code></div></div></td></tr></tbody></table></div></div></div>
+<p>Sometimes, you want to chunk the resulting text up, perhaps to output as 
you go minimising memory use, perhaps to output to HDFS files, or any other 
reason! With a small custom content handler, you can do that.</p><div 
id="highlighter_571589" class="syntaxhighlighter nogutter  java"><table 
border="0" cellpadding="0" cellspacing="0"><tbody><tr><td class="code"><div 
class="container"><div class="line number124 index0 alt1"><code class="java 
keyword">public</code> <code class="java plain">List&lt;String> 
parseToPlainTextChunks() </code><code class="java keyword">throws</code> <code 
class="java plain">IOException, SAXException, TikaException {</code></div><div 
class="line number125 index1 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">final</code> 
<code class="java plain">List&lt;String> chunks = </code><code class="java 
keyword">new</code> <code class="java 
plain">ArrayList&lt;String>();</code></div><div class="line number126 index2 
alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">chunks.add(</code><code class="java string">""</code><code class="java 
plain">);</code></div><div class="line number127 index3 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">ContentHandlerDecorator handler = </code><code class="java 
keyword">new</code> <code class="java plain">ContentHandlerDecorator() 
{</code></div><div class="line number128 index4 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java color1">@Override</code></div><div class="line number129 index5 
alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java keyword">public</code> <code class="java keyword">void</code> <code 
class="java plain">characters(</code><code class="java 
keyword">char</code><code class="java plain">[] ch, </code><code class="java 
keyword">int</code> <code class="java plain">start, </code><code class="java keyword">int</code> <code class="java 
plain">length) {</code></div><div class="line number130 index6 alt1"><code 
class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code
 class="java plain">String lastChunk = chunks.get(chunks.size()-</code><code 
class="java value">1</code><code class="java plain">);</code></div><div 
class="line number131 index7 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code
 class="java plain">String thisStr = </code><code class="java 
keyword">new</code> <code class="java plain">String(ch, start, 
length);</code></div><div class="line number132 index8 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code>&nbsp;</div><div
 class="line number133 index9 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code
 class="java keyword">if</code> <code class="java plain">(lastChunk.length()+length 
> MAXIMUM_TEXT_CHUNK_SIZE) {</code></div><div class="line number134 index10 
alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code
 class="java plain">chunks.add(thisStr);</code></div><div class="line number135 
index11 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code
 class="java plain">} </code><code class="java keyword">else</code> <code 
class="java plain">{</code></div><div class="line number136 index12 alt1"><code 
class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code
 class="java plain">chunks.set(chunks.size()-</code><code class="java 
value">1</code><code class="java plain">, lastChunk+thisStr);</code></div><div 
class="line number137 index13 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">}</code></div><div class="line number138 index14 alt1"><code 
class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">}</code></div><div class="line number139 index15 alt2"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">};</code></div><div class="line number140 index16 alt1"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code>&nbsp;</div><div class="line 
number141 index17 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">InputStream 
stream = ContentHandlerExample.</code><code class="java 
keyword">class</code><code class="java plain">.getResourceAsStream(</code><code 
class="java string">"test2.doc"</code><code class="java 
plain">);</code></div><div class="line number142 index18 alt1"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">AutoDetectParser parser = </code><code class="java keyword">new</code> <code 
class="java plain">AutoDetectParser();</code></div><div class="line number143 
index19 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">Metadata metadata = </code><code class="java 
keyword">new</code> <code class="java plain">Metadata();</code></div><div 
class="line number144 index20 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">try</code> 
<code class="java plain">{</code></div><div class="line number145 index21 
alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">parser.parse(stream, handler, metadata);</code></div><div 
class="line number146 index22 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java keyword">return</code> <code class="java 
plain">chunks;</code></div><div class="line number147 index23 alt2"><code 
class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">} 
</code><code class="java keyword">finally</code> <code class="java 
plain">{</code></div><div class="line number148 index24 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">stream.close();</code></div><div class="line number149 
index25 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">}</code></div><div class="line number150 index26 alt1"><code 
class="java 
plain">}</code></div></div></td></tr></tbody></table></div></div></div>
 <div class="section">
 <h3><a name="Translation">Translation</a></h3>
 <p>Tika provides a pluggable Translation system, which allows you to send the 
results of parsing off to an external system or program to have the text 
translated into another language.</p>
 <div class="section">
 <h4><a name="Translation_using_the_Microsoft_Translation_API">Translation 
using the Microsoft Translation API</a></h4>
-<p>In order to use the Microsoft Translation API, you need to sign up for a 
Microsoft account, get an API key, then pass the key to Tika before 
translating.</p><div id="highlighter_560986" class="syntaxhighlighter nogutter  
java"><table border="0" cellpadding="0" cellspacing="0"><tbody><tr><td 
class="code"><div class="container"><div class="line number23 index0 
alt2"><code class="java keyword">public</code> <code class="java plain">String 
microsoftTranslateToFrench(String text) {</code></div><div class="line number24 
index1 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">MicrosoftTranslator translator = </code><code class="java 
keyword">new</code> <code class="java 
plain">MicrosoftTranslator();</code></div><div class="line number25 index2 
alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java comments">// Change the id and secret! See <a 
href="http://msdn.microsoft.com/en-us/library/hh454950.aspx.">http://msdn.microsoft.com/en-us/library/hh454950.aspx.</a></code></div><div class="line number26 
index3 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">translator.setId(</code><code class="java 
string">"dummy-id"</code><code class="java plain">);</code></div><div 
class="line number27 index4 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">translator.setSecret(</code><code class="java 
string">"dummy-secret"</code><code class="java plain">);</code></div><div 
class="line number28 index5 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">try</code> 
<code class="java plain">{</code></div><div class="line number29 index6 
alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java keyword">return</code> <code class="java 
plain">translator.translate(text, </code><code class="java 
string">"fr"</code><code class="java plain">);</code></div><div class=
 "line number30 index7 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">} </code><code 
class="java keyword">catch</code> <code class="java plain">(Exception e) 
{</code></div><div class="line number31 index8 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java keyword">return</code> <code class="java string">"Error while 
translating."</code><code class="java plain">;</code></div><div class="line 
number32 index9 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">}</code></div><div class="line number33 index10 alt2"><code class="java 
plain">}</code></div></div></td></tr></tbody></table></div></div></div>
+<p>In order to use the Microsoft Translation API, you need to sign up for a 
Microsoft account, get an API key, then pass the key to Tika before 
translating.</p><div id="highlighter_950229" class="syntaxhighlighter nogutter  
java"><table border="0" cellpadding="0" cellspacing="0"><tbody><tr><td 
class="code"><div class="container"><div class="line number23 index0 
alt2"><code class="java keyword">public</code> <code class="java plain">String 
microsoftTranslateToFrench(String text) {</code></div><div class="line number24 
index1 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">MicrosoftTranslator translator = </code><code class="java 
keyword">new</code> <code class="java 
plain">MicrosoftTranslator();</code></div><div class="line number25 index2 
alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java comments">// Change the id and secret! See <a 
href="http://msdn.microsoft.com/en-us/library/hh454950.aspx.">http://msdn.microsoft.com/en-us/library/hh454950.aspx.</a></code></div><div class="line number26 
index3 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">translator.setId(</code><code class="java 
string">"dummy-id"</code><code class="java plain">);</code></div><div 
class="line number27 index4 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">translator.setSecret(</code><code class="java 
string">"dummy-secret"</code><code class="java plain">);</code></div><div 
class="line number28 index5 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">try</code> 
<code class="java plain">{</code></div><div class="line number29 index6 
alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java keyword">return</code> <code class="java 
plain">translator.translate(text, </code><code class="java 
string">"fr"</code><code class="java plain">);</code></div><div class=
 "line number30 index7 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">} </code><code 
class="java keyword">catch</code> <code class="java plain">(Exception e) 
{</code></div><div class="line number31 index8 alt2"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java keyword">return</code> <code class="java string">"Error while 
translating."</code><code class="java plain">;</code></div><div class="line 
number32 index9 alt1"><code class="java 
spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java 
plain">}</code></div><div class="line number33 index10 alt2"><code class="java 
plain">}</code></div></div></td></tr></tbody></table></div></div></div>
 <div class="section">
 <h3><a name="Language_Identification">Language Identification</a></h3>
-<p>Tika provides support for identifying the language of text, through the <a 
href="./apidocs/org/apache/tika/language/LanguageIdentifier.html">LanguageIdentifier</a>
 class.</p><div id="highlighter_131068" class="syntaxhighlighter nogutter  
java"><table border="0" cellpadding="0" cellspacing="0"><tbody><tr><td 
class="code"><div class="container"><div class="line number23 index0 
alt2"><code class="java keyword">public</code> <code class="java plain">String 
identifyLanguage(String text) {</code></div><div class="line number24 index1 
alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">LanguageIdentifier identifier = </code><code class="java 
keyword">new</code> <code class="java 
plain">LanguageIdentifier(text);</code></div><div class="line number25 index2 
alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java keyword">return</code> <code class="java 
plain">identifier.getLanguage();</code></div><div class="line number26 index3
  alt1"><code class="java 
plain">}</code></div></div></td></tr></tbody></table></div></div></div>
+<p>Tika provides support for identifying the language of text, through the <a 
href="./apidocs/org/apache/tika/language/LanguageIdentifier.html">LanguageIdentifier</a>
 class.</p><div id="highlighter_735746" class="syntaxhighlighter nogutter  
java"><table border="0" cellpadding="0" cellspacing="0"><tbody><tr><td 
class="code"><div class="container"><div class="line number23 index0 
alt2"><code class="java keyword">public</code> <code class="java plain">String 
identifyLanguage(String text) {</code></div><div class="line number24 index1 
alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java plain">LanguageIdentifier identifier = </code><code class="java 
keyword">new</code> <code class="java 
plain">LanguageIdentifier(text);</code></div><div class="line number25 index2 
alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code 
class="java keyword">return</code> <code class="java 
plain">identifier.getLanguage();</code></div><div class="line number26 index3
  alt1"><code class="java 
plain">}</code></div></div></td></tr></tbody></table></div></div></div>
       </div>
       <div id="sidebar">
         <div id="navigation">

Modified: tika/site/publish/contribute.html
URL: 
http://svn.apache.org/viewvc/tika/site/publish/contribute.html?rev=1673240&r1=1673239&r2=1673240&view=diff
==============================================================================
--- tika/site/publish/contribute.html (original)
+++ tika/site/publish/contribute.html Mon Apr 13 16:57:57 2015
@@ -99,7 +99,7 @@
 <p>If you're new to reporting problems, you might find the <a 
class="externalLink" 
href="http://www.chiark.greenend.org.uk/~sgtatham/bugs.html">How to Report Bugs 
Effectively</a> essay (amongst many others) useful for learning more about what 
makes an effective and helpful bug report.</p></div>
 <div class="section">
 <h2>New Parsers, Detectors and Mime Types<a 
name="New_Parsers_Detectors_and_Mime_Types"></a></h2>
-<p>The <a href="./parser_guide.apt">Parser Quick Start Guide</a> provides 
instructions on adding new mime types and new parsers to Tika.</p>
+<p>The <a href="./${project.parent.version">/parser_guide.html}Parser Quick 
Start Guide</a> provides instructions on adding new mime types and new parsers 
to Tika.</p>
 <p>If your new Parser or Detector depends on libraries which we cannot include 
in Tika for license reasons, you are encouraged to list it on the <a 
class="externalLink" 
href="http://wiki.apache.org/tika/3rd%20party%20parser%20plugins">3rd Party 
Parser Plugins</a> page on the Tika wiki.</p></div>
 <div class="section">
 <h2>Submitting Enhancements and Fixes<a 
name="Submitting_Enhancements_and_Fixes"></a></h2>

Modified: tika/site/publish/plugin-management.html
URL: 
http://svn.apache.org/viewvc/tika/site/publish/plugin-management.html?rev=1673240&r1=1673239&r2=1673240&view=diff
==============================================================================
--- tika/site/publish/plugin-management.html (original)
+++ tika/site/publish/plugin-management.html Mon Apr 13 16:57:57 2015
@@ -118,7 +118,7 @@
 <tr class="b">
 <td>org.apache.maven.plugins</td>
 <td><a class="externalLink" 
href="http://maven.apache.org/plugins/maven-dependency-plugin/">maven-dependency-plugin</a></td>
-<td>2.8</td></tr>
+<td>2.1</td></tr>
 <tr class="a">
 <td>org.apache.maven.plugins</td>
 <td><a class="externalLink" 
href="http://maven.apache.org/plugins/maven-deploy-plugin/">maven-deploy-plugin</a></td>

Added: tika/site/src/site/apt/contribute.apt.vm
URL: 
http://svn.apache.org/viewvc/tika/site/src/site/apt/contribute.apt.vm?rev=1673240&view=auto
==============================================================================
--- tika/site/src/site/apt/contribute.apt.vm (added)
+++ tika/site/src/site/apt/contribute.apt.vm Mon Apr 13 16:57:57 2015
@@ -0,0 +1,151 @@
+                       ----------
+                       Contribute
+                       ----------
+
+~~ Licensed to the Apache Software Foundation (ASF) under one or more
+~~ contributor license agreements.  See the NOTICE file distributed with
+~~ this work for additional information regarding copyright ownership.
+~~ The ASF licenses this file to You under the Apache License, Version 2.0
+~~ (the "License"); you may not use this file except in compliance with
+~~ the License.  You may obtain a copy of the License at
+~~
+~~     http://www.apache.org/licenses/LICENSE-2.0
+~~
+~~ Unless required by applicable law or agreed to in writing, software
+~~ distributed under the License is distributed on an "AS IS" BASIS,
+~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+~~ See the License for the specific language governing permissions and
+~~ limitations under the License.
+
+Contribute to Apache Tika
+
+   Apache Tika is an Open Source project built and maintained by a diverse 
+   range of contributors. We welcome contributions of all types
+   to the project - code, documentation, testing, bug triage, user support, 
+   and more! Send an email to the {{{./mail-lists.html}Tika development list}}
+   if you're looking for somewhere to help.
+
+Source Code
+
+   To download the source code for the latest release of Apache Tika, please
+   see the {{{./download.html}Download page}}.
+
+   The master copy of the Apache Tika source code is held in SVN. You can
+   check out the code from 
+   {{{https://svn.apache.org/repos/asf/tika/trunk}https://svn.apache.org/repos/asf/tika/trunk}}
+   and you can browse it online through
+   {{{http://svn.apache.org/viewvc/tika/trunk/}Viewvc}}.
+
+   For those who prefer working with Git, a read-only mirror is available
+   from {{{http://git.apache.org/}git.apache.org}}. We also maintain a 
+   {{{https://github.com/apache/tika/}GitHub mirror}}, which you are welcome
+   to fork from and open pull requests to.
+
+Reporting Issues
+
+   Tika uses the {{{https://issues.apache.org/jira/browse/TIKA}ASF JIRA instance}}
+   for issue tracking, under the {{{https://issues.apache.org/jira/browse/TIKA}Tika Project}}.
+
+   When reporting an issue, please try to include the details, steps and
+   documents required to reproduce it. If there are multiple documents that
+   trigger the issue, a small file we can use in unit testing would be great. A
+   JUnit unit test showing the problem can be helpful, but isn't required.
+
+   If you're new to reporting problems, you might find the 
+   {{{http://www.chiark.greenend.org.uk/~sgtatham/bugs.html}How to Report
+   Bugs Effectively}}
+   essay (amongst many others) useful for learning more about what makes
+   an effective and helpful bug report.
+
+New Parsers, Detectors and Mime Types
+
+   The {{{./${project.parent.version}/parser_guide.html}Parser Quick Start Guide}} provides instructions
+   on adding new mime types and new parsers to Tika.
+
+   If your new Parser or Detector depends on libraries which we cannot
+   include in Tika for license reasons, you are encouraged to list it on
+   the
+   {{{http://wiki.apache.org/tika/3rd%20party%20parser%20plugins}3rd Party
+   Parser Plugins}} page on the Tika wiki.
+
+Submitting Enhancements and Fixes
+
+   All enhancements and fixes should have a 
+   {{{https://issues.apache.org/jira/browse/TIKA}JIRA Issue or Enhancement}}
+   opened for them. This should describe the problem and the proposed fix
+   / new code. The JIRA can be used for discussions on the code, and provides
+   a single identifier for the change.
+
+   SVN - For users of SVN, you can use <<<svn diff>>> to generate a patch
+   file of your changes, which can then be attached to the issue. Note that
+   an SVN diff won't normally include new or binary files, so these will need
+   to be attached separately.
+
+   Git - Git users can run <<<git diff --no-prefix>>> to generate an SVN
+   compatible patch which can then be attached to an issue.
+
+   GitHub Pulls - If you are working from our 
+   {{{https://github.com/apache/tika/}GitHub mirror}}, it is possible to
+   open a pull request for your change. Please include the JIRA Issue number
+   in the pull request, so it can be linked by the ASF GitHub bot. 
+
+   ReviewBoard - If you have a Work-In-Progress patch for which you would 
+   like feedback / review / assistance, you can use the 
+   {{{https://reviews.apache.org/dashboard/}Apache ReviewBoard Instance}} to
+   post your code. Please reference the JIRA Issue number from the review 
+   request, and add a link to it to the JIRA Issue.
+
+   Unit tests, License Headers - Wherever possible, we like new functionality
+   and fixes to include small-ish unit tests. Whenever you make changes,
+   please re-run the unit test suite (<<<mvn install>>> is one way to trigger
+   this), and ensure your changes don't break anything. If adding new files,
+   please include the Apache License v2 license header at the top of the
+   file.
+
+Dependencies
+
+   Any new dependencies introduced must be under a suitable license. Broadly,
+   they must be Open Source, and must not place restrictions on larger works
+   they are incorporated within. A list of the allowed licenses is maintained
+   by the {{{http://www.apache.org/legal/resolved.html}ASF Legal Affairs
+   Committee}}. If in doubt, check on the dev list.
+
+   All new and updated dependencies must be in Maven Central. (It is not
+   possible for Apache releases to depend on additional repositories in
+   their poms). If possible, the project producing the dependency should
+   be asked to publish it to Central, such as through the
+   {{{https://docs.sonatype.org/display/Repository/Sonatype+OSS+Maven+Repository+Usage+Guide}Sonatype OSS Maven Repo}}.
+   If that isn't possible, someone will need to upload it via the
+   {{{https://docs.sonatype.org/display/Repository/Uploading+3rd-party+Artifacts+to+The+Central+Repository}Sonatype 3rd Party OSS Artifacts process}}.
+   This will need to be completed before any patches depending on the
+   new library can be committed to Tika.
+
+Code Formatting
+
+   Java code should be indented with 4 spaces, no tabs. Opening brackets
+   should normally be on the same line as the statement. Java coding 
+   standards are normally followed, but if in doubt follow what the
+   existing code does!
+
+   Imports should normally be explicit; wildcard (foo.*) imports should
+   not normally be used. The imports should be ordered by javax, then
+   java, then other.
+
+   From time to time, you may find that code you are working on doesn't
+   follow these rules. If you find that, please don't submit a single
+   patch with logic changes + formatting together, as those are very hard
+   to review. Instead, please submit two patches, one to correct formatting
+   problems, and a second for your logic changes / fixes.
+
+Other Resources
+
+   * The {{{http://community.apache.org/}Apache Community Development
+     project (ComDev)}} provide general advice on getting started with
+     contributing to Apache projects
+
+   * The Apache Nutch project provide a comprehensive guide on
+     {{{http://wiki.apache.org/nutch/Becoming_A_Nutch_Developer}becoming a
+     Nutch Developer}}, much of which applies equally to Apache Tika too
+
+   * The book {{{http://manning.com/mattmann/}Tika in Action}} has a lot
+     of great information on how Tika works, and how to extend it

