Author: nick
Date: Fri Dec 19 03:37:22 2014
New Revision: 1646621
URL: http://svn.apache.org/r1646621
Log:
List some more new parsers
Modified:
tika/site/publish/1.7/examples.html
tika/site/publish/1.7/formats.html
tika/site/src/site/apt/1.7/formats.apt
Modified: tika/site/publish/1.7/examples.html
URL:
http://svn.apache.org/viewvc/tika/site/publish/1.7/examples.html?rev=1646621&r1=1646620&r2=1646621&view=diff
==============================================================================
--- tika/site/publish/1.7/examples.html (original)
+++ tika/site/publish/1.7/examples.html Fri Dec 19 03:37:22 2014
@@ -102,7 +102,7 @@
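<p>The simplest way to extract plain text from a document is the <code>parseToString</code> method of the <code>Tika</code> facade, which detects the document type, picks a matching parser and returns the extracted text as a <code>String</code>:</p><style type="text/css">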
@import url('attached-includes/css/shCoreDefault.css');
</style>
-<div id="highlighter_593455" class="syntaxhighlighter nogutter java"><table
border="0" cellpadding="0" cellspacing="0"><tbody><tr><td class="code"><div
class="container"><div class="line number37 index0 alt2"><code class="java
keyword">public</code> <code class="java plain">String parseToStringExample()
</code><code class="java keyword">throws</code> <code class="java
plain">IOException, SAXException, TikaException {</code></div><div class="line
number38 index1 alt1"><code class="java
spaces"> </code><code class="java plain">InputStream
stream = ParsingExample.</code><code class="java keyword">class</code><code
class="java plain">.getResourceAsStream(</code><code class="java
string">"test.doc"</code><code class="java plain">);</code></div><div
class="line number39 index2 alt2"><code class="java
spaces"> </code><code class="java plain">Tika tika =
</code><code class="java keyword">new</code> <code class="java
plain">Tika();</code></div><div class="line number40 index3 alt1"><code class="java
spaces"> </code><code class="java keyword">try</code>
<code class="java plain">{</code></div><div class="line number41 index4
alt2"><code class="java
spaces"> </code><code
class="java keyword">return</code> <code class="java
plain">tika.parseToString(stream);</code></div><div class="line number42 index5
alt1"><code class="java spaces"> </code><code
class="java plain">} </code><code class="java keyword">finally</code> <code
class="java plain">{</code></div><div class="line number43 index6 alt2"><code
class="java
spaces"> </code><code
class="java plain">stream.close();</code></div><div class="line number44 index7
alt1"><code class="java spaces"> </code><code
class="java plain">}</code></div><div class="line number45 index8 alt2"><code
class="java plain">}</code></div></div></td></tr></tbody></table></div></div></div></div>
+<div id="highlighter_260411" class="syntaxhighlighter nogutter java"><table
border="0" cellpadding="0" cellspacing="0"><tbody><tr><td class="code"><div
class="container"><div class="line number37 index0 alt2"><code class="java
keyword">public</code> <code class="java plain">String parseToStringExample()
</code><code class="java keyword">throws</code> <code class="java
plain">IOException, SAXException, TikaException {</code></div><div class="line
number38 index1 alt1"><code class="java
spaces"> </code><code class="java plain">InputStream
stream = ParsingExample.</code><code class="java keyword">class</code><code
class="java plain">.getResourceAsStream(</code><code class="java
string">"test.doc"</code><code class="java plain">);</code></div><div
class="line number39 index2 alt2"><code class="java
spaces"> </code><code class="java plain">Tika tika =
</code><code class="java keyword">new</code> <code class="java
plain">Tika();</code></div><div class="line number40 index3 alt1"><code class="java
spaces"> </code><code class="java keyword">try</code>
<code class="java plain">{</code></div><div class="line number41 index4
alt2"><code class="java
spaces"> </code><code
class="java keyword">return</code> <code class="java
plain">tika.parseToString(stream);</code></div><div class="line number42 index5
alt1"><code class="java spaces"> </code><code
class="java plain">} </code><code class="java keyword">finally</code> <code
class="java plain">{</code></div><div class="line number43 index6 alt2"><code
class="java
spaces"> </code><code
class="java plain">stream.close();</code></div><div class="line number44 index7
alt1"><code class="java spaces"> </code><code
class="java plain">}</code></div><div class="line number45 index8 alt2"><code
class="java plain">}</code></div></div></td></tr></tbody></table></div></div></div></div>
</div>
<div id="sidebar">
<div id="navigation">
Modified: tika/site/publish/1.7/formats.html
URL:
http://svn.apache.org/viewvc/tika/site/publish/1.7/formats.html?rev=1646621&r1=1646620&r2=1646621&view=diff
==============================================================================
--- tika/site/publish/1.7/formats.html (original)
+++ tika/site/publish/1.7/formats.html Fri Dec 19 03:37:22 2014
@@ -156,7 +156,8 @@
<p>Tika can detect several common audio formats and extract metadata from
them. Even text extraction is supported for some audio files that contain
lyrics or other textual content. Extracted metadata includes sampling rates,
channels, format information, artists, titles etc. The <a
href="./api/org/apache/tika/parser/audio/AudioParser.html">AudioParser</a> and
<a href="./api/org/apache/tika/parser/audio/MidiParser.html">MidiParser</a>
classes use standard javax.sound features to process simple audio formats. The
<a href="./api/org/apache/tika/parser/mp3/Mp3Parser.html">Mp3Parser</a> class
adds support for the widely used MP3 format, and the <a
href="./api/org/apache/tika/parser/mp4/MP4Parser.html">MP4Parser</a> class
provides it for MP4 audio. The Ogg family of audio formats (Vorbis, Speex,
Opus, FLAC, etc.) is supported by the <a
href="./api/org/gagravarr/tika/VorbisParser.html">VorbisParser</a>, <a
href="./api/org/gagravarr/tika/OpusParser.html">OpusParser</a>, <a
href="./api/org/gagravarr/tika/SpeexParser.html">SpeexParser</a> and <a
href="./api/org/gagravarr/tika/FlacParser.html">FlacParser</a>
classes.</p></div>
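The AudioParser description says simple audio formats are handled with standard javax.sound features. As a minimal sketch of what those JDK calls look like (this is not Tika's AudioParser, and the class and method names are hypothetical), the following reads the sampling rate and channel count from a WAV stream. A short silent WAV file is generated in memory so the example needs no input file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

// Sketch only: shows the javax.sound.sampled calls that expose basic audio
// metadata. Not Tika's AudioParser; class and method names are hypothetical.
class AudioMetadataSketch {

    /** Returns the AudioFormat (sample rate, channels, sample size, ...) of the stream. */
    static AudioFormat describe(InputStream in) throws IOException, UnsupportedAudioFileException {
        // getAudioFileFormat only reads the header; it needs mark/reset support,
        // which ByteArrayInputStream (or a BufferedInputStream) provides.
        return AudioSystem.getAudioFileFormat(in).getFormat();
    }

    /** Builds a short silent 16-bit signed little-endian PCM WAV file in memory. */
    static byte[] silentWav(float sampleRate, int channels, int frames) throws IOException {
        AudioFormat format = new AudioFormat(sampleRate, 16, channels, true, false);
        byte[] pcm = new byte[frames * format.getFrameSize()];
        AudioInputStream audio =
                new AudioInputStream(new ByteArrayInputStream(pcm), format, frames);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        AudioSystem.write(audio, AudioFileFormat.Type.WAVE, out);
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] wav = silentWav(44100f, 2, 100);
        AudioFormat format = describe(new ByteArrayInputStream(wav));
        System.out.println("sample rate: " + format.getSampleRate()
                + " Hz, channels: " + format.getChannels());
    }
}
```

For a stream that the installed javax.sound providers cannot identify, describe throws UnsupportedAudioFileException; the stock JDK has no MP3 support, for example, which is where dedicated parsers such as Mp3Parser come in.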
<div class="section">
<h3><a name="Image_formats">Image formats</a></h3>
-<p>The <a
href="./api/org/apache/tika/parser/image/ImageParser.html">ImageParser</a>
class uses the standard javax.imageio API to extract simple metadata from
image formats supported by the Java platform, such as PNG, GIF and BMP. More
complex image metadata is available through the <a
href="./api/org/apache/tika/parser/jpeg/JpegParser.html">JpegParser</a> and
<a href="./api/org/apache/tika/parser/image/TiffParser.html">TiffParser</a>
classes, which use the metadata-extractor library to support Exif metadata
extraction from JPEG and TIFF images. The <a
href="./api/org/apache/tika/parser/image/PSDParser.html">PSDParser</a> class
extracts metadata from PSD images.</p></div>
+<p>The <a
href="./api/org/apache/tika/parser/image/ImageParser.html">ImageParser</a>
class uses the standard javax.imageio API to extract simple metadata from
image formats supported by the Java platform, such as PNG, GIF and BMP. More
complex image metadata is available through the <a
href="./api/org/apache/tika/parser/jpeg/JpegParser.html">JpegParser</a> and
<a href="./api/org/apache/tika/parser/image/TiffParser.html">TiffParser</a>
classes, which use the metadata-extractor library to support Exif metadata
extraction from JPEG and TIFF images. The <a
href="./api/org/apache/tika/parser/image/PSDParser.html">PSDParser</a> class
extracts metadata from PSD images. The <a
href="./api/org/apache/tika/parser/image/BPGParser.html">BPGParser</a> class
extracts simple metadata from BPG (Better Portable Graphics) images.</p>
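BPG files start with a small fixed header, so extracting simple metadata mostly means decoding a few bit fields. This sketch is written from our reading of the BPG specification published with libbpg: the magic 0x425047FB, one byte packing pixel_format (3 bits), alpha1_flag (1 bit) and bit_depth_minus_8 (4 bits), one byte of colour space and flag bits, then picture_width and picture_height as ue7 variable-length integers. Treat that layout as an assumption to check against the spec; this is not the code of Tika's BPGParser, and the class and field names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Sketch of decoding the fixed-position BPG header fields (assumed layout, see
// lead-in). Not Tika's BPGParser; names are hypothetical.
class BpgHeaderSketch {
    static final int BPG_MAGIC = 0x425047FB; // "BPG" followed by 0xFB

    // Hypothetical value holder for the decoded fields.
    static final class Header {
        final int pixelFormat;
        final int bitDepth;
        final long width;
        final long height;

        Header(int pixelFormat, int bitDepth, long width, long height) {
            this.pixelFormat = pixelFormat;
            this.bitDepth = bitDepth;
            this.width = width;
            this.height = height;
        }
    }

    static Header read(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        if (in.readInt() != BPG_MAGIC) {   // readInt is big-endian
            throw new IOException("Not a BPG file");
        }
        int b = in.readUnsignedByte();
        int pixelFormat = b >>> 5;          // pixel_format: top 3 bits
        int bitDepth = (b & 0x0F) + 8;      // bit_depth_minus_8: low 4 bits
        in.readUnsignedByte();              // colour space and flag bits, not decoded here
        long width = readUe7(in);
        long height = readUe7(in);
        return new Header(pixelFormat, bitDepth, width, height);
    }

    // ue7(32): 7 bits per byte, most significant group first; every byte
    // except the last has its high bit set.
    static long readUe7(DataInputStream in) throws IOException {
        long value = 0;
        for (int i = 0; i < 5; i++) {
            int b = in.readUnsignedByte();
            value = (value << 7) | (b & 0x7F);
            if ((b & 0x80) == 0) {
                return value;
            }
        }
        throw new IOException("ue7 value longer than 5 bytes");
    }

    public static void main(String[] args) throws IOException {
        byte[] sample = {0x42, 0x50, 0x47, (byte) 0xFB, 0x24, 0x00,
                (byte) 0x85, 0x00, (byte) 0x83, 0x60};
        Header h = read(new ByteArrayInputStream(sample));
        System.out.println(h.width + "x" + h.height + ", " + h.bitDepth
                + "-bit, pixel format " + h.pixelFormat);
    }
}
```

With the ten-byte sample in main, the header decodes to a 640x480, 12-bit image with pixel format 1.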
+<p>When extracting from images, it is also possible to chain in Tesseract via
the <a
href="./api/org/apache/tika/parser/ocr/TesseractOCRParser.html">TesseractOCRParser</a>
to have OCR performed on the contents of the image.</p></div>
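For the formats the Java platform reads natively, the ImageParser approach comes down to the javax.imageio reader API. A minimal sketch, not Tika's implementation (class and method names are hypothetical), that asks a matching ImageReader for the format name and dimensions of the first image without decoding its pixels:

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Iterator;
import java.util.Locale;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

// Sketch only: reads format name and dimensions through javax.imageio.
// Not Tika's ImageParser; names are hypothetical.
class ImageIoMetadataSketch {

    static String describe(byte[] data) throws IOException {
        try (ImageInputStream iis = ImageIO.createImageInputStream(new ByteArrayInputStream(data))) {
            Iterator<ImageReader> readers = ImageIO.getImageReaders(iis);
            if (!readers.hasNext()) {
                throw new IOException("No javax.imageio reader for this data");
            }
            ImageReader reader = readers.next();
            try {
                reader.setInput(iis, true);   // seek forward only: header access is enough
                String format = reader.getFormatName().toUpperCase(Locale.ROOT);
                return format + " " + reader.getWidth(0) + "x" + reader.getHeight(0);
            } finally {
                reader.dispose();
            }
        }
    }

    /** Encodes a blank RGB image as PNG, to have something to inspect. */
    static byte[] blankPng(int width, int height) throws IOException {
        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ImageIO.write(image, "png", out);
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(describe(blankPng(32, 16)));
    }
}
```

Running main prints "PNG 32x16". Header-level calls such as getWidth(0) are what keep this kind of metadata extraction cheap compared with decoding the whole image via ImageIO.read.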
<div class="section">
<h3><a name="Video_formats">Video formats</a></h3>
<p>Tika supports the Flash video format using a simple parsing algorithm
implemented in the <a
href="./api/org/apache/tika/parser/flv/FLVParser.html">FLVParser</a> class.</p>
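A simple FLV parsing algorithm can be illustrated by walking the container structure: a 9-byte header ("FLV", version, audio/video flag bits, header size), a 4-byte PreviousTagSize0, then a sequence of tags, each with a type byte, a 24-bit body size, timestamp and stream ID fields, the body, and a trailing 32-bit previous-tag size. This sketch follows the published FLV layout but is not Tika's FLVParser, which as we understand it also reads metadata from the AMF-encoded script data tag; the class names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Sketch: walks the FLV header and tag sequence and counts tags by type.
// Not Tika's FLVParser; names are hypothetical.
class FlvScanSketch {
    static final int TAG_AUDIO = 8, TAG_VIDEO = 9, TAG_SCRIPT = 18;

    static final class Summary {
        boolean audioFlag, videoFlag;          // from the header TypeFlags byte
        int audioTags, videoTags, scriptTags;  // counted while walking the tags
    }

    static Summary scan(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        if (in.readUnsignedByte() != 'F' || in.readUnsignedByte() != 'L'
                || in.readUnsignedByte() != 'V') {
            throw new IOException("Not an FLV file");
        }
        in.readUnsignedByte();                          // version
        int flags = in.readUnsignedByte();
        long dataOffset = in.readInt() & 0xFFFFFFFFL;   // header size, normally 9
        if (dataOffset < 9) {
            throw new IOException("Invalid FLV header size");
        }
        skipFully(in, (int) (dataOffset - 9));
        Summary s = new Summary();
        s.audioFlag = (flags & 0x04) != 0;
        s.videoFlag = (flags & 0x01) != 0;
        in.readInt();                                   // PreviousTagSize0, always 0
        while (in.available() > 0) {
            int tagType = in.readUnsignedByte();
            int dataSize = (in.readUnsignedByte() << 16) | in.readUnsignedShort(); // UI24
            skipFully(in, 7);        // timestamp (3), extended timestamp (1), stream ID (3)
            skipFully(in, dataSize); // tag body, not decoded in this sketch
            in.readInt();            // PreviousTagSize
            if (tagType == TAG_AUDIO) s.audioTags++;
            else if (tagType == TAG_VIDEO) s.videoTags++;
            else if (tagType == TAG_SCRIPT) s.scriptTags++;
        }
        return s;
    }

    private static void skipFully(DataInputStream in, int n) throws IOException {
        in.readFully(new byte[n]);   // throws EOFException on truncated input
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java FlvScanSketch file.flv");
            return;
        }
        Summary s = scan(Files.readAllBytes(Paths.get(args[0])));
        System.out.println("audio flag " + s.audioFlag + ", video flag " + s.videoFlag
                + "; tags: " + s.audioTags + " audio, " + s.videoTags + " video, "
                + s.scriptTags + " script");
    }
}
```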
@@ -183,7 +184,8 @@
<h3><a name="Scientific_formats">Scientific formats</a></h3>
<p>The <a href="./api/org/apache/tika/parser/hdf/HDFParser.html">HDFParser</a>
is able to extract attribute metadata from the HDF scientific file format.</p>
<p>The <a
href="./api/org/apache/tika/parser/netcdf/NetCDFParser.html">NetCDFParser</a>
is able to extract attribute metadata from the NetCDF scientific file
format.</p>
-<p>The <a href="./api/org/apache/tika/parser/mat/MatParser.html">MatParser</a>
is able to extract attribute metadata from the Matlab scientific file
format.</p></div>
+<p>The <a href="./api/org/apache/tika/parser/mat/MatParser.html">MatParser</a>
is able to extract attribute metadata from the Matlab scientific file
format.</p>
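As an illustration of where basic Matlab metadata lives, a Level 5 MAT-file begins with a 128-byte header: 116 bytes of descriptive text (starting with "MATLAB 5.0 MAT-file"), an 8-byte subsystem data offset, a 2-byte version and a 2-byte endian indicator. The sketch reads only that header. It is not Tika's MatParser, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Sketch: reads the descriptive text and byte order from a Level 5 MAT-file
// header. Not Tika's MatParser; names are hypothetical.
class MatHeaderSketch {
    static final int HEADER_LENGTH = 128;

    /** Returns the descriptive header text, e.g. "MATLAB 5.0 MAT-file, Platform: ...". */
    static String headerText(byte[] data) {
        if (data.length < HEADER_LENGTH) {
            throw new IllegalArgumentException("Too short for a MAT-file header");
        }
        String text = new String(data, 0, 116, StandardCharsets.US_ASCII);
        if (!text.startsWith("MATLAB")) {
            throw new IllegalArgumentException("No MATLAB header text");
        }
        return text.trim();   // the text field is padded, typically with spaces
    }

    /** The indicator bytes read "IM" if the file was written little-endian, "MI" if big-endian. */
    static boolean isLittleEndian(byte[] data) {
        return data[126] == 'I' && data[127] == 'M';
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java MatHeaderSketch file.mat");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(headerText(data)
                + (isLittleEndian(data) ? " (little-endian)" : " (big-endian)"));
    }
}
```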
+<p>The <a
href="./api/org/apache/tika/parser/gdal/GDALParser.html">GDALParser</a> is able
to extract attribute metadata from the geospatial raster formats that the GDAL library can read.</p></div>
<div class="section">
<h3><a name="Executable_programs_and_libraries">Executable programs and
libraries</a></h3>
<p>The <a
href="./api/org/apache/tika/parser/executable/ExecutableParser.html">ExecutableParser</a>
can extract metadata information on platforms, architectures and types from a
range of executable formats and libraries, such as Windows Executables and
Linux / BSD programs and libraries.</p></div>
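For Linux and BSD programs, the platform, architecture and type information sits in the first 20 bytes of the ELF header: EI_CLASS (byte 4) gives 32- or 64-bit, EI_DATA (byte 5) the byte order, and e_type and e_machine follow at offsets 16 and 18. A minimal sketch under those assumptions, covering ELF only (Windows executables use the separate PE/COFF layout) and not reflecting how Tika's ExecutableParser is written; the class name and the small lookup tables are illustrative.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.file.Files;
import java.nio.file.Paths;

// Sketch: summarises class, byte order, type and machine from an ELF header.
// Not Tika's ExecutableParser; names and lookup tables are illustrative.
class ElfHeaderSketch {

    static String describe(byte[] h) {
        if (h.length < 20 || h[0] != 0x7F || h[1] != 'E' || h[2] != 'L' || h[3] != 'F') {
            throw new IllegalArgumentException("Not an ELF file");
        }
        String bits = h[4] == 1 ? "32-bit" : h[4] == 2 ? "64-bit" : "unknown class";
        boolean little = h[5] == 1;   // EI_DATA: 1 = little-endian, 2 = big-endian
        ByteBuffer buf = ByteBuffer.wrap(h)
                .order(little ? ByteOrder.LITTLE_ENDIAN : ByteOrder.BIG_ENDIAN);
        int type = buf.getShort(16) & 0xFFFF;     // e_type
        int machine = buf.getShort(18) & 0xFFFF;  // e_machine
        return bits + " " + (little ? "little-endian" : "big-endian") + " "
                + typeName(type) + " " + machineName(machine);
    }

    static String typeName(int t) {
        switch (t) {
            case 1: return "relocatable";
            case 2: return "executable";
            case 3: return "shared object";
            case 4: return "core dump";
            default: return "type " + t;
        }
    }

    static String machineName(int m) {
        switch (m) {
            case 3: return "x86";
            case 40: return "ARM";
            case 62: return "x86-64";
            case 183: return "AArch64";
            default: return "machine " + m;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ElfHeaderSketch /path/to/binary");
            return;
        }
        System.out.println(describe(Files.readAllBytes(Paths.get(args[0]))));
    }
}
```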
Modified: tika/site/src/site/apt/1.7/formats.apt
URL:
http://svn.apache.org/viewvc/tika/site/src/site/apt/1.7/formats.apt?rev=1646621&r1=1646620&r2=1646621&view=diff
==============================================================================
--- tika/site/src/site/apt/1.7/formats.apt (original)
+++ tika/site/src/site/apt/1.7/formats.apt Fri Dec 19 03:37:22 2014
@@ -158,7 +158,13 @@ Supported Document Formats
that use the metadata-extractor library to support Exif metadata
extraction from JPEG and TIFF images. The
{{{./api/org/apache/tika/parser/image/PSDParser.html}PSDParser}} class
- extracts metadata from PSD images.
+ extracts metadata from PSD images. The
+ {{{./api/org/apache/tika/parser/image/BPGParser.html}BPGParser}} class
+ extracts simple metadata from BPG (Better Portable Graphics) images.
+
+ When extracting from images, it is also possible to chain in Tesseract via
+ the
{{{./api/org/apache/tika/parser/ocr/TesseractOCRParser.html}TesseractOCRParser}}
+ to have OCR performed on the contents of the image.
* {Video formats}
@@ -224,6 +230,9 @@ Supported Document Formats
The {{{./api/org/apache/tika/parser/mat/MatParser.html}MatParser}}
is able to extract attribute metadata from the Matlab scientific file
format.
+ The {{{./api/org/apache/tika/parser/gdal/GDALParser.html}GDALParser}}
+ is able to extract attribute metadata from the geospatial raster formats that the GDAL library can read.
+
* {Executable programs and libraries}
The
{{{./api/org/apache/tika/parser/executable/ExecutableParser.html}ExecutableParser}}
can