Author: nick
Date: Sat Aug 1 21:07:08 2015
New Revision: 1693764
URL: http://svn.apache.org/r1693764
Log:
List more supported parsers
Modified:
tika/site/publish/1.10/formats.html
tika/site/src/site/apt/1.10/formats.apt
Modified: tika/site/publish/1.10/formats.html
URL: http://svn.apache.org/viewvc/tika/site/publish/1.10/formats.html?rev=1693764&r1=1693763&r2=1693764&view=diff
==============================================================================
--- tika/site/publish/1.10/formats.html (original)
+++ tika/site/publish/1.10/formats.html Sat Aug 1 21:07:08 2015
@@ -113,7 +113,8 @@
<li><a href="#Font_formats">Font formats</a></li>
<li><a href="#Scientific_formats">Scientific formats</a></li>
<li><a href="#Executable_programs_and_libraries">Executable programs and libraries</a></li>
-<li><a href="#Crypto_formats">Crypto formats</a></li></ul></li></ul>
+<li><a href="#Crypto_formats">Crypto formats</a></li>
+<li><a href="#Database_formats">Database formats</a></li></ul></li></ul>
<div class="section">
<h3><a name="HyperText_Markup_Language">HyperText Markup Language</a></h3>
<p>The HyperText Markup Language (HTML) is the lingua franca of the web. Tika uses the <a class="externalLink" href="http://home.ccil.org/~cowan/XML/tagsoup/">TagSoup</a> library to support virtually any kind of HTML found on the web. The output from the <a href="./api/org/apache/tika/parser/html/HtmlParser.html">HtmlParser</a> class is guaranteed to be well-formed and valid XHTML, and various heuristics are used to prevent things like inline scripts from cluttering the extracted text content.</p></div>
@@ -200,7 +201,11 @@
<p>The <a href="./api/org/apache/tika/parser/executable/ExecutableParser.html">ExecutableParser</a> can extract metadata information on platforms, architectures and types from a range of executable formats and libraries, such as Windows Executables and Linux / BSD programs and libraries.</p></div>
<div class="section">
<h3><a name="Crypto_formats">Crypto formats</a></h3>
-<p>The <a href="./api/org/apache/tika/parser/crypto/Pkcs7Parser.html">Pkcs7Parser</a> is able to parse the contents of PKCS7 signed messages, but doesn't include any information from the outer PKCS7 wrapper.</p></div></div>
+<p>The <a href="./api/org/apache/tika/parser/crypto/Pkcs7Parser.html">Pkcs7Parser</a> is able to parse the contents of PKCS7 signed messages, but doesn't include any information from the outer PKCS7 wrapper.</p></div>
+<div class="section">
+<h3><a name="Database_formats">Database formats</a></h3>
+<p>The <a href="./api/org/apache/tika/parser/jdbc/SQLite3Parser.html">SQLite3Parser</a> is able to extract content from SQLite3 files, in a tabular form. However, it requires that the <a href="#org.xerial_sqlite-jdbc_jar"></a> is manually added to the classpath first, as that binary jar isn't shipped as standard.</p>
+<p>The <a href="./api/org/apache/tika/parser/microsoft/JackcessParser.html">JackcessParser</a> is able to extract metadata and content in a tabular form, from Microsoft Access database files.</p></div></div>
<div class="section">
<h2>Full list of supported formats:<a name="Full_list_of_supported_formats:"></a></h2>
<p>TODO Populate this at release time</p></div>
Modified: tika/site/src/site/apt/1.10/formats.apt
URL: http://svn.apache.org/viewvc/tika/site/src/site/apt/1.10/formats.apt?rev=1693764&r1=1693763&r2=1693764&view=diff
==============================================================================
--- tika/site/src/site/apt/1.10/formats.apt (original)
+++ tika/site/src/site/apt/1.10/formats.apt Sat Aug 1 21:07:08 2015
@@ -286,6 +286,17 @@ Supported Document Formats
parse the contents of PKCS7 signed messages, but doesn't include any information from
the outer PKCS7 wrapper.
+* {Database formats}
+
+ The {{{./api/org/apache/tika/parser/jdbc/SQLite3Parser.html}SQLite3Parser}} is able to
+ extract content from SQLite3 files, in a tabular form. However, it requires that the
+ {{{org.xerial sqlite-jdbc jar}}} is manually added to the classpath first, as that
+ binary jar isn't shipped as standard.
+
+ The {{{./api/org/apache/tika/parser/microsoft/JackcessParser.html}JackcessParser}} is
+ able to extract metadata and content in a tabular form, from Microsoft Access
+ database files.
+
Full list of supported formats:
TODO Populate this at release time
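
Note on the SQLite3Parser text above: since the org.xerial sqlite-jdbc jar has to be added to the classpath by hand, a quick check that it is actually visible may help before handing SQLite3 files to Tika. The following is only a rough sketch and not part of this commit. It assumes the driver class shipped in that jar is org.sqlite.JDBC; the class name SqliteDriverCheck and the method isOnClasspath are made up for illustration and are not Tika API.

```java
public class SqliteDriverCheck {
    // Assumption: JDBC driver class provided by the org.xerial sqlite-jdbc jar.
    public static final String SQLITE_DRIVER = "org.sqlite.JDBC";

    /**
     * Hypothetical helper: returns true if the named class can be located
     * on the current classpath, without running its static initializers.
     */
    public static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, SqliteDriverCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        if (isOnClasspath(SQLITE_DRIVER)) {
            System.out.println("sqlite-jdbc driver found on the classpath");
        } else {
            System.out.println("sqlite-jdbc driver missing; add the org.xerial sqlite-jdbc jar to the classpath");
        }
    }
}
```

Running it with the same classpath as the Tika application (for example via java -cp) should report whether the jar was picked up.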