
nick Sat, 01 Aug 2015 14:07:17 -0700

Author: nick
Date: Sat Aug  1 21:07:08 2015
New Revision: 1693764

URL: http://svn.apache.org/r1693764
Log:
List more supported parsers


Modified:
    tika/site/publish/1.10/formats.html
    tika/site/src/site/apt/1.10/formats.apt

Modified: tika/site/publish/1.10/formats.html
URL: http://svn.apache.org/viewvc/tika/site/publish/1.10/formats.html?rev=1693764&r1=1693763&r2=1693764&view=diff
==============================================================================
--- tika/site/publish/1.10/formats.html (original)
+++ tika/site/publish/1.10/formats.html Sat Aug  1 21:07:08 2015
@@ -113,7 +113,8 @@
 <li><a href="#Font_formats">Font formats</a></li>
 <li><a href="#Scientific_formats">Scientific formats</a></li>
 <li><a href="#Executable_programs_and_libraries">Executable programs and libraries</a></li>
-<li><a href="#Crypto_formats">Crypto formats</a></li></ul></li></ul>
+<li><a href="#Crypto_formats">Crypto formats</a></li>
+<li><a href="#Database_formats">Database formats</a></li></ul></li></ul>
 <div class="section">
 <h3><a name="HyperText_Markup_Language">HyperText Markup Language</a></h3>
 <p>The HyperText Markup Language (HTML) is the lingua franca of the web. Tika uses the <a class="externalLink" href="http://home.ccil.org/~cowan/XML/tagsoup/">TagSoup</a> library to support virtually any kind of HTML found on the web. The output from the <a href="./api/org/apache/tika/parser/html/HtmlParser.html">HtmlParser</a> class is guaranteed to be well-formed and valid XHTML, and various heuristics are used to prevent things like inline scripts from cluttering the extracted text content.</p></div>
@@ -200,7 +201,11 @@
 <p>The <a href="./api/org/apache/tika/parser/executable/ExecutableParser.html">ExecutableParser</a> can extract metadata information on platforms, architectures and types from a range of executable formats and libraries, such as Windows Executables and Linux / BSD programs and libraries.</p></div>
 <div class="section">
 <h3><a name="Crypto_formats">Crypto formats</a></h3>
-<p>The <a href="./api/org/apache/tika/parser/crypto/Pkcs7Parser.html">Pkcs7Parser</a> is able to parse the contents of PKCS7 signed messages, but doesn't include any information from the outer PKCS7 wrapper.</p></div></div>
+<p>The <a href="./api/org/apache/tika/parser/crypto/Pkcs7Parser.html">Pkcs7Parser</a> is able to parse the contents of PKCS7 signed messages, but doesn't include any information from the outer PKCS7 wrapper.</p></div>
+<div class="section">
+<h3><a name="Database_formats">Database formats</a></h3>
+<p>The <a href="./api/org/apache/tika/parser/jdbc/SQLite3Parser.html">SQLite3Parser</a> is able to extract content from SQLite3 files, in a tabular form. However, it requires that the <a href="#org.xerial_sqlite-jdbc_jar"></a> is manually added to the classpath first, as that binary jar isn't shipped as standard.</p>
+<p>The <a href="./api/org/apache/tika/parser/microsoft/JackcessParser.html">JackcessParser</a> is able to extract metadata and content in a tabular form, from Microsoft Access database files.</p></div></div>
 <div class="section">
 <h2>Full list of supported formats:<a name="Full_list_of_supported_formats:"></a></h2>
 <p>TODO Populate this at release time</p></div>

Modified: tika/site/src/site/apt/1.10/formats.apt
URL: http://svn.apache.org/viewvc/tika/site/src/site/apt/1.10/formats.apt?rev=1693764&r1=1693763&r2=1693764&view=diff
==============================================================================
--- tika/site/src/site/apt/1.10/formats.apt (original)
+++ tika/site/src/site/apt/1.10/formats.apt Sat Aug  1 21:07:08 2015
@@ -286,6 +286,17 @@ Supported Document Formats
   parse the contents of PKCS7 signed messages, but doesn't include any information from
    the outer PKCS7 wrapper.
 
+* {Database formats}
+
+   The {{{./api/org/apache/tika/parser/jdbc/SQLite3Parser.html}SQLite3Parser}} is able to
+   extract content from SQLite3 files, in a tabular form. However, it requires that the
+   {{{org.xerial sqlite-jdbc jar}}} is manually added to the classpath first, as that
+   binary jar isn't shipped as standard.
+
+   The {{{./api/org/apache/tika/parser/microsoft/JackcessParser.html}JackcessParser}} is
+   able to extract metadata and content in a tabular form, from Microsoft Access
+   database files.
+
 Full list of supported formats:
 
    TODO Populate this at release time
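
Because SQLite3Parser only works once the org.xerial sqlite-jdbc jar has been added to the classpath by hand, an application may want to check for it before relying on that parser. A minimal sketch, assuming the driver class shipped in that jar is `org.sqlite.JDBC`; the `SqliteJdbcCheck` class and its `isOnClasspath` method are hypothetical helpers, not part of Tika:

```java
/**
 * Hypothetical helper (not part of Tika): reports whether the
 * org.xerial sqlite-jdbc driver is visible on the classpath, since
 * SQLite3Parser needs that jar to be added manually.
 */
public class SqliteJdbcCheck {

    // Assumed driver class name from the org.xerial sqlite-jdbc jar.
    static final String SQLITE_DRIVER = "org.sqlite.JDBC";

    /** Returns true if the named class can be loaded; the class is not initialized. */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, SqliteJdbcCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        if (isOnClasspath(SQLITE_DRIVER)) {
            System.out.println("sqlite-jdbc found: SQLite3Parser can be used");
        } else {
            System.out.println("sqlite-jdbc missing: add the org.xerial sqlite-jdbc jar to the classpath");
        }
    }
}
```

Passing `false` for the `initialize` argument only checks that the class is present; it does not run the driver's static initializer.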

svn commit: r1693764 - in /tika/site: publish/1.10/formats.html src/site/apt/1.10/formats.apt
