formats.apt

nick Wed, 30 Mar 2011 04:53:23 -0700

Author: nick
Date: Wed Mar 30 11:52:57 2011
New Revision: 1086912

URL: http://svn.apache.org/viewvc?rev=1086912&view=rev
Log:
TIKA-624 - Update supported formats for 0.8 and 0.9


Modified:
    tika/site/src/site/apt/0.8/formats.apt
    tika/site/src/site/apt/0.9/formats.apt

Modified: tika/site/src/site/apt/0.8/formats.apt
URL: http://svn.apache.org/viewvc/tika/site/src/site/apt/0.8/formats.apt?rev=1086912&r1=1086911&r2=1086912&view=diff
==============================================================================
--- tika/site/src/site/apt/0.8/formats.apt (original)
+++ tika/site/src/site/apt/0.8/formats.apt Wed Mar 30 11:52:57 2011
@@ -19,7 +19,7 @@
 
 Supported Document Formats
 
-   This page lists all the document formats supported by Apache Tika 0.6.
+   This page lists all the document formats supported by Apache Tika 0.8.
    Follow the links to the various parser class javadocs for more detailed
    information about each document format and how it is parsed by Tika.
 
@@ -46,6 +46,11 @@ Supported Document Formats
    structure. The only exception to this rule are Dublin Core metadata
    elements that are used for the document metadata.
 
+   Tika also includes
+   {{{api/org/apache/tika/parser/feed/FeedParser.html}FeedParser}}, which
+   is able to extract metadata and content from XML-based feeds such as
+   RSS and Atom.
+
 * {Microsoft Office document formats}
 
    Microsoft Office and some related applications produce documents in the
@@ -59,6 +64,10 @@ Supported Document Formats
    classes use {{{http://poi.apache.org/}Apache POI}} libraries to support
    text and metadata extraction from both OLE2 and OOXML documents.
 
+   In addition to office documents, the 
+   {{{api/org/apache/tika/parser/microsoft/OfficeParser.html}OfficeParser}}
+   is also able to extract text and metadata from Outlook .msg emails.
+
 * {OpenDocument Format}
 
    The OpenDocument format (ODF) is used most notably as the default format
@@ -67,6 +76,12 @@ Supported Document Formats
    class supports this format and the earlier OpenOffice 1.0 format on which
    ODF is based.
 
+* {Apple iWorks Formats}
+
+   The iWorks formats of Numbers, Pages and Keynote are used by Apple's iWork
+   office suite. The
+   {{{api/org/apache/tika/parser/iwork/IWorkParser.html}IWorkParser}}
+   is able to extract text and metadata from these files.
+
 * {Portable Document Format}
 
    The {{{api/org/apache/tika/parser/pdf/PDFParser.html}PDFParser}} class
@@ -121,9 +136,10 @@ Supported Document Formats
    class uses the standard javax.imageio feature to extract simple metadata
    from image formats supported by the Java platform. More complex image
    metadata is available through the
-   {{{api/org/apache/tika/parser/jpeg/JpegParser.html}JpegParser}} class
+   {{{api/org/apache/tika/parser/jpeg/JpegParser.html}JpegParser}} and
+   {{{api/org/apache/tika/parser/tiff/TiffParser.html}TiffParser}} classes
    that use the metadata-extractor library to support Exif metadata
-   extraction from Jpeg images.
+   extraction from Jpeg and Tiff images.
 
 * {Video formats}
 
@@ -143,3 +159,22 @@ Supported Document Formats
    The {{{api/org/apache/tika/parser/mbox/MboxParser.html}MboxParser}} can
    extract email messages from the mbox format used by many email archives
    and Unix-style mailboxes.
+
+* {The DWG (AutoCAD) format}
+
+   The {{{api/org/apache/tika/parser/dwg/DWGParser.html}DWGParser}} can
+   extract metadata (but not textual contents) from the DWG format that
+   is used by AutoCAD.
+
+* {Font formats}
+
+   The {{{api/org/apache/tika/parser/font/TrueTypeParser.html}TrueTypeParser}} 
+   can extract limited metadata from TrueType fonts.
+
+* {Scientific formats}
+
+   The {{{api/org/apache/tika/parser/hdf/HDFParser.html}HDFParser}} 
+   is able to extract attribute metadata from the HDF scientific file format.
+
+   The {{{api/org/apache/tika/parser/netcdf/NetCDFParser.html}NetCDFParser}} 
+   is able to extract attribute metadata from the NetCDF scientific file
+   format.
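
Several of the formats added above (HDF, NetCDF, DWG, TrueType, TIFF, Outlook .msg, iWork) can be told apart by their leading signature bytes before any parser runs. The Java sketch below illustrates that idea with the JDK only. SignatureSniffer and sniff() are hypothetical names; this is not Tika's API or its detection logic, and the signatures are simplified (for example, only DWG files whose version string starts with "AC10" are recognised).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical helper (not part of Tika): guesses a format family from the
 * first few bytes of a file. Signatures are simplified for illustration.
 */
public class SignatureSniffer {

    // True if data starts with the given unsigned byte values.
    static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    // True if data starts with the given ASCII text.
    static boolean startsWith(byte[] data, String ascii) {
        byte[] p = ascii.getBytes(StandardCharsets.US_ASCII);
        return data.length >= p.length && Arrays.equals(Arrays.copyOf(data, p.length), p);
    }

    public static String sniff(byte[] d) {
        if (startsWith(d, 0x89, 0x48, 0x44, 0x46, 0x0D, 0x0A, 0x1A, 0x0A)) return "HDF5";
        if (startsWith(d, 0x0E, 0x03, 0x13, 0x01)) return "HDF4";
        // NetCDF classic ("CDF" + 1) and 64-bit offset ("CDF" + 2) variants
        if (startsWith(d, "CDF") && d.length > 3 && (d[3] == 1 || d[3] == 2)) return "NetCDF";
        // DWG headers carry a version string such as "AC1015" (AutoCAD 2000)
        if (startsWith(d, "AC10")) return "DWG";
        // sfnt version 1.0, or the "true" tag used by some Apple fonts; a weak signature
        if (startsWith(d, 0x00, 0x01, 0x00, 0x00) || startsWith(d, "true")) return "TrueType";
        // Little-endian ("II*\0") or big-endian ("MM\0*") TIFF header
        if (startsWith(d, "II*\0") || startsWith(d, "MM\0*")) return "TIFF";
        // OLE2 compound document: .doc, .xls, .ppt and Outlook .msg all use it
        if (startsWith(d, 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1)) return "OLE2";
        // ZIP local file header: OOXML, ODF and iWork '09 packages all use it
        if (startsWith(d, 0x50, 0x4B, 0x03, 0x04)) return "ZIP";
        return "unknown";
    }

    public static void main(String[] args) {
        byte[] netcdf = {'C', 'D', 'F', 1, 0, 0, 0, 0};
        System.out.println(sniff(netcdf)); // prints NetCDF
    }
}
```

Note that a signature check like this only narrows things down to the container: an OLE2 file may be a .doc, .xls or .msg, and a ZIP file may be an OOXML, ODF or iWork document, so telling those apart requires looking inside the container.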

Modified: tika/site/src/site/apt/0.9/formats.apt
URL: http://svn.apache.org/viewvc/tika/site/src/site/apt/0.9/formats.apt?rev=1086912&r1=1086911&r2=1086912&view=diff
==============================================================================
--- tika/site/src/site/apt/0.9/formats.apt (original)
+++ tika/site/src/site/apt/0.9/formats.apt Wed Mar 30 11:52:57 2011
@@ -19,7 +19,7 @@
 
 Supported Document Formats
 
-   This page lists all the document formats supported by Apache Tika 0.6.
+   This page lists all the document formats supported by Apache Tika 0.9.
    Follow the links to the various parser class javadocs for more detailed
    information about each document format and how it is parsed by Tika.
 
@@ -46,6 +46,11 @@ Supported Document Formats
    structure. The only exception to this rule are Dublin Core metadata
    elements that are used for the document metadata.
 
+   Tika also includes
+   {{{api/org/apache/tika/parser/feed/FeedParser.html}FeedParser}}, which
+   is able to extract metadata and content from XML-based feeds such as
+   RSS and Atom.
+
 * {Microsoft Office document formats}
 
    Microsoft Office and some related applications produce documents in the
@@ -67,6 +72,12 @@ Supported Document Formats
    class supports this format and the earlier OpenOffice 1.0 format on which
    ODF is based.
 
+* {Apple iWorks Formats}
+
+   The iWorks formats of Numbers, Pages and Keynote are used by Apple's iWork
+   office suite. The
+   {{{api/org/apache/tika/parser/iwork/IWorkParser.html}IWorkParser}}
+   is able to extract text and metadata from these files.
+
 * {Portable Document Format}
 
    The {{{api/org/apache/tika/parser/pdf/PDFParser.html}PDFParser}} class
@@ -121,9 +132,10 @@ Supported Document Formats
    class uses the standard javax.imageio feature to extract simple metadata
    from image formats supported by the Java platform. More complex image
    metadata is available through the
-   {{{api/org/apache/tika/parser/jpeg/JpegParser.html}JpegParser}} class
+   {{{api/org/apache/tika/parser/jpeg/JpegParser.html}JpegParser}} and
+   {{{api/org/apache/tika/parser/tiff/TiffParser.html}TiffParser}} classes
    that use the metadata-extractor library to support Exif metadata
-   extraction from Jpeg images.
+   extraction from Jpeg and Tiff images.
 
 * {Video formats}
 
@@ -138,8 +150,34 @@ Supported Document Formats
    the {{{api/org/apache/tika/parser/pkg/ZipParser.html}ZipParser}} class
    supports also jar archives.
 
-* {The mbox format}
+* {Mail formats}
 
    The {{{api/org/apache/tika/parser/mbox/MboxParser.html}MboxParser}} can
    extract email messages from the mbox format used by many email archives
    and Unix-style mailboxes.
+
+   The {{{api/org/apache/tika/parser/rfc822/RFC822Parser.html}RFC822Parser}} can
+   extract email messages stored in the RFC822 message format.
+
+   In addition to office documents, the 
+   {{{api/org/apache/tika/parser/microsoft/OfficeParser.html}OfficeParser}}
+   is also able to extract text and metadata from Outlook .msg emails.
+
+* {The DWG (AutoCAD) format}
+
+   The {{{api/org/apache/tika/parser/dwg/DWGParser.html}DWGParser}} can
+   extract metadata (but not textual contents) from the DWG format that
+   is used by AutoCAD.
+
+* {Font formats}
+
+   The {{{api/org/apache/tika/parser/font/TrueTypeParser.html}TrueTypeParser}} 
+   can extract limited metadata from TrueType fonts.
+
+* {Scientific formats}
+
+   The {{{api/org/apache/tika/parser/hdf/HDFParser.html}HDFParser}} 
+   is able to extract attribute metadata from the HDF scientific file format.
+
+   The {{{api/org/apache/tika/parser/netcdf/NetCDFParser.html}NetCDFParser}} 
+   is able to extract attribute metadata from the NetCDF scientific file
+   format.
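
The RFC822 message format added to the Mail formats section of the 0.9 page is line-oriented: a block of "Name: value" header fields, where a line starting with a space or tab continues the previous field, then an empty line, then the message body. The Java sketch below shows that basic structure using only the JDK. SimpleRfc822 and parse() are hypothetical names, not Tika's RFC822Parser API; the sketch ignores MIME multipart bodies, encoded words and character sets, and keeps only the last value of a repeated header.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical minimal RFC822-style splitter (not Tika's RFC822Parser). */
public class SimpleRfc822 {

    /** Header fields (lower-cased names, in order of appearance) plus the body text. */
    public static final class Message {
        public final Map<String, String> headers = new LinkedHashMap<>();
        public String body = "";
    }

    public static Message parse(String raw) {
        Message msg = new Message();
        String[] lines = raw.split("\r?\n", -1);
        String name = null;
        StringBuilder value = new StringBuilder();
        int i = 0;
        for (; i < lines.length; i++) {
            String line = lines[i];
            if (line.isEmpty()) {
                i++; // the first empty line ends the header block
                break;
            }
            if (name != null && (line.startsWith(" ") || line.startsWith("\t"))) {
                value.append(' ').append(line.trim()); // folded continuation line
                continue;
            }
            if (name != null) {
                // later duplicates of a field overwrite earlier ones (simplification)
                msg.headers.put(name.toLowerCase(Locale.ROOT), value.toString());
            }
            int colon = line.indexOf(':');
            if (colon <= 0) {
                name = null; // not a "Name: value" line; skip it
                continue;
            }
            name = line.substring(0, colon).trim();
            value = new StringBuilder(line.substring(colon + 1).trim());
        }
        if (name != null) {
            msg.headers.put(name.toLowerCase(Locale.ROOT), value.toString());
        }
        // everything after the empty line is the body
        msg.body = String.join("\n", Arrays.copyOfRange(lines, i, lines.length));
        return msg;
    }

    public static void main(String[] args) {
        String raw = "From: alice@example.org\r\n"
                + "Subject: Quarterly\r\n"
                + "  report\r\n"
                + "\r\n"
                + "Hello Bob";
        Message m = parse(raw);
        System.out.println(m.headers.get("subject")); // prints Quarterly report
        System.out.println(m.body);                   // prints Hello Bob
    }
}
```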

svn commit: r1086912 - in /tika/site/src/site/apt: 0.8/formats.apt 0.9/formats.apt
