formats.apt

nick Mon, 04 May 2015 23:18:07 -0700

Author: nick
Date: Tue May  5 06:17:55 2015
New Revision: 1677745

URL: http://svn.apache.org/r1677745
Log:
List parsers which are new in 1.9, along with fixing a few older entries 
spotted at the same time


Modified:
    tika/site/src/site/apt/1.9/formats.apt

Modified: tika/site/src/site/apt/1.9/formats.apt
URL: http://svn.apache.org/viewvc/tika/site/src/site/apt/1.9/formats.apt?rev=1677745&r1=1677744&r2=1677745&view=diff
==============================================================================
--- tika/site/src/site/apt/1.9/formats.apt (original)
+++ tika/site/src/site/apt/1.9/formats.apt Tue May  5 06:17:55 2015
@@ -59,6 +59,9 @@ Supported Document Formats
    classes use {{{http://poi.apache.org/}Apache POI}} libraries to support
    text and metadata extraction from both OLE2 and OOXML documents.
 
+   Old, pre-OLE2 Excel files (Excel 2, 3 and 4) are handled by the
+   {{{./api/org/apache/tika/parser/microsoft/OldExcelParser.html}OldExcelParser}}.
+
 * {OpenDocument Format}
 
    The OpenDocument format (ODF) is used most notably as the default format
@@ -99,11 +102,13 @@ Supported Document Formats
 
    Tika uses the {{{http://commons.apache.org/compress/}Commons Compress}}
    library to support various compression and packaging formats. The
+   {{{./api/org/apache/tika/parser/pkg/CompressorParser.html}CompressorParser}}
+   class handles parsing of the top level compression formats, then
    {{{./api/org/apache/tika/parser/pkg/PackageParser.html}PackageParser}}
-   class and its subclasses first parse the top level compression or
-   packaging format and then pass the unpacked document streams to a
-   second parsing stage using the parser instance specified in the
-   parse context. Formats supported include Tar, RAR, CPIO, Zip and 7Zip.
+   class and its subclasses parse the packaging formats and then pass the 
+   unpacked document streams to a second parsing stage using the parser 
+   instance specified in the parse context. Formats supported include Tar, 
+   RAR, AR, CPIO, Zip, 7Zip, Gzip, BZip2, XZ and Pack200.
 
 * {Text formats}
 
@@ -161,6 +166,8 @@ Supported Document Formats
    extracts metadata from PSD images. The
    {{{./api/org/apache/tika/parser/image/BPGParser.html}BPGParser}} class
    extracts simple metadata from BPG (Better Portable Graphics) images.
+   The {{{./api/org/apache/tika/parser/image/WebPParser.html}WebPParser}} 
+   class extracts simple metadata from WebP image format.
 
    When extracting from images, it is also possible to chain in Tesseract via
    the {{{./api/org/apache/tika/parser/ocr/TesseractOCRParser.html}TesseractOCRParser}}
@@ -170,7 +177,7 @@ Supported Document Formats
 
    Tika supports the Flash video format using a simple parsing algorithm 
    implemented in the
-   {{{./api/org/apache/tika/parser/flv/FLVParser}FLVParser}} class.
+   {{{./api/org/apache/tika/parser/video/FLVParser}FLVParser}} class.
 
    The MP4 family of video formats (MP4, Quicktime, 3GPP etc) is supported 
    by the {{{./api/org/apache/tika/parser/mp4/MP4Parser}MP4Parser}} class,
@@ -204,9 +211,13 @@ Supported Document Formats
    process single email messages in the RFC 822 format used by many email clients
    in their archives / exports.
 
-   The {{{./api/org/apache/tika/parser/mbox/PSTParser.html}PSDParser}} can
+   The {{{./api/org/apache/tika/parser/mbox/OutlookPSTParser.html}OutlookPSTParser}} can
    extract email messages from the Microsoft Outlook PST email format.
 
+   The {{{./api/org/apache/tika/parser/microsoft/TNEFParser.html}TNEFParser} can
+   extract email attachments from the Microsoft TNEF (Transport Neutral Encoding
+   Format, aka Winmail.dat) used with some Microsoft email clients.
+
 * {CAD formats}
 
    The {{{./api/org/apache/tika/parser/dwg/DWGParser.html}DWGParser}} can
@@ -221,21 +232,33 @@ Supported Document Formats
 
 * {Scientific formats}
 
+   The {{{./api/org/apache/tika/parser/dif/DIFParser.html}DIFParser}}
+   is able to extract attribute metadata from the GCMD Directory 
+   Interchange Format (DIF) scientific file format.
+
+   The {{{./api/org/apache/tika/parser/gdal/GDALParser.html}GDALParser}}
+   is able to extract attribute metadata from the GDAL scientific file format.
+
+   The {{{./api/org/apache/tika/parser/geoinfo/GeographicInformationParser.html}GeographicInformationParser}}
+   is able to extract attribute metadata from the ISO-19139 georgraphic 
+   information file format.
+
+   The {{{./api/org/apache/tika/parser/grib/GribParser.html}GribParser}}
+   is able to extract attribute metadata from the Grib scientific file format.
+
    The {{{./api/org/apache/tika/parser/hdf/HDFParser.html}HDFParser}}
    is able to extract attribute metadata from the HDF scientific file format.
 
+   The {{{./api/org/apache/tika/parser/isatab/ISArchiveParser.html}ISArchiveParser}
+   is able to extract attribute metadata from the ISA-Tab (ISA Tools) family of
+   scientific file formats.
+
    The {{{./api/org/apache/tika/parser/netcdf/NetCDFParser.html}NetCDFParser}}
    is able to extract attribute metadata from the NetCDF scientific file format.
 
    The {{{./api/org/apache/tika/parser/mat/MatParser.html}MatParser}}
    is able to extract attribute metadata from the Matlab scientific file format.
 
-   The {{{./api/org/apache/tika/parser/gdal/GDALParser.html}GDALParser}}
-   is able to extract attribute metadata from the GDAL scientific file format.
-
-   The {{{./api/org/apache/tika/parser/grib/GribParser.html}GribParser}}
-   is able to extract attribute metadata from the Grib scientific file format.
-
 * {Executable programs and libraries}
 
    The {{{./api/org/apache/tika/parser/executable/ExecutableParser.html}ExecutableParser}} can

svn commit: r1677745 - /tika/site/src/site/apt/1.9/formats.apt
