[Bug 61267] Extract text from Microsoft Word 2.0 (pre-OLE2) document

bugzilla Sun, 09 Jul 2017 19:19:04 -0700

https://bz.apache.org/bugzilla/show_bug.cgi?id=61267


Javen O'Neal <one...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Meta data of attached word  |Extract text from Microsoft
                   |file gets parsed. However,  |Word 2.0 (pre-OLE2)
                   |content of file is not      |document
                   |parsed and is blank         |
           Severity|major                       |enhancement

--- Comment #3 from Javen O'Neal <one...@apache.org> ---
There are several entry points into POI. We should decide which class is
responsible for checking the first few bytes (the magic number) of a file to
determine its format, Tika style.

We could keep adding known magic numbers to o.a.p.poifs.HeaderBlock, but we
will probably want to reuse that detection code elsewhere, such as in
WorkbookFactory/DocumentFactory/SlideShowFactory, the Extractor classes for
Tika, etc.
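
To make the idea concrete, here is a minimal sketch of a shared magic-number
check. MagicSniffer and its Format enum are hypothetical names, not existing
POI classes, and the byte signatures are listed to the best of my knowledge
rather than taken from this thread, so verify them before relying on them.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

/** Hypothetical helper for illustration; not an existing POI class. */
final class MagicSniffer {

    /** Known leading byte signatures (assumed values, please verify). */
    enum Format {
        OLE2(0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1), // OLE2 compound document
        OOXML(0x50, 0x4B, 0x03, 0x04),                        // ZIP local file header
        WORD2(0xDB, 0xA5, 0x2D, 0x00),                        // Word 2.0, pre-OLE2
        UNKNOWN();                                            // no signature matched

        final byte[] magic;

        Format(int... bytes) {
            magic = new byte[bytes.length];
            for (int i = 0; i < bytes.length; i++) {
                magic[i] = (byte) bytes[i];
            }
        }
    }

    /** Length of the longest signature above. */
    static final int MAX_MAGIC_LENGTH = 8;

    /** Returns the first format whose signature is a prefix of header. */
    static Format sniff(byte[] header) {
        for (Format f : Format.values()) {
            if (f.magic.length == 0 || header.length < f.magic.length) {
                continue;
            }
            if (Arrays.equals(Arrays.copyOf(header, f.magic.length), f.magic)) {
                return f;
            }
        }
        return Format.UNKNOWN;
    }

    /** Peeks at the stream and rewinds it so the real parser can read from the start. */
    static Format sniff(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(MAX_MAGIC_LENGTH);
        byte[] header = new byte[MAX_MAGIC_LENGTH];
        int total = 0;
        try {
            int n;
            while (total < header.length
                    && (n = in.read(header, total, header.length - total)) != -1) {
                total += n;
            }
        } finally {
            in.reset();
        }
        return sniff(Arrays.copyOf(header, total));
    }
}
```

With one helper like this, each factory or extractor could switch on the
returned Format instead of carrying its own copy of the byte comparisons, and
HeaderBlock could delegate to it as well.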

