Hello!
On 05.09.2011, at 16:23, Jukka Zitting wrote:
> That was me in revision 1164578 for TIKA-704. :-(
>
>> - if (root.hasEntry("CONTENTS")) {
>> - stream = TikaInputStream.get(
>> - fs.createDocumentInputStream("CONTENTS"));
>
> This was my attempt at properly handling the embedded PDF in
> TestWithPdf.docx. It was included in an OLE object with the PDF
> document as its "CONTENTS" entry. I restored this functionality with
> some more specific checks in revision 1165259, and the resulting code
> should now work correctly with all the test documents we have.
Hm, that is strange: the current version of
OfficeParser.POIFSDocumentType.detectType() treats a "CONTENTS" entry as
identifying the POI filesystem as an MS Works document. Maybe that is not right.
Please add a unit test with that TestWithPdf.docx.
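To make the ambiguity concrete, here is a minimal Java sketch (not the actual OfficeParser code) of one possible "more specific check": before treating a "CONTENTS" entry as MS Works, look at whether its bytes start with the PDF header "%PDF-". The ContentsEntryDetector class, the DetectedType enum and the map of entry names to bytes are hypothetical stand-ins for the POIFS directory; the MS Works fallback only mirrors what detectType() currently assumes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Map;

class ContentsEntryDetector {
    // Hypothetical result types, for this sketch only.
    enum DetectedType { PDF, MS_WORKS, UNKNOWN }

    // Every PDF file begins with the ASCII header "%PDF-".
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // entries: hypothetical map from top-level entry name to its raw bytes.
    static DetectedType detect(Map<String, byte[]> entries) {
        byte[] contents = entries.get("CONTENTS");
        if (contents == null) {
            return DetectedType.UNKNOWN;
        }
        // Check the payload first, so an embedded PDF is not mistaken for Works.
        if (startsWith(contents, PDF_MAGIC)) {
            return DetectedType.PDF;
        }
        // Otherwise keep the current assumption that "CONTENTS" means MS Works.
        return DetectedType.MS_WORKS;
    }

    static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    public static void main(String[] args) {
        Map<String, byte[]> embeddedPdf =
            Map.of("CONTENTS", "%PDF-1.4".getBytes(StandardCharsets.US_ASCII));
        Map<String, byte[]> worksLike = Map.of("CONTENTS", new byte[] {0x01, 0x02, 0x03});
        Map<String, byte[]> noContents = Map.of("Book", new byte[0]);

        System.out.println(detect(embeddedPdf)); // PDF
        System.out.println(detect(worksLike));   // MS_WORKS
        System.out.println(detect(noContents));  // UNKNOWN
    }
}
```

A unit test against the real TestWithPdf.docx would still be needed to confirm which entries actually distinguish the two cases.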
best wishes, Max