https://bz.apache.org/bugzilla/show_bug.cgi?id=64164
Bug ID: 64164
Summary: (POI 3.17) - Embedded files in .doc text extracted automatically - how to skip these
Product: POI
Version: unspecified
Hardware: PC
OS: Mac OS X 10.1
Status: NEW
Severity: normal
Priority: P2
Component: POI Overall
Assignee: [email protected]
Reporter: [email protected]
Target Milestone: ---
Created attachment 37029
--> https://bz.apache.org/bugzilla/attachment.cgi?id=37029&action=edit
Sample input file
Hi there,
we recently realised that when we extract text from documents (.doc, not .docx)
that contain embedded Excel spreadsheets, the text of those spreadsheets is
automatically included in the extracted text.
// pass an input stream (a .doc sample containing an embedded Excel file with
// some text in the cells)
POITextExtractor t =
    org.apache.poi.extractor.ExtractorFactory.createExtractor(bis);
// produces the text of the .doc document BUT also the contents of the
// embedded Excel documents - is there a way to turn this feature off?
String text = t.getText();
Please let us know whether there is a way to work around this, i.e. to turn
this feature off in the text extractor.
Thanks,
Rob