https://bz.apache.org/bugzilla/show_bug.cgi?id=64164

            Bug ID: 64164
           Summary: (POI 3.17) - Embedded files in .doc text extracted
                    automatically - how to skip these
           Product: POI
           Version: unspecified
          Hardware: PC
                OS: Mac OS X 10.1
            Status: NEW
          Severity: normal
          Priority: P2
         Component: POI Overall
          Assignee: [email protected]
          Reporter: [email protected]
  Target Milestone: ---

Created attachment 37029
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=37029&action=edit
Sample input file

Hi there,

We recently realised that when a Word document (.doc, not .docx) contains
embedded Excel spreadsheets, the text of those spreadsheets is automatically
included in the output of the text extraction process.

  // Pass an input stream for a .doc sample that contains an embedded Excel
  // file with some text in its cells.

  POITextExtractor t =
       org.apache.poi.extractor.ExtractorFactory.createExtractor(bis);

  // This produces the text of the .doc document BUT also the contents of
  // the embedded Excel document. Is there a way to turn this feature off?

  String text = t.getText();


Please let us know if there is a way to work around this and turn this
feature off for the text extractor.

Thanks,
Rob

