DO NOT REPLY [Bug 47731] New: Word Extractor considers text copied from some website as an embedded object

bugzilla Mon, 24 Aug 2009 22:50:49 -0700

https://issues.apache.org/bugzilla/show_bug.cgi?id=47731


           Summary: Word Extractor considers text copied from some website
                    as an embedded object
           Product: POI
           Version: 3.2-FINAL
          Platform: PC
        OS/Version: Windows Server 2003
            Status: NEW
          Severity: major
          Priority: P2
         Component: HWPF
        AssignedTo: [email protected]
        ReportedBy: [email protected]


--- Comment #0 from Gitu <[email protected]> 2009-08-24 22:50:21 PDT ---
Hi,

I copied some text from a web page and pasted it into a Word document.
When I use WordExtractor to extract the content of that document, the
complete content is extracted, but the summary information appears multiple
times.

After investigating, I found that each part of the document is treated as
an embedded object, and the summary is extracted once for each embedded
object, i.e. the same value appears that many times.

I would also like to know whether treating HTML content as an embedded
object is valid behaviour.

I have attached a document that reproduces the scenario.

Many thanks in advance,
Gitu

