https://issues.apache.org/bugzilla/show_bug.cgi?id=54213

--- Comment #4 from Yegor Kozlov <[email protected]> ---
I don't know of an easy way to tell an MSGraph.Chart from a real Excel file.
For embedded documents, Tika should always check the ProgID, which is stored in
the host container.

In this particular case you are reading embedded data from a .ppt file, so you
should check OLEShape#getProgID(). For Excel it should return "Worksheet", for
Word "Document", for MSGraph "MSGraph.Chart", etc. One problem is that the
ProgID can carry a suffix, e.g. "MSGraph.Chart.8", so the check should be a
regex or "startsWith" match rather than an exact comparison.
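
A minimal sketch of such a prefix check, assuming the ProgID string has already
been read from the host container (e.g. via OLEShape#getProgID()). The class
name ProgIdClassifier, the Kind enum and the helper hasBase are hypothetical
names used only for illustration; the base values are the ones listed above.
The match is slightly stricter than a bare startsWith: it accepts the base name
itself or the base name followed by a dot-separated suffix.

```java
// Hypothetical helper, not part of POI or Tika.
class ProgIdClassifier {

    enum Kind { EXCEL, WORD, MSGRAPH_CHART, UNKNOWN }

    // True if progId is exactly the base name, or the base name followed by
    // a dot-separated suffix such as ".8".
    static boolean hasBase(String progId, String base) {
        return progId.equals(base) || progId.startsWith(base + ".");
    }

    // Maps a ProgID string to a coarse document kind.
    // Base values are taken from the comment above; other hosts may differ.
    static Kind classify(String progId) {
        if (progId == null) {
            return Kind.UNKNOWN;
        }
        String id = progId.trim();
        if (hasBase(id, "MSGraph.Chart")) {
            return Kind.MSGRAPH_CHART;
        }
        if (hasBase(id, "Worksheet")) {
            return Kind.EXCEL;
        }
        if (hasBase(id, "Document")) {
            return Kind.WORD;
        }
        return Kind.UNKNOWN;
    }

    public static void main(String[] args) {
        System.out.println(classify("MSGraph.Chart.8"));
        System.out.println(classify("Worksheet"));
        System.out.println(classify("SomethingElse"));
    }
}
```

With this in place, a parser would only treat the embedded stream as an Excel
workbook when classify(...) returns EXCEL, and handle MSGRAPH_CHART separately.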



(In reply to comment #3)
> Interesting, all news to me!
> 
> Is there an easy way that you know to tell if a file containing a Workbook
> entry is really an Excel file, or instead a MSGraph.Chart? We'll need that
> logic for Tika
