https://bz.apache.org/bugzilla/show_bug.cgi?id=65721
Bug ID: 65721
Summary: Extracting embedded files from non-standard ppt
Product: POI
Version: 5.0.x-dev
Hardware: PC
OS: Linux
Status: NEW
Severity: normal
Priority: P2
Component: HSLF
Assignee: [email protected]
Reporter: [email protected]
Target Milestone: ---
Over on https://issues.apache.org/jira/browse/TIKA-3526, matcha007 shared a ppt
file created by WPS 表格 (WPS Spreadsheets) that handles embedded files slightly
differently than a standard ppt does.
I tried a few basic things with POI 5.1.0 and still had little luck.
The file is:
https://issues.apache.org/jira/secure/attachment/13032100/13032100_embedded+attachment.ppt
When I do the usual thing, iterating through the slides and then through each
slide's shapes looking for HSLFObjectShape, objectShape.getObjectData() returns
null. As matcha007 pointed out, that is because the _exEmbed is not found in
HSLFObjectShape's
private ExEmbed getExEmbed(boolean create) {...
matcha007 found that adding 3 to the objectId in getExEmbed seemed to work on
this file, but there's no justification for that (that I know of), and it looks
like it would break everything else.
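One way to test whether the +3 is a real pattern rather than a coincidence
would be to collect the object ids referenced by the shapes and the ids carried
by the embedded-object records, then check whether a single constant offset
maps every shape id onto a record id. The following is only a sketch in plain
Java with no POI calls; the class name, the method name, and the ids in main
are hypothetical and were not read from the attached file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class ObjectIdOffsetCheck {

    /**
     * Returns every offset d for which (shapeId + d) is present in recordIds
     * for all given shape ids, in ascending order. An empty list means no
     * single constant offset explains the mismatch.
     */
    public static List<Integer> consistentOffsets(List<Integer> shapeIds,
                                                  Set<Integer> recordIds) {
        List<Integer> result = new ArrayList<>();
        if (shapeIds.isEmpty()) {
            return result;
        }
        int first = shapeIds.get(0);
        // Any valid offset must map the first shape id onto some record id,
        // so those differences are the only candidates worth checking.
        for (int recordId : new TreeSet<>(recordIds)) {
            int offset = recordId - first;
            boolean allMatch = true;
            for (int shapeId : shapeIds) {
                if (!recordIds.contains(shapeId + offset)) {
                    allMatch = false;
                    break;
                }
            }
            if (allMatch) {
                result.add(offset);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical ids for illustration only.
        List<Integer> shapeIds = List.of(1, 2);
        Set<Integer> recordIds = Set.of(4, 5);
        System.out.println(consistentOffsets(shapeIds, recordIds)); // prints [3]
    }
}
```

If a file has more than one embedded shape and exactly one offset survives, that
would at least suggest a systematic difference in how the file was written,
though it would not explain where the offset comes from.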
I can extract the embedded files if I iterate through the HSLFObjectData at the
slideshow level:
    POIFSFileSystem pfs = new POIFSFileSystem(p.toFile());
    try (HSLFSlideShow ss = new HSLFSlideShow(pfs.getRoot())) {
        // all embedded objects in the slideshow, independent of any shape
        HSLFObjectData[] objectData = ss.getEmbeddedObjects();
    }
However, I can't then link those back to the ids in the shapes for this
particular file.
What can we do with this file?