Hi dev team,
This is a bit of a long email, but I wanted to pass on the research I've been doing, along with some recommendations for changes to the HPSF thumbnailing API.

I have needed to extract thumbnails from a set of Microsoft Office documents, some produced on Windows and some on Mac. The existing org.apache.poi.hpsf.Thumbnail class handles the Windows case (CFTAG_WINDOWS & CF_METAFILEPICT). However, it does not handle the Macintosh case (CFTAG_MACINTOSH & CF_MACQD).

The Macintosh thumbnails are stored in QuickDraw PICT format (extended version 2), which is roughly the Mac-proprietary equivalent of SVG. The clipboard data starts with a marker, "PICT". To get a valid PICT file, that marker needs to be replaced with 512 null bytes.

References:
http://www.fileformat.info/format/macpict/egff.htm
http://developer.apple.com/legacy/mac/library/documentation/mac/QuickDraw/QuickDraw-462.html#HEADING462-0

I have managed to create readable files after a bit of manipulation of the clipboard data. Here is the high-level process for getting a file in a valid format.

Overview of extraction steps

01. Get the summary information from the document (\005SummaryInformation)
02. Get the thumbnail object from the summary information
03. Get the clipboard format tag from the thumbnail object
04. Confirm that cftag == CFTAG_MACINTOSH
05. Get the thumbnail data from the thumbnail object
06. Confirm that substr(thumbdata, Thumbnail.OFFSET_CF, "PICT".length()) == "PICT"
07. Create a byte array with a 512-byte 0x00 header
08. Append to the byte array: substr(thumbdata, Thumbnail.OFFSET_CF + "PICT".length(), thumbdata.length() - Thumbnail.OFFSET_CF - "PICT".length())
09. Return the byte array, or write it to a file (extension PICT, PCT, or PIC; MIME type image/x-pict)

Specifications of the Macintosh clipboard formats

 4 byte (ascii) - clipboard data format ["PICT"]
 2 byte - picture size (byte count)
 8 byte - bounding rectangle of picture [ x1 y1 x2 y2 ]
 2 byte - VersionOp opcode [00 11]
 2 byte - Version opcode [02 FF]
 2 byte - Header opcode [0C 00]
24 byte - header information
    - 2 byte - picture version ( -1 = version 2 ; -2 = extended version 2 )
    - 2 byte - reserved (unused) [ 00 00 ]
    - 4 byte - horizontal res [ 00 48 00 00 = 72 dpi ]
    - 4 byte - vertical res [ 00 48 00 00 = 72 dpi ]
    - 8 byte - source rectangle of picture [ x1 y1 x2 y2 ]
    - 2 byte - reserved (unused) [ 00 00 ]
    - 2 byte - reserved (unused) [ 00 00 ]

Recommendations for change to org.apache.poi.hpsf.Thumbnail

    public static final int CF_MACQD = 15;
    public static final int OFFSET_MACQDDATA = 12;
    private static final String TAG_MACQD = "PICT";

    public long getClipboardFormat() throws HPSFException {
        long clipboardformat = 0;
        if (getClipboardFormatTag() == CFTAG_WINDOWS) {
            clipboardformat = LittleEndian.getUInt(getThumbnail(), OFFSET_CF);
        } else if (getClipboardFormatTag() == CFTAG_MACINTOSH) {
            // Decode the 4-byte marker as ASCII rather than the platform charset
            String cftype = new String(getThumbnail(), OFFSET_CF,
                    TAG_MACQD.length(),
                    java.nio.charset.StandardCharsets.US_ASCII);
            // equals(), not matches(): a literal comparison, not a regex
            if (cftype.equals(TAG_MACQD)) {
                clipboardformat = CF_MACQD;
            } else {
                throw new HPSFException("Clipboard Format Tag of Thumbnail must be "
                        + TAG_MACQD + " for CFTAG_MACINTOSH");
            }
        } else {
            throw new HPSFException("Clipboard Format Tag of Thumbnail must be "
                    + "CFTAG_WINDOWS or CFTAG_MACINTOSH ");
        }
        return clipboardformat;
    }

    public byte[] getThumbnailAsPICT() throws HPSFException {
        if (getClipboardFormatTag() != CFTAG_MACINTOSH) {
            throw new HPSFException("Clipboard Format Tag of Thumbnail must "
                    + "be CFTAG_MACINTOSH.");
        }
        if (getClipboardFormat() != CF_MACQD) {
            throw new HPSFException("Clipboard Format of Thumbnail must "
                    + "be CF_MACQD.");
        }
        byte[] thumbnail = getThumbnail();
        int pictImageLength = thumbnail.length - OFFSET_MACQDDATA;
        // 512-byte null header; new arrays are already zero-filled in Java
        byte[] header = new byte[512];
        byte[] pictImage = new byte[header.length + pictImageLength];
        // arraycopy(src, srcPos, dest, destPos, length)
        System.arraycopy(header, 0, pictImage, 0, header.length);
        System.arraycopy(thumbnail, OFFSET_MACQDDATA, pictImage,
                header.length, pictImageLength);
        return pictImage;
    }

All the best,
-Craig
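
P.S. For anyone who wants to try the conversion without patching Thumbnail, below is a minimal standalone sketch of steps 06 to 08. It is only an illustration under some assumptions: the class name PictThumbnailSketch and the method toPictFile are made up for this example and are not part of POI, the input is assumed to be the raw bytes as returned by Thumbnail.getThumbnail(), and OFFSET_CF = 8 is assumed to mirror the value of Thumbnail.OFFSET_CF.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Standalone sketch, not part of POI. All names here are hypothetical. */
final class PictThumbnailSketch {
    // Assumption: mirrors Thumbnail.OFFSET_CF (start of the clipboard format field).
    static final int OFFSET_CF = 8;
    static final byte[] TAG_MACQD = "PICT".getBytes(StandardCharsets.US_ASCII);
    // Length of the null header that replaces the "PICT" marker (step 07).
    static final int PICT_FILE_HEADER_LENGTH = 512;

    /** Builds PICT file bytes from raw Macintosh thumbnail clipboard data. */
    static byte[] toPictFile(byte[] thumbData) {
        int dataOffset = OFFSET_CF + TAG_MACQD.length;
        if (thumbData == null || thumbData.length < dataOffset) {
            throw new IllegalArgumentException("Thumbnail data too short");
        }
        // Step 06: confirm the "PICT" marker at OFFSET_CF.
        byte[] tag = Arrays.copyOfRange(thumbData, OFFSET_CF, dataOffset);
        if (!Arrays.equals(tag, TAG_MACQD)) {
            throw new IllegalArgumentException(
                    "Expected \"PICT\" marker at offset " + OFFSET_CF);
        }
        // Step 07: new arrays are zero-filled, so the header is already null bytes.
        int pictLength = thumbData.length - dataOffset;
        byte[] pictFile = new byte[PICT_FILE_HEADER_LENGTH + pictLength];
        // Step 08: copy everything after the marker to just past the header.
        System.arraycopy(thumbData, dataOffset, pictFile,
                PICT_FILE_HEADER_LENGTH, pictLength);
        return pictFile;
    }

    public static void main(String[] args) {
        // Fake buffer: 8 bytes before the marker, "PICT", then 3 payload bytes.
        byte[] fake = new byte[OFFSET_CF + TAG_MACQD.length + 3];
        System.arraycopy(TAG_MACQD, 0, fake, OFFSET_CF, TAG_MACQD.length);
        fake[12] = 0x11;
        fake[13] = 0x22;
        fake[14] = 0x33;
        byte[] out = toPictFile(fake);
        System.out.println(out.length); // prints 515 (512 header + 3 payload)
    }
}
```

With real data, the returned array could be written out as-is, for example with java.nio.file.Files.write, using one of the extensions from step 09.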
