Issue Decoding PDF Attachments

Max Gravitt Sun, 26 Dec 2010 15:41:18 -0800

Hi,

I have an application (running on Google App Engine) that strips attachments 
from inbound emails and saves them as a byte[] in the JDO data store.  I think 
I'm running into a decoding issue, but I'm unsure of the true issue or the 
resolution.  For some files, the saved copy contains equal signs in places 
where the original document doesn't have any.  I've found that MS documents 
and HTML are rather tolerant of this, but PDFs tend to get corrupted when it 
happens.  Also, it doesn't happen with all PDFs, and it seems 
that it only happens when the attachment has a transfer encoding of 
"quoted-printable".


I'm using MimeStreamParser with a subclass of SimpleContentHandler that 
overrides the bodyDecoded method.  Then, I use 
IOUtils.toByteArray(InputStream) to get the bytes that I save.  Any idea what 
I may be missing?  
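
For reference, copying a stream into a byte[] is a plain byte-for-byte drain, so whatever encoding the stream still carries ends up in the saved bytes unchanged. Below is a minimal JDK-only sketch of that copy step. The class name StreamCopySketch and its toByteArray method are hypothetical stand-ins for illustration, not the Commons IO implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical stand-in for the "read stream into byte[]" step.
public class StreamCopySketch {

    /** Drains the stream into a byte array without interpreting the bytes. */
    static byte[] toByteArray(InputStream in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n); // copy exactly what was read
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // A stray '=' followed by a newline survives the copy untouched.
        byte[] src = "/Type =\n/Catalog\n".getBytes();
        byte[] saved = toByteArray(new ByteArrayInputStream(src));
        System.out.println(java.util.Arrays.equals(src, saved));
    }
}
```

If the copy step looks like this, any extra characters in the saved file must already be present in the stream handed to it.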

Below is an example of the contents of a PDF, as shown by the "more" command.  
You can see the equal signs in the second representation of the file.

Original file (Good):
1 0 obj
<<
/CreationDate (D:20101203120005)
/Producer (SCS2PDF v1.0 (\251 BeppeCosta, 2005))
/Title (PRINT1)
>>
endobj
2 0 obj
<<
/Type /Catalog
/Pages 3 0 R
>>
endobj

File Snippet After Parsing, Saving, and Retrieving (Bad):
1 0 obj
<<
/CreationDate =
(D:20101203120005)
/Producer (SCS2PDF v1.0 (\251 BeppeCosta, =
2005))
/Title (PRINT1)
>>
endobj
2 0 obj
<<
/Type =
/Catalog
/Pages 3 0 R
>>
endobj
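
The trailing "=" characters in the bad snippet look like quoted-printable soft line breaks (RFC 2045, section 6.7), which would suggest that the saved bytes are still in their encoded form. The sketch below is a quick way to test that guess using only the JDK: it drops soft line breaks and expands =XX escapes. The class name QuotedPrintableSketch and its methods are hypothetical, and this is a diagnostic aid under that assumption, not the library's decoder.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic: minimal quoted-printable decoding.
public class QuotedPrintableSketch {

    /** Removes soft line breaks ("=" + newline) and expands =XX hex escapes. */
    static byte[] decode(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(in.length);
        int i = 0;
        while (i < in.length) {
            int b = in[i] & 0xFF;
            if (b != '=') {
                out.write(b);
                i++;
            } else if (i + 1 < in.length && in[i + 1] == '\n') {
                i += 2; // soft break written as "=" LF
            } else if (i + 2 < in.length && in[i + 1] == '\r' && in[i + 2] == '\n') {
                i += 3; // soft break written as "=" CR LF
            } else if (i + 2 < in.length && hex(in[i + 1]) >= 0 && hex(in[i + 2]) >= 0) {
                out.write((hex(in[i + 1]) << 4) | hex(in[i + 2])); // =XX escape
                i += 3;
            } else {
                out.write(b); // malformed sequence: keep the '=' literally
                i++;
            }
        }
        return out.toByteArray();
    }

    /** Returns the value of a hex digit, or -1 if the byte is not one. */
    static int hex(byte c) {
        return Character.digit((char) (c & 0xFF), 16);
    }

    public static void main(String[] args) {
        // Lines taken from the bad snippet above.
        String bad = "/CreationDate =\n(D:20101203120005)\n/Type =\n/Catalog\n";
        byte[] fixed = decode(bad.getBytes(StandardCharsets.ISO_8859_1));
        System.out.print(new String(fixed, StandardCharsets.ISO_8859_1));
    }
}
```

Running this over the bad lines rejoins them into the same text as the good file, which is consistent with the attachment body being stored without its quoted-printable encoding removed.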

Any thoughts?
thanks!
MG
