[ https://issues.apache.org/jira/browse/EMAIL-130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Olaf K. updated EMAIL-130:
--------------------------

    Description:
I use commons-email-1.3.1 to parse emails from an IMAP server.
After parsing an email with a PDF attachment I received the following attachment filename:
ISO-8859-15''%5A%E4%68%6C%65%72%73%74%61%6E%64%73%6D%69%74%74
But the filename should be “Zählerstandsmitteilung_06_13.pdf”.
I looked into the source code and changed the method MimeMessageParser.getDataSourceName() as follows:
{code}
protected String getDataSourceName(Part part, DataSource dataSource)
        throws MessagingException, UnsupportedEncodingException {
    String result = dataSource.getName();
    if (result == null || result.length() == 0) {
        result = part.getFileName();
    }
    if (result != null && result.length() > 0) {
        result = MimeUtility.decodeText(result);
    } else {
        result = null;
    }
    // NEW-Start
    // result could be = ISO-8859-15''%5A%E4%68%6C%65%72%73%74%61%6E%64%73%6D%69%74%74
    // result may still be null here, so guard before searching for percent-escapes
    if (result != null && result.indexOf("%") != -1) {
        String rawContentType = part.getContentType();
        // extract the name from contenttype: application/pdf;\n\rname="=?ISO-8859-15?Q?Z=E4hlerstandsmitteilung=5F06=5F13=2Epdf?="
        int nameIndex = rawContentType.indexOf("name=\"");
        if (nameIndex != -1) {
            rawContentType = rawContentType.substring(nameIndex);
            rawContentType = rawContentType.substring(rawContentType.indexOf('"') + 1, rawContentType.lastIndexOf('"'));
            // ISO-Decoding
            if (rawContentType.startsWith("=?") || rawContentType.endsWith("?=")) {
                result = MimeUtility.decodeText(rawContentType);
            }
        }
    }
    // NEW-END
    return result;
}
{code}
But this solution only works for ISO-8859-15 encoded emails.
You can reproduce this behavior with the following steps:
- Create a PDF with the filename "Zählerstandsmitteilung_06_13.pdf"
- Create an email with Thunderbird.
- Set the email format to RawText and the encoding to ISO-8859-15
I attached such an email to this issue (msg-Outlook and eml-Thunderbird).
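For reference, the filename received above has the shape of an RFC 2231 extended parameter value (charset'language'percent-encoded bytes), so in principle it can be decoded for any charset rather than only ISO-8859-15. The following is a minimal sketch using only the JDK. It is not the Commons Email or JavaMail implementation, and the class name Rfc2231FileNameSketch and the method decode are hypothetical names chosen for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;

// Hypothetical helper for illustration only; not part of Commons Email or JavaMail.
final class Rfc2231FileNameSketch {

    private Rfc2231FileNameSketch() { }

    /**
     * Decodes a value shaped like charset'language'percent-encoded-bytes.
     * Returns the input unchanged if it does not have that shape or if the
     * charset name is unknown to this JVM.
     */
    public static String decode(String value) {
        if (value == null) {
            return null;
        }
        int first = value.indexOf('\'');
        int second = (first < 0) ? -1 : value.indexOf('\'', first + 1);
        if (first <= 0 || second < 0) {
            return value; // no charset'language' prefix
        }
        Charset charset;
        try {
            charset = Charset.forName(value.substring(0, first));
        } catch (IllegalArgumentException e) {
            return value; // illegal or unsupported charset name
        }
        String encoded = value.substring(second + 1);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        for (int i = 0; i < encoded.length(); i++) {
            char c = encoded.charAt(i);
            if (c == '%' && i + 2 < encoded.length()) {
                int hi = Character.digit(encoded.charAt(i + 1), 16);
                int lo = Character.digit(encoded.charAt(i + 2), 16);
                if (hi >= 0 && lo >= 0) {
                    bytes.write((hi << 4) | lo); // one raw byte per %XX escape
                    i += 2;
                    continue;
                }
            }
            bytes.write(c); // assumption: unescaped characters are plain ASCII
        }
        return new String(bytes.toByteArray(), charset);
    }

    public static void main(String[] args) {
        String raw = "ISO-8859-15''%5A%E4%68%6C%65%72%73%74%61%6E%64%73%6D%69%74%74";
        System.out.println(decode(raw));
    }
}
```

For the value from this issue, decode returns "Zählerstandsmitt", which suggests that the string is only the first segment of a longer, continued filename parameter; joining the remaining segments is outside this sketch. JavaMail also documents a mail.mime.decodeparameters system property for RFC 2231 parameters; whether it resolves this case was not verified here.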
part.getFileName() returns: ISO-8859-15''%5A%E4%68%6C%65%72%73%74%61%6E%64%73%6D%69%74%74
It should return: =?iso-8859-1?Q?Z=E4hlerstandsmitteilung=5F06=5F13.pdf?=
With this kind of filename, MimeUtility.decodeText(result) fixes the encoding.

  was:
I use commons-email-1.3.1 to parse emails from an IMAP server.
After parsing an email with a PDF attachment I received the following attachment filename:
ISO-8859-15''%5A%E4%68%6C%65%72%73%74%61%6E%64%73%6D%69%74%74
But the filename should be “Zählerstandsmitteilung_06_13.pdf”.
I looked into the source code and changed the method MimeMessageParser.getDataSourceName() as follows:
{code}
protected String getDataSourceName(Part part, DataSource dataSource)
        throws MessagingException, UnsupportedEncodingException {
    String result = dataSource.getName();
    if (result == null || result.length() == 0) {
        result = part.getFileName();
    }
    if (result != null && result.length() > 0) {
        result = MimeUtility.decodeText(result);
    } else {
        result = null;
    }
    // NEW-Start
    // result could be = ISO-8859-15''%5A%E4%68%6C%65%72%73%74%61%6E%64%73%6D%69%74%74
    // result may still be null here, so guard before searching for percent-escapes
    if (result != null && result.indexOf("%") != -1) {
        String rawContentType = part.getContentType();
        // extract the name from contenttype: application/pdf;\n\rname="=?ISO-8859-15?Q?Z=E4hlerstandsmitteilung=5F06=5F13=2Epdf?="
        int nameIndex = rawContentType.indexOf("name=\"");
        if (nameIndex != -1) {
            rawContentType = rawContentType.substring(nameIndex);
            rawContentType = rawContentType.substring(rawContentType.indexOf('"') + 1, rawContentType.lastIndexOf('"'));
            // ISO-Decoding
            if (rawContentType.startsWith("=?") || rawContentType.endsWith("?=")) {
                result = MimeUtility.decodeText(rawContentType);
            }
        }
    }
    // NEW-END
    return result;
}
{code}
But this solution only works for ISO-8859-15 encoded emails.
You can reproduce this behavior with the following steps:
- Create a PDF with the filename "Zählerstandsmitteilung_06_13.pdf"
- Create an email with Thunderbird.
- Set the email format to RawText and the encoding to ISO-8859-15
I attached such an email to this issue.
part.getFileName() returns: ISO-8859-15''%5A%E4%68%6C%65%72%73%74%61%6E%64%73%6D%69%74%74
It should return: =?iso-8859-1?Q?Z=E4hlerstandsmitteilung=5F06=5F13.pdf?=
With this kind of filename, MimeUtility.decodeText(result) fixes the encoding.

> Problem parsing EMail-Attachmentfilename (ISO-8859-15)
> ------------------------------------------------------
>
>                 Key: EMAIL-130
>                 URL: https://issues.apache.org/jira/browse/EMAIL-130
>             Project: Commons Email
>          Issue Type: Bug
>    Affects Versions: 1.3.1
>         Environment: Thunderbird/17.0.5
>            Reporter: Olaf K.
>            Priority: Critical
>         Attachments: Zählerstandsmitteilung_06_13.pdf.eml, Zählerstandsmitteilung_06_13 pdf.msg
>
>
> I use commons-email-1.3.1 to parse emails from an IMAP server.
> After parsing an email with a PDF attachment I received the following attachment filename:
> ISO-8859-15''%5A%E4%68%6C%65%72%73%74%61%6E%64%73%6D%69%74%74
> But the filename should be “Zählerstandsmitteilung_06_13.pdf”.
> I looked into the source code and changed the method MimeMessageParser.getDataSourceName() as follows:
> {code}
> protected String getDataSourceName(Part part, DataSource dataSource)
>         throws MessagingException, UnsupportedEncodingException {
>     String result = dataSource.getName();
>     if (result == null || result.length() == 0) {
>         result = part.getFileName();
>     }
>     if (result != null && result.length() > 0) {
>         result = MimeUtility.decodeText(result);
>     } else {
>         result = null;
>     }
>     // NEW-Start
>     // result could be = ISO-8859-15''%5A%E4%68%6C%65%72%73%74%61%6E%64%73%6D%69%74%74
>     // result may still be null here, so guard before searching for percent-escapes
>     if (result != null && result.indexOf("%") != -1) {
>         String rawContentType = part.getContentType();
>         // extract the name from contenttype: application/pdf;\n\rname="=?ISO-8859-15?Q?Z=E4hlerstandsmitteilung=5F06=5F13=2Epdf?="
>         int nameIndex = rawContentType.indexOf("name=\"");
>         if (nameIndex != -1) {
>             rawContentType = rawContentType.substring(nameIndex);
>             rawContentType = rawContentType.substring(rawContentType.indexOf('"') + 1, rawContentType.lastIndexOf('"'));
>             // ISO-Decoding
>             if (rawContentType.startsWith("=?") || rawContentType.endsWith("?=")) {
>                 result = MimeUtility.decodeText(rawContentType);
>             }
>         }
>     }
>     // NEW-END
>     return result;
> }
> {code}
> But this solution only works for ISO-8859-15 encoded emails.
> You can reproduce this behavior with the following steps:
> - Create a PDF with the filename "Zählerstandsmitteilung_06_13.pdf"
> - Create an email with Thunderbird.
> - Set the email format to RawText and the encoding to ISO-8859-15
> I attached such an email to this issue (msg-Outlook and eml-Thunderbird).
> part.getFileName() returns: ISO-8859-15''%5A%E4%68%6C%65%72%73%74%61%6E%64%73%6D%69%74%74
> It should return: =?iso-8859-1?Q?Z=E4hlerstandsmitteilung=5F06=5F13.pdf?=
> With this kind of filename, MimeUtility.decodeText(result) fixes the encoding.

--
This message is automatically generated by JIRA.