[jira] [Updated] (EMAIL-130) Problem parsing EMail-Attachmentfilename (ISO-8859-15)

Olaf K. (JIRA) Fri, 02 Aug 2013 08:34:35 -0700

     [ https://issues.apache.org/jira/browse/EMAIL-130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Olaf K. updated EMAIL-130:
--------------------------

    Description: 
I use commons-email 1.3.1 to parse emails from an IMAP server.
After parsing an email with a PDF attachment, I received the following
attachment filename:
ISO-8859-15''%5A%E4%68%6C%65%72%73%74%61%6E%64%73%6D%69%74%74
However, the filename should be "Zählerstandsmitteilung_06_13.pdf".
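The returned value looks like an RFC 2231 extended parameter value (charset'language'percent-encoded bytes) that was passed through without being decoded. The following is a minimal, stdlib-only sketch of that decoding. The class and method names are hypothetical, and this is not the JavaMail or Commons Email implementation. Decoding the sample above yields only "Zählerstandsmitt", the first 16 characters of the expected name. This suggests (unconfirmed) that the rest of the name sits in an RFC 2231 continuation parameter such as filename*1* that was never joined.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;

// Hypothetical helper (not part of JavaMail or Commons Email):
// decodes one RFC 2231 extended value of the form charset'language'%XX%XX...
final class Rfc2231Decode {

    static String decode(String value) {
        int first = value.indexOf('\'');
        int second = first < 0 ? -1 : value.indexOf('\'', first + 1);
        if (first <= 0 || second < 0) {
            return value; // no charset'language' prefix: leave the value untouched
        }
        // The charset is declared in the value itself, so nothing is hard-coded.
        Charset charset = Charset.forName(value.substring(0, first));
        String encoded = value.substring(second + 1);

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        for (int i = 0; i < encoded.length(); i++) {
            char c = encoded.charAt(i);
            if (c == '%' && i + 2 < encoded.length()) {
                // "%XX" is one byte in the declared charset (bad hex throws NumberFormatException)
                bytes.write(Integer.parseInt(encoded.substring(i + 1, i + 3), 16));
                i += 2;
            } else {
                bytes.write(c); // plain ASCII characters are taken as-is
            }
        }
        return new String(bytes.toByteArray(), charset);
    }

    public static void main(String[] args) {
        String raw = "ISO-8859-15''%5A%E4%68%6C%65%72%73%74%61%6E%64%73%6D%69%74%74";
        // prints only the first 16 characters of the expected filename
        System.out.println(decode(raw));
    }
}
```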

I examined the source code and changed the method
MimeMessageParser.getDataSourceName() as follows:

{code}
protected String getDataSourceName(Part part, DataSource dataSource)
        throws MessagingException, UnsupportedEncodingException {
    String result = dataSource.getName();

    if (result == null || result.length() == 0) {
        result = part.getFileName();
    }

    if (result != null && result.length() > 0) {
        result = MimeUtility.decodeText(result);
    } else {
        result = null;
    }
// NEW-Start
    // result could be e.g.
    // ISO-8859-15''%5A%E4%68%6C%65%72%73%74%61%6E%64%73%6D%69%74%74
    // (check for null: the branch above may have set result = null)
    if (result != null && result.indexOf('%') != -1) {
        String rawContentType = part.getContentType();
        // extract the name from the Content-Type header, e.g.
        // application/pdf;\r\n name="=?ISO-8859-15?Q?Z=E4hlerstandsmitteilung=5F06=5F13=2Epdf?="
        int nameIndex = rawContentType == null ? -1 : rawContentType.indexOf("name=\"");
        if (nameIndex != -1) {
            // text after the opening quote of name="..."
            String name = rawContentType.substring(nameIndex + "name=\"".length());
            int endQuote = name.lastIndexOf('"');
            if (endQuote != -1) {
                name = name.substring(0, endQuote);
                // RFC 2047 encoded-word: let MimeUtility decode it
                if (name.startsWith("=?") || name.endsWith("?=")) {
                    result = MimeUtility.decodeText(name);
                }
            }
        }
    }
// NEW-END

    return result;
}
{code}

However, this workaround only works for ISO-8859-15-encoded emails.
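Before patching getDataSourceName(), a charset-independent option may be worth trying: letting JavaMail decode the filename itself. JavaMail documents two system properties for this. mail.mime.decodeparameters enables RFC 2231 decoding of header parameters, and mail.mime.decodefilename makes getFileName() run MimeUtility.decodeText() on the name. In JavaMail 1.4 both default to false. The sketch below only sets these properties. The class and method names are hypothetical, and it has not been tested against the attached messages.

```java
// Hypothetical setup class. The two property names are JavaMail system
// properties; everything else here is illustrative.
final class MailDecodingSetup {

    static void enableFilenameDecoding() {
        // Decode RFC 2231 parameter values (charset'language'%XX...).
        System.setProperty("mail.mime.decodeparameters", "true");
        // Make getFileName() apply MimeUtility.decodeText() to "=?charset?Q?...?=" names.
        System.setProperty("mail.mime.decodefilename", "true");
    }

    public static void main(String[] args) {
        // Set the properties before any message is parsed.
        enableFilenameDecoding();
        // ... then create the Session, fetch messages and run MimeMessageParser as before ...
    }
}
```

As far as I know, JavaMail reads these properties once, when the relevant classes are initialized. They should therefore be set early, before the first message is parsed.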

You can reproduce this behavior with the following steps:
- Create a PDF with the filename "Zählerstandsmitteilung_06_13.pdf".
- Create an email in Thunderbird and attach the PDF.
- Set the email format to plain text and the encoding to ISO-8859-15.
I attached such an email to this issue (.msg from Outlook and .eml from Thunderbird).


part.getFileName() returns:
ISO-8859-15''%5A%E4%68%6C%65%72%73%74%61%6E%64%73%6D%69%74%74

It should return an encoded-word such as:
=?iso-8859-1?Q?Z=E4hlerstandsmitteilung=5F06=5F13.pdf?=
With a filename in this form, MimeUtility.decodeText(result) decodes it correctly.
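Decoding such an RFC 2047 encoded-word is mechanical. You take the charset from the first field, turn each "_" into a space and each "=XX" into one byte, and then decode the bytes in that charset. The following stdlib-only sketch is only an illustration. It handles a single "Q"-encoded word (no "B"/base64, no multiple or folded words), it is not MimeUtility's code, and the class name is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;

// Hypothetical helper (not MimeUtility's code): decodes a single RFC 2047
// "Q"-encoded word of the form =?charset?Q?encoded-text?=
final class QWordDecode {

    static String decode(String word) {
        if (word.length() < 4 || !word.startsWith("=?") || !word.endsWith("?=")) {
            return word; // not an encoded-word
        }
        // =?charset?encoding?encoded-text?=  ->  [charset, encoding, encoded-text]
        String[] fields = word.substring(2, word.length() - 2).split("\\?", 3);
        if (fields.length != 3 || !fields[1].equalsIgnoreCase("Q")) {
            return word; // "B" (base64) and malformed words are out of scope here
        }
        Charset charset = Charset.forName(fields[0]);
        String text = fields[2];

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '_') {
                bytes.write(' '); // in Q encoding "_" stands for a space
            } else if (c == '=' && i + 2 < text.length()) {
                // "=XX" is one byte in the declared charset
                bytes.write(Integer.parseInt(text.substring(i + 1, i + 3), 16));
                i += 2;
            } else {
                bytes.write(c);
            }
        }
        return new String(bytes.toByteArray(), charset);
    }

    public static void main(String[] args) {
        System.out.println(decode("=?iso-8859-1?Q?Z=E4hlerstandsmitteilung=5F06=5F13.pdf?="));
    }
}
```

Applied to the name from the Content-Type header (=?ISO-8859-15?Q?...=2Epdf?=), the same steps give the same filename, because 0xE4 is "ä" in both ISO-8859-1 and ISO-8859-15.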
 

    
> Problem parsing EMail-Attachmentfilename (ISO-8859-15)
> ------------------------------------------------------
>
>                 Key: EMAIL-130
>                 URL: https://issues.apache.org/jira/browse/EMAIL-130
>             Project: Commons Email
>          Issue Type: Bug
>    Affects Versions: 1.3.1
>         Environment: Thunderbird/17.0.5
>            Reporter: Olaf K.
>            Priority: Critical
>         Attachments: Zählerstandsmitteilung_06_13.pdf.eml, Zählerstandsmitteilung_06_13 pdf.msg
>
>

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
