[jira] [Comment Edited] (MIME4J-218) Content-Type Fallback Character Set

Wolfgang Fahl (JIRA) Mon, 29 Sep 2014 06:22:12 -0700

    [ 
https://issues.apache.org/jira/browse/MIME4J-218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151616#comment-14151616
 ]


Wolfgang Fahl edited comment on MIME4J-218 at 9/29/14 1:21 PM:
---------------------------------------------------------------

Oleg's workaround does not work because the #resolveCharset method is private 
static ...
I also ran into Content-Type specifications whose charset= parameter makes 
mime4j choke with an UnsupportedEncodingException, e.g.:

Content-Type: TEXT/HTML; CHARSET=None
Content-Transfer-Encoding: QUOTED-PRINTABLE

I think this bug report should be reopened and a proper fix implemented. The 
issue still occurs with 0.8.0-SNAPSHOT.
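
To illustrate why a header such as CHARSET=None fails, here is a minimal 
sketch that only exercises the JDK's own Charset.forName, not mime4j itself. 
The class name CharsetProbe and its probe method are hypothetical and exist 
only for this illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;

// Hypothetical helper, not part of mime4j.
final class CharsetProbe {

    /** Reports how the JDK reacts to a charset name taken from a Content-Type header. */
    static String probe(String name) {
        try {
            return "ok: " + Charset.forName(name).name();
        } catch (UnsupportedCharsetException e) {
            // The name is syntactically legal, but this JVM has no such charset.
            return "unsupported";
        } catch (IllegalCharsetNameException e) {
            // The name contains characters that are not allowed in charset names.
            return "illegal name";
        }
    }

    public static void main(String[] args) {
        System.out.println(probe("UTF-8")); // ok: UTF-8
        System.out.println(probe("None"));  // unsupported
        System.out.println(probe("utf 8")); // illegal name
    }
}
```

Note that Charset.forName can fail in two different ways, so code that wants 
to tolerate broken headers has to handle both exception types.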

This is my proposed fix for BasicBodyFactory:

    public static boolean lenient = true;

    /**
     * Selects the Charset for the given mimeCharset string.
     *
     * If you need support for non-standard or invalid mimeCharset
     * specifications, you might want to create your own BodyFactory that
     * extends BasicBodyFactory and overrides this method, as suggested by:
     *   https://issues.apache.org/jira/browse/MIME4J-218
     *
     * The default behaviour is lenient: an invalid mimeCharset spec returns
     * the default Charset.
     *
     * @param mimeCharset the string specification for a charset, e.g. "UTF-8"
     * @return the resolved Charset; in lenient mode the default Charset if
     *         mimeCharset is null or invalid; null if mimeCharset is null
     *         and lenient is false
     * @throws UnsupportedEncodingException if the mimeCharset is invalid and
     *         lenient is false
     */
    protected Charset resolveCharset(final String mimeCharset)
            throws UnsupportedEncodingException {
        Charset result = null;
        if (lenient) {
            result = Charset.defaultCharset();
        }
        if (mimeCharset != null) {
            try {
                result = Charset.forName(mimeCharset);
            } catch (UnsupportedCharsetException ex) {
                if (!lenient) {
                    throw new UnsupportedEncodingException(mimeCharset);
                }
            } catch (IllegalCharsetNameException ex) {
                // Charset.forName also rejects syntactically illegal names.
                if (!lenient) {
                    throw new UnsupportedEncodingException(mimeCharset);
                }
            }
        }
        return result;
    }
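
For comparison, the same lenient lookup can be sketched as a standalone class 
that does not depend on mime4j. This is only an illustration under stated 
assumptions: the class name LenientCharsetResolver, its constructor and its 
resolve method are hypothetical, and the explicit fallback parameter is my 
substitution for Charset.defaultCharset(), whose value depends on the JVM and 
platform configuration.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;

// Hypothetical standalone variant of the proposed fix; not mime4j API.
final class LenientCharsetResolver {
    private final boolean lenient;
    private final Charset fallback;

    LenientCharsetResolver(boolean lenient, Charset fallback) {
        this.lenient = lenient;
        this.fallback = fallback;
    }

    /** Resolves a charset name, using the fallback for bad names when lenient. */
    Charset resolve(String mimeCharset) throws UnsupportedEncodingException {
        if (mimeCharset == null) {
            // Assumption: a missing charset parameter always maps to the fallback.
            return fallback;
        }
        try {
            return Charset.forName(mimeCharset);
        } catch (IllegalCharsetNameException | UnsupportedCharsetException ex) {
            if (lenient) {
                return fallback;
            }
            throw new UnsupportedEncodingException(mimeCharset);
        }
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        LenientCharsetResolver resolver =
                new LenientCharsetResolver(true, StandardCharsets.ISO_8859_1);
        System.out.println(resolver.resolve("UTF-8")); // UTF-8
        System.out.println(resolver.resolve("None"));  // ISO-8859-1
    }
}
```

Keeping the lenient flag and the fallback per instance, rather than in a 
public static field, lets different parsers in one JVM use different policies.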

This would be a test message:

Date: Fri, 27 Apr 2007 16:08:23 +0200
From: Foo Bar <[email protected]>
MIME-Version: 1.0
To:  [email protected]
Subject: Unsupported Character Encoding
Content-Type: multipart/mixed;
 boundary="------------090404080405080108000909"

This is a multi-part message in MIME format.
--------------090404080405080108000909
Content-Type: text/plain; charset=None
Content-Transfer-Encoding: 7bit

Body.

--------------090404080405080108000909
--------------090404080405080108000909--




was (Author: wolfgangfahl):
Oleg's workaround does not work because the #resolveCharset method is private 
static ...
I also ran into Content-Type specifications whose charset= parameter makes 
mime4j choke with an UnsupportedEncodingException, e.g.:

Content-Type: TEXT/HTML; CHARSET=None
Content-Transfer-Encoding: QUOTED-PRINTABLE

I think this bug report should be reopened and a proper fix implemented. The 
issue still occurs with 0.8.0-SNAPSHOT.

> Content-Type Fallback Character Set
> -----------------------------------
>
>                 Key: MIME4J-218
>                 URL: https://issues.apache.org/jira/browse/MIME4J-218
>             Project: James Mime4j
>          Issue Type: Bug
>    Affects Versions: 0.7.2
>            Reporter: Rickard Ekeroth
>
> Would it be possible to add a feature that allows specifying a fallback 
> character set to use when the character set in a 'Content-Type' header is 
> not recognized by Java? In the old 0.6.2 version, which we used before, the 
> character set 'ISO-8859-1' was used as a fallback, but in the 0.7.2 version 
> an UnsupportedEncodingException is thrown when the parser encounters an 
> unknown character set in a Content-Type header.
>
> Here is the relevant part of the exception stack trace:
>
> Caused by: java.io.UnsupportedEncodingException: x-user-defined
>     at sun.nio.cs.StreamDecoder.forInputStreamReader(StreamDecoder.java:52)
>     at java.io.InputStreamReader.<init>(InputStreamReader.java:83)
>     at org.apache.james.mime4j.message.BasicTextBody.getReader(BasicTextBody.java:49)
>
> We receive, parse and archive a vast number of confidential e-mail messages 
> (for which we use Mime4J), and every now and then we get an e-mail message 
> that contains a non-standard character encoding name (in this case 
> 'x-user-defined'). With the old (0.6) Mime4J version we were still able to 
> parse and read most of those e-mail messages because of the fallback 
> character set in the parser.
>
> Unfortunately, I cannot post the entire message here, but the Content-Type 
> header that caused the above exception looks like this:
>
> Content-Type: text/plain; charset="x-user-defined"
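
The fallback behaviour requested in the description can be sketched with 
plain JDK classes. This is not the mime4j 0.6.2 implementation, only an 
illustration of the idea: the class FallbackReaders and its methods open and 
decode are hypothetical names, and ISO-8859-1 is used as the fallback because 
that is the charset the reporter mentions.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of mime4j.
final class FallbackReaders {

    /** Opens a Reader with the named charset, or ISO-8859-1 if the JVM rejects the name. */
    static Reader open(InputStream in, String charsetName) {
        if (charsetName != null) {
            try {
                return new InputStreamReader(in, charsetName);
            } catch (UnsupportedEncodingException e) {
                // Unknown or malformed charset name: fall through to the fallback.
            }
        }
        return new InputStreamReader(in, StandardCharsets.ISO_8859_1);
    }

    /** Decodes a whole body into a String using open(). */
    static String decode(byte[] body, String charsetName) {
        StringBuilder text = new StringBuilder();
        try (Reader reader = open(new ByteArrayInputStream(body), charsetName)) {
            char[] buffer = new char[1024];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                text.append(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        byte[] body = "Body.".getBytes(StandardCharsets.US_ASCII);
        System.out.println(decode(body, "x-user-defined")); // Body.
    }
}
```

ISO-8859-1 maps every byte value to a character, so decoding with it never 
fails, although non-ASCII bytes may be shown as the wrong characters if the 
sender actually used a different encoding.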



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
