[dev] Mail charset problems with some DB backends (+ patch)

Marc Brockschmidt Tue, 11 Mar 2008 12:19:05 -0700

Heya,

We had some problems with OTRS 2.0.4: it silently lost emails that
were supposed to end up in one of our queues. I was able to track this
problem down to the PostgreSQL DB rejecting the body due to a broken
Unicode sequence. The affected emails were marked as UTF-8 but
contained a disclaimer, appended by the customer's MTA, in latin1.


After removing the QP encoding, OTRS noticed that the target charset for
the DB (utf8) and the source charset were the same and skipped
re-encoding with Perl's Encode module. This sounds like a sane
optimization, but the decode/re-encode step has the nice side effect of
validating all byte sequences and replacing broken data with some valid
character.
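
To illustrate that side effect outside of OTRS, here is a minimal sketch
using only Perl's core Encode module. The helper names
(sanitize_utf8_bytes, is_valid_utf8) and the sample string are made up
for this example and are not part of OTRS. With the default CHECK value,
Encode::decode substitutes U+FFFD for each malformed sequence, so the
re-encoded result is well-formed UTF-8 that a strict backend should
accept.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not OTRS code): decode and re-encode bytes that
# claim to be UTF-8. Malformed sequences are replaced with U+FFFD.
sub sanitize_utf8_bytes {
    my ($bytes) = @_;
    my $chars = Encode::decode( 'UTF-8', $bytes );    # default CHECK: substitute
    return Encode::encode( 'UTF-8', $chars );
}

# Hypothetical helper: strict validation. FB_CROAK makes decode die on
# the first malformed sequence; we work on a copy because decode may
# modify its argument when a CHECK value is given.
sub is_valid_utf8 {
    my ($bytes) = @_;
    return eval { Encode::decode( 'UTF-8', $bytes, Encode::FB_CROAK ); 1 } ? 1 : 0;
}

# A body in UTF-8 ("caf" + C3 A9) followed by a footer in latin1,
# where E4 is a lone latin1 byte and not a valid UTF-8 sequence here.
my $mixed = "caf\xC3\xA9\nBest\xE4tigung\n";
my $clean = sanitize_utf8_bytes($mixed);

printf "valid before: %d, valid after: %d\n",
    is_valid_utf8($mixed), is_valid_utf8($clean);
```

The valid part of the body passes through unchanged; only the stray
latin1 byte is replaced.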

I removed the optimization, and mails are now processed without a
problem. I have attached two patches: the first applies to 2.0.4 and has
been tested, while the second applies to 2.2.5 but hasn't been tested.
Please apply them, or at least add a configuration option enforcing
re-encoding of mails.

Thanks,
Marc

--- Kernel/System/Encode.pm	2008-03-11 14:14:03.000000000 +0100
+++ Kernel/System/Encode.pm	2008-03-11 14:19:44.000000000 +0100
@@ -179,33 +179,29 @@
     if (!$Self->{CharsetEncodeSupported}) {
         return $Param{Text};
     }
-    # if no encode is needed
-    if ($Param{From} =~ /^$Param{To}$/i) {
-        if ($Param{To} =~ /^utf(-8|8)/i) {
-            Encode::_utf8_on($Param{Text});
-        }
-        return $Param{Text};
+    # always decode/encode, as some picky DB backends (like Postgres)
+    # fail horribly when trying to insert broken sequences. When recoding,
+    # Encode.pm does the right thing [tm] and replaces broken byte sequences
+    # by some safe character (usually '?'). Broken sequences usually happen
+    # when user mail has encoding A and some mail gateway appends some
+    # footer/disclaimer in encoding B (without doing the needed MIME magic)
+    if ($Param{Force}) {
+        Encode::_utf8_off($Param{Text});
+    }
+    if (! eval { Encode::from_to($Param{Text}, $Param{From}, $Param{To}) } ) {
+        print STDERR "Charset encode '$Param{From}' -=> '$Param{To}' ($Param{Text}) not supported!\n";
     }
-    # encode is needed
     else {
-        if ($Param{Force}) {
-            Encode::_utf8_off($Param{Text});
-        }
-        if (! eval { Encode::from_to($Param{Text}, $Param{From}, $Param{To}) } ) {
-            print STDERR "Charset encode '$Param{From}' -=> '$Param{To}' ($Param{Text}) not supported!\n";
-        }
-        else {
-          # set utf-8 flag
-          if ($Param{To} =~ /^utf(8|-8)$/i) {
-                Encode::encode_utf8($Param{Text});
-                Encode::_utf8_on($Param{Text});
-          }
-          if ($Self->{Debug}) {
-              print STDERR "Charset encode '$Param{From}' -=> '$Param{To}' ($Param{Text})!\n";
-          }
-        }
-        return $Param{Text};
+      # set utf-8 flag
+      if ($Param{To} =~ /^utf(8|-8)$/i) {
+        Encode::encode_utf8($Param{Text});
+        Encode::_utf8_on($Param{Text});
+      }
+      if ($Self->{Debug}) {
+          print STDERR "Charset encode '$Param{From}' -=> '$Param{To}' ($Param{Text})!\n";
+      }
     }
+    return $Param{Text};
 }
 
 =item SetIO()

--- Kernel/System/Encode.pm	2008-03-11 16:18:01.000000000 +0100
+++ Kernel/System/Encode.pm	2008-03-11 16:19:31.000000000 +0100
@@ -195,37 +195,33 @@
         return $Param{Text};
     }
 
-    # if no encode is needed
-    if ( $Param{From} =~ /^$Param{To}$/i ) {
-        if ( $Param{To} =~ /^utf(-8|8)/i ) {
-            Encode::_utf8_on( $Param{Text} );
-        }
-        return $Param{Text};
-    }
 
-    # encode is needed
+    # always decode/encode, as some picky DB backends (like Postgres)
+    # fail horribly when trying to insert broken sequences. When recoding,
+    # Encode.pm does the right thing [tm] and replaces broken byte sequences
+    # by some safe character (usually '?'). Broken sequences usually happen
+    # when user mail has encoding A and some mail gateway appends some
+    # footer/disclaimer in encoding B (without doing the needed MIME magic)
+    if ( $Param{Force} ) {
+        Encode::_utf8_off( $Param{Text} );
+    }
+    if ( !eval { Encode::from_to( $Param{Text}, $Param{From}, $Param{To} ) } ) {
+        print STDERR
+            "Charset encode '$Param{From}' -=> '$Param{To}' ($Param{Text}) not supported!\n";
+    }
     else {
-        if ( $Param{Force} ) {
-            Encode::_utf8_off( $Param{Text} );
-        }
-        if ( !eval { Encode::from_to( $Param{Text}, $Param{From}, $Param{To} ) } ) {
-            print STDERR
-                "Charset encode '$Param{From}' -=> '$Param{To}' ($Param{Text}) not supported!\n";
-        }
-        else {
 
-            # set utf-8 flag
-            if ( $Param{To} =~ /^utf(8|-8)$/i ) {
+        # set utf-8 flag
+        if ( $Param{To} =~ /^utf(8|-8)$/i ) {
 
-                #                Encode::encode_utf8($Param{Text});
-                Encode::_utf8_on( $Param{Text} );
-            }
-            if ( $Self->{Debug} ) {
-                print STDERR "Charset encode '$Param{From}' -=> '$Param{To}' ($Param{Text})!\n";
-            }
+            #                Encode::encode_utf8($Param{Text});
+            Encode::_utf8_on( $Param{Text} );
+        }
+        if ( $Self->{Debug} ) {
+            print STDERR "Charset encode '$Param{From}' -=> '$Param{To}' ($Param{Text})!\n";
         }
-        return $Param{Text};
     }
+    return $Param{Text};
 }
 
 =item SetIO()


_______________________________________________
OTRS mailing list: dev - Webpage: http://otrs.org/
Archive: http://lists.otrs.org/pipermail/dev
To unsubscribe: http://lists.otrs.org/cgi-bin/listinfo/dev
