Re: [RCD] Ticket #1484429

yskzt Tue, 11 Sep 2007 09:11:43 -0700

Hi.

I thought the mbstring functions could handle chr(160) in UTF-8.
But when I tested my patch, mb_strpos() returned false and reported the error
"mb_strpos(): Unknown encoding or conversion error.".


For example:
mb_internal_encoding("UTF-8");
$test_str = "abcd".chr(160)."efg";
mb_strpos($test_str, chr(160)) ==> false

I searched the web and found:
- A lone chr(160) byte (0xA0) is not valid UTF-8.
  (http://en.wikipedia.org/wiki/UTF-8)
- In UTF-8, the no-break space (nbsp) is encoded as 0xC2 0xA0.
  (http://www.fileformat.info/info/unicode/char/00a0/index.htm)
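
The two points above can be checked with a short script. This is only a sketch: it assumes the mbstring extension is loaded, and the variable names are chosen for illustration.

```php
<?php
// Sketch: compare the Latin-1 byte chr(160) with the UTF-8 encoding of U+00A0.
$latin1_nbsp = chr(160);  // single byte 0xA0
$utf8_nbsp   = mb_convert_encoding($latin1_nbsp, 'UTF-8', 'ISO-8859-1');

echo bin2hex($utf8_nbsp), "\n";  // c2a0

// A lone 0xA0 byte is a continuation byte, so this string is not valid UTF-8.
var_dump(mb_check_encoding("abcd".$latin1_nbsp."efg", 'UTF-8'));  // bool(false)
var_dump(mb_check_encoding("abcd".$utf8_nbsp."efg", 'UTF-8'));    // bool(true)

// With a well-formed needle, mb_strpos() returns a character offset.
var_dump(mb_strpos("abcd".$utf8_nbsp."efg", $utf8_nbsp, 0, 'UTF-8'));  // int(4)
```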

So, how about this?

===================================
--- main.inc_   2007-09-03 16:10:32.000000000 +0900 (rev 774)
+++ main.inc    2007-09-12 01:05:31.000000000 +0900 (changed)
@@ -1103,6 +1103,17 @@
   return $str;
   }

+function mb_str_replace($search_str, $replace_str, $str)
+  {
+  $current_pos = 0;
+  while (($found_pos = mb_strpos($str, $search_str, $current_pos)) !== false)
+    {
+    $str = mb_substr($str, 0, $found_pos).$replace_str.mb_substr($str, $found_pos + mb_strlen($search_str));
+    $current_pos = $found_pos + mb_strlen($replace_str);
+    }
+
+  return $str;
+  }

 /**
  * Replacing specials characters to a specific encoding type
@@ -1123,7 +1134,12 @@

   // convert nbsps back to normal spaces if not html
   if ($enctype!='html')
-    $str = str_replace(chr(160), ' ', $str);
+    {
+    if ($OUTPUT->get_charset()=='UTF-8')
+      $str = mb_str_replace(chr(194).chr(160), ' ', $str);
+    else
+      $str = str_replace(chr(160), ' ', $str);
+    }

   // encode for plaintext
   if ($enctype=='text')
===================================
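
For anyone who wants to try the helper outside of main.inc, here is a standalone sketch. It is not the code from the patch: the $encoding parameter and the empty-needle guard are assumptions added for the example, and the file is assumed to be saved as UTF-8. Because UTF-8 is self-synchronizing, a plain str_replace() with the two-byte needle should give the same result on valid UTF-8 input, which the last line compares.

```php
<?php
// Standalone sketch of the mb_str_replace() helper from the patch.
// Assumptions (not in the patch): the $encoding parameter and the
// empty-needle guard.
function mb_str_replace($search_str, $replace_str, $str, $encoding = 'UTF-8')
{
    if ($search_str === '') {
        return $str;  // nothing to search for; avoids an endless loop
    }
    $search_len  = mb_strlen($search_str, $encoding);
    $replace_len = mb_strlen($replace_str, $encoding);
    $current_pos = 0;
    while (($found_pos = mb_strpos($str, $search_str, $current_pos, $encoding)) !== false) {
        $str = mb_substr($str, 0, $found_pos, $encoding)
             . $replace_str
             . mb_substr($str, $found_pos + $search_len, null, $encoding);
        // Continue after the inserted text, counted in characters.
        $current_pos = $found_pos + $replace_len;
    }
    return $str;
}

$nbsp    = chr(194).chr(160);  // U+00A0 encoded as UTF-8
$subject = "日本語".$nbsp."件名".$nbsp."test";

$fixed = mb_str_replace($nbsp, ' ', $subject);
var_dump($fixed === "日本語 件名 test");                 // bool(true)
var_dump($fixed === str_replace($nbsp, ' ', $subject));  // bool(true)
```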

Yoshikazu.

On Wed, 5 Sep 2007 03:05:21 +0200, till <[EMAIL PROTECTED]> wrote:
> On 9/3/07, Yoshikazu Tsuji <[EMAIL PROTECTED]> wrote:
>> Hi.
>>
>> The following code causes Ticket #1484429
>> (http://trac.roundcube.net/trac.cgi/ticket/1484429).
>> =============================================================
>> = "program/include/main.inc" function rep_specialchars_output
>>
>>   // convert nbsps back to normal spaces if not html
>>   if ($enctype!='html')
>>     $str = str_replace(chr(160), ' ', $str);
>> =============================================================
>>
>> This problem happens in multibyte environments (Japanese too).
>> In the message list, function rep_specialchars_output garbles
>> UTF-8 message subjects.
>>
>> Is converting chr(160) to a space really necessary?
>>
>> Here is a patch using the multibyte functions.
>>
>> ===============================================================
>> --- main.inc_   2007-09-03 16:10:32.000000000 +0900
>> +++ main.inc    2007-09-03 16:22:59.000000000 +0900
>> @@ -1122,8 +1122,17 @@
>>      $enctype = $GLOBALS['OUTPUT_TYPE'];
>>
>>    // convert nbsps back to normal spaces if not html
>> -  if ($enctype!='html')
>> -    $str = str_replace(chr(160), ' ', $str);
>> +  if ($enctype!='html') {
>> +    $current_pos = 0;
>> +    while(true) {
>> +      $found_pos = mb_strpos($str, chr(160), $current_pos);
>> +      if($found_pos == false)
>> +        break;
>> +
>> +      $str = mb_substr($str, 0, $found_pos)." ".mb_substr($str, $found_pos + 1, mb_strlen($str));
>> +      $currentpos += 1;
>> +    }
>> +  }
>>
>>    // encode for plaintext
>>    if ($enctype=='text')
>> ===============================================================
>
> multibyte looks like the better alternative, especially since we are
> dealing with people from different countries. And since we are using
> mb already, I have no issues with this.
>
> Just one thing, can you add this to the trac? Please? :)
>
> Thanks,
> Till

_______________________________________________
List info: http://lists.roundcube.net/dev/