Hi.
I thought the mbstring functions could handle chr(160) in UTF-8.
But when I tested my patch, mb_strpos() returned false and reported the error
"mb_strpos(): Unknown encoding or conversion error."
Like this:
mb_internal_encoding("UTF-8");
$test_str = "abcd".chr(160)."efg";
mb_strpos($test_str, chr(160)) ==> false
I searched the web and found:
- A lone chr(160) byte (0xA0) is not valid UTF-8.
(http://en.wikipedia.org/wiki/UTF-8)
- In UTF-8, the no-break space (nbsp) is the two bytes 0xC2 0xA0.
(http://www.fileformat.info/info/unicode/char/00a0/index.htm)
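The two points above can be checked with a minimal sketch like the following. It assumes the mbstring extension is loaded; the variable names are only for illustration and are not part of main.inc.

```php
<?php
// A lone 0xA0 byte is a continuation byte with no lead byte: invalid UTF-8.
$latin1_nbsp = chr(160);
// U+00A0 NO-BREAK SPACE encoded as UTF-8 is the two bytes 0xC2 0xA0.
$utf8_nbsp = chr(194) . chr(160);

// Validate both variants of the test string as UTF-8.
$valid_latin1 = mb_check_encoding("abcd" . $latin1_nbsp . "efg", 'UTF-8');
$valid_utf8   = mb_check_encoding("abcd" . $utf8_nbsp . "efg", 'UTF-8');

// Converting the ISO-8859-1 byte to UTF-8 yields the two-byte sequence.
$converted = mb_convert_encoding($latin1_nbsp, 'UTF-8', 'ISO-8859-1');

// With a valid needle, mb_strpos() returns a character offset, not a byte offset.
$pos = mb_strpos("abcd" . $utf8_nbsp . "efg", $utf8_nbsp, 0, 'UTF-8');
```

Here $valid_latin1 should be false and $valid_utf8 true, which matches the error mb_strpos() reported for the chr(160) needle.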
So, how about this?
===================================
--- main.inc_ 2007-09-03 16:10:32.000000000 +0900 (rev 774)
+++ main.inc 2007-09-12 01:05:31.000000000 +0900 (changed)
@@ -1103,6 +1103,17 @@
return $str;
}
+function mb_str_replace($search_str, $replace_str, $str)
+ {
+ $current_pos = 0;
+ while (($found_pos = mb_strpos($str, $search_str, $current_pos)) !== false)
+ {
+ $str = mb_substr($str, 0, $found_pos).$replace_str.mb_substr($str, $found_pos + mb_strlen($search_str));
+ $current_pos = $found_pos + mb_strlen($replace_str);
+ }
+
+ return $str;
+ }
/**
* Replacing specials characters to a specific encoding type
@@ -1123,7 +1134,12 @@
// convert nbsps back to normal spaces if not html
if ($enctype!='html')
- $str = str_replace(chr(160), ' ', $str);
+ {
+ if ($OUTPUT->get_charset()=='UTF-8')
+ $str = mb_str_replace(chr(194).chr(160), ' ', $str);
+ else
+ $str = str_replace(chr(160), ' ', $str);
+ }
// encode for plaintext
if ($enctype=='text')
===================================
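For trying the helper outside of main.inc, here is a standalone sketch of mb_str_replace. It is not the exact patch: it assumes mbstring is loaded, measures the replacement with mb_strlen() so that all offsets are character offsets, and adds an empty-needle guard that the patch does not have. The sample subject string is made up for the test.

```php
<?php
mb_internal_encoding('UTF-8');

// Multibyte-aware replacement of every occurrence of $search_str in $str.
function mb_str_replace($search_str, $replace_str, $str)
{
    // Guard (an addition, not in the patch): an empty needle would never advance.
    if ($search_str === '') {
        return $str;
    }

    $search_len  = mb_strlen($search_str);
    $replace_len = mb_strlen($replace_str);
    $current_pos = 0;

    while (($found_pos = mb_strpos($str, $search_str, $current_pos)) !== false) {
        // Rebuild the string around the match, using character offsets throughout.
        $str = mb_substr($str, 0, $found_pos) . $replace_str
             . mb_substr($str, $found_pos + $search_len);
        // Continue searching after the inserted replacement.
        $current_pos = $found_pos + $replace_len;
    }

    return $str;
}

// Hypothetical subject containing two UTF-8 no-break spaces.
$nbsp    = chr(194) . chr(160);
$subject = "日本語" . $nbsp . "テスト" . $nbsp . "x";
$result  = mb_str_replace($nbsp, ' ', $subject);
```

After this runs, $result should contain ordinary spaces in place of both no-break spaces, with the Japanese text left intact.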
Yoshikazu.
On Wed, 5 Sep 2007 03:05:21 +0200, till <[EMAIL PROTECTED]> wrote:
> On 9/3/07, Yoshikazu Tsuji <[EMAIL PROTECTED]> wrote:
>> Hi.
>>
>> The following code causes Ticket #1484429
>> (http://trac.roundcube.net/trac.cgi/ticket/1484429).
>> =============================================================
>> = "program/include/main.inc" function rep_specialchars_output
>>
>> // convert nbsps back to normal spaces if not html
>> if ($enctype!='html')
>> $str = str_replace(chr(160), ' ', $str);
>> =============================================================
>>
>> This problem happens in multibyte environments (Japanese too).
>> In the message list, the function rep_specialchars_output garbles
>> UTF-8 message subjects.
>>
>> Is converting chr(160) to a space really necessary?
>>
>> Here is a patch using multibyte functions.
>>
>> ===============================================================
>> --- main.inc_ 2007-09-03 16:10:32.000000000 +0900
>> +++ main.inc 2007-09-03 16:22:59.000000000 +0900
>> @@ -1122,8 +1122,17 @@
>> $enctype = $GLOBALS['OUTPUT_TYPE'];
>>
>> // convert nbsps back to normal spaces if not html
>> - if ($enctype!='html')
>> - $str = str_replace(chr(160), ' ', $str);
>> + if ($enctype!='html') {
>> + $current_pos = 0;
>> + while(true) {
>> + $found_pos = mb_strpos($str, chr(160), $current_pos);
>> + if($found_pos === false)
>> + break;
>> +
>> + $str = mb_substr($str, 0, $found_pos)." ".mb_substr($str, $found_pos + 1, mb_strlen($str));
>> + $current_pos = $found_pos + 1;
>> + }
>> + }
>>
>> // encode for plaintext
>> if ($enctype=='text')
>> ===============================================================
>
> multibyte looks like the better alternative, especially since we are
> dealing with people from different countries. And since we are using
> mb already, I have no issues with this.
>
> Just one thing, can you add this to the trac? Please? :)
>
> Thanks,
> Till