> Hi, Niels
>
> Thank you for your comment.
> Indeed, returning false makes sense.
>
> Therefore, I changed it to return false for invalid UTF-8 strings.
>
> Regards
> Yuya
>
> --
> ---------------------------
> Yuya Hamada (tekimen)
> - https://tekitoh-memdhoi.info
> - https://github.com/youkidearitai
> -----------------------------

Sorry, again.

I checked the behavior of the mb_str_split function. It returns illegal byte sequences as is:

```
sapi/cli/php -r 'var_dump(mb_str_split("あ\xc2\xf4\x80あ"));'
array(4) {
  [0]=>
  string(3) "あ"
  [1]=>
  string(2) "��"
  [2]=>
  string(1) "�"
  [3]=>
  string(3) "あ"
}
```

I also read the ICU documentation for utext_openUTF8 (link below):
https://unicode-org.github.io/icu-docs/apidoc/dev/icu4c/utext_8h.html#a130e7cba201c4b38799b432eb269f6d5

> Any invalid UTF-8 in the input will be handled in this way: a sequence of
> bytes that has the form of a truncated, but otherwise valid, UTF-8 sequence
> will be replaced by a single unicode replacement character, \uFFFD. Any other
> illegal bytes will each be replaced by a \uFFFD.

Therefore, I think the encoding check is not needed. The function will only ever return an array, consistent with mb_str_split.

Regards
Yuya

--
---------------------------
Yuya Hamada (tekimen)
- https://tekitoh-memdhoi.info
- https://github.com/youkidearitai
-----------------------------