Re: [PHP-DEV] [Discussion] grapheme cluster for str_split function

2024-03-06 Thread youkidearitai
2024年3月6日(水) 18:42 Niels Dossche :
>
> On 06/03/2024 01:37, youkidearitai wrote:
> > 2024年3月6日(水) 9:22 youkidearitai :
> >>
> >> Hi, Larry
> >> Hi, Niels
> >>
> >> 2024年3月6日(水) 6:47 Niels Dossche :
> >>>
> >>> Hi Larry
> >>> Hi Yuya
> >>>
> >>> So first of all, I meant the error handling in cases like these: 
> >>> https://github.com/php/php-src/pull/13580/files#diff-b8fe038d9d7539694593978bea5605f38dde4bcb6a016865130590e45e23202eR852-R860
> >>> The implementation still returns NULL here, so the signature is still 
> >>> incorrect. Either it should return false to match the other functions, or 
> >>> throw something and not return a value.
> >>>
> >>> On 05/03/2024 18:40, Larry Garfield wrote:
>  On Tue, Mar 5, 2024, at 7:25 AM, youkidearitai wrote:
> > 2024年3月5日(火) 5:52 Niels Dossche :
> >>
> >> Hi Yuya
> >>
> >> This sounds useful.
> >>
> >> I do have a question about the function signature:
> >> function grapheme_str_split(string $string, int $length = 1): array {}
> >>
> >> This always returns an array.
> >> However, looking at your PR it seems you return NULL on failure, but 
> >> the return type in the signature isn't nullable.
> >> Also, from a quick look, it seems other functions return false instead 
> >> of null on failure. So perhaps the return type should be array|false.
> >>
> >> What do you think? :)
> >>
> >> Kind regards
> >> Niels
> >>
> >> On 03/03/2024 00:21, youkidearitai wrote:
> >>> Hi, Internals
> >>>
> >>> I noticed PHP does not have grapheme cluster for str_split function.,
> >>> Until now, you had to use the PCRE function's \X.
> >>>
> >>> Therefore, I try create `grapheme_str_split` function.
> >>> https://github.com/php/php-src/pull/13580
> >>> It is possible to convert array per emoji and variation selectors 
> >>> using ICU.
> >>>
> >>> If it's fine, I'll create an RFC.
> >>>
> >>> Regards
> >>> Yuya
> >>>
> >
> > Hi, Niels
> >
> > Thank you for your comment.
> > Indeed, returns false is make sense.
> >
> > Therefore, I changed to returns false when invalid UTF-8 strings.
> >
> > Regards
> > Yuya
> 
>  Many legacy functions return false on error, but that is widely regarded 
>  as bad design.  Please do not continue bad design.
> >>>
> >>> I agree that returning false on error isn't ideal for exceptional cases, 
> >>> that's what exceptions are for.
> >>> Looking at the other grapheme functions makes me wonder though how 
> >>> consistent this would be, especially w.r.t. intl_get_error_*() and 
> >>> intl_error_name().
> >>>
> 
>  Right now, the best "standard" error handling mechanism available is 
>  exceptions.  false (or null) can very easily lead to incorrectly using 
>  that value as though it were valid, when it's not, which will sometimes 
>  cause a fatal error and sometimes cause a security leak.
> 
>  If the input value cannot be logically processed, that's an exception.  
>  (Or Error, perhaps.)
> 
>  --Larry Garfield
> >>>
> >>> Kind regards
> >>> Niels
> >>
> >> Thank you so much for advice.
> >> Indeed, This current grapheme* functions seems inconsistent.
> >>
> >> Therefore, it's one thing when returns null, throws any exception.
> >> Shall we do so just for the grapheme_str_split function?
> >>
> >> Regards
> >> Yuya
> >>
> >> --
> >> ---
> >> Yuya Hamada (tekimen)
> >> - https://tekitoh-memdhoi.info
> >> - https://github.com/youkidearitai
> >> -
> >
> > Ah, If throws exception when intl_error*, is required other an RFC?
> > If we make grapheme_str_split's signature is below (include null):
>
> Hi Yuya
>
> If you want to change other grapheme functions with respect to error 
> handling, then it requires a separate RFC.
> I think consistency between the functions is important.
>
> >
> > ```
> > function grapheme_str_split(string $string, int $length = 1): array|null {}
> > ```
> >
> > For now, I change signature to `array|null`.
>
> Most others functions return false on error, so I think it should be 
> array|false, and the implementation should use RETURN_FALSE instead of 
> RETURN_NULL.
>
> >
> > Regards
> > Yuya
> >
> >
>
> Kind regards
> Niels

Hi, Niels
Thank you very much.

I got it. You're right. I couldn't understand well.
I use returns false when something wrong.

I updated Pull Request. Please see.

Regards
Yuya


-- 
---
Yuya Hamada (tekimen)
- https://tekitoh-memdhoi.info
- https://github.com/youkidearitai
-


Re: [PHP-DEV] [Discussion] grapheme cluster for str_split function

2024-03-06 Thread Niels Dossche
On 06/03/2024 01:37, youkidearitai wrote:
> 2024年3月6日(水) 9:22 youkidearitai :
>>
>> Hi, Larry
>> Hi, Niels
>>
>> 2024年3月6日(水) 6:47 Niels Dossche :
>>>
>>> Hi Larry
>>> Hi Yuya
>>>
>>> So first of all, I meant the error handling in cases like these: 
>>> https://github.com/php/php-src/pull/13580/files#diff-b8fe038d9d7539694593978bea5605f38dde4bcb6a016865130590e45e23202eR852-R860
>>> The implementation still returns NULL here, so the signature is still 
>>> incorrect. Either it should return false to match the other functions, or 
>>> throw something and not return a value.
>>>
>>> On 05/03/2024 18:40, Larry Garfield wrote:
 On Tue, Mar 5, 2024, at 7:25 AM, youkidearitai wrote:
> 2024年3月5日(火) 5:52 Niels Dossche :
>>
>> Hi Yuya
>>
>> This sounds useful.
>>
>> I do have a question about the function signature:
>> function grapheme_str_split(string $string, int $length = 1): array {}
>>
>> This always returns an array.
>> However, looking at your PR it seems you return NULL on failure, but the 
>> return type in the signature isn't nullable.
>> Also, from a quick look, it seems other functions return false instead 
>> of null on failure. So perhaps the return type should be array|false.
>>
>> What do you think? :)
>>
>> Kind regards
>> Niels
>>
>> On 03/03/2024 00:21, youkidearitai wrote:
>>> Hi, Internals
>>>
>>> I noticed PHP does not have grapheme cluster for str_split function.,
>>> Until now, you had to use the PCRE function's \X.
>>>
>>> Therefore, I try create `grapheme_str_split` function.
>>> https://github.com/php/php-src/pull/13580
>>> It is possible to convert array per emoji and variation selectors using 
>>> ICU.
>>>
>>> If it's fine, I'll create an RFC.
>>>
>>> Regards
>>> Yuya
>>>
>
> Hi, Niels
>
> Thank you for your comment.
> Indeed, returns false is make sense.
>
> Therefore, I changed to returns false when invalid UTF-8 strings.
>
> Regards
> Yuya

 Many legacy functions return false on error, but that is widely regarded 
 as bad design.  Please do not continue bad design.
>>>
>>> I agree that returning false on error isn't ideal for exceptional cases, 
>>> that's what exceptions are for.
>>> Looking at the other grapheme functions makes me wonder though how 
>>> consistent this would be, especially w.r.t. intl_get_error_*() and 
>>> intl_error_name().
>>>

 Right now, the best "standard" error handling mechanism available is 
 exceptions.  false (or null) can very easily lead to incorrectly using 
 that value as though it were valid, when it's not, which will sometimes 
 cause a fatal error and sometimes cause a security leak.

 If the input value cannot be logically processed, that's an exception.  
 (Or Error, perhaps.)

 --Larry Garfield
>>>
>>> Kind regards
>>> Niels
>>
>> Thank you so much for advice.
>> Indeed, This current grapheme* functions seems inconsistent.
>>
>> Therefore, it's one thing when returns null, throws any exception.
>> Shall we do so just for the grapheme_str_split function?
>>
>> Regards
>> Yuya
>>
>> --
>> ---
>> Yuya Hamada (tekimen)
>> - https://tekitoh-memdhoi.info
>> - https://github.com/youkidearitai
>> -
> 
> Ah, If throws exception when intl_error*, is required other an RFC?
> If we make grapheme_str_split's signature is below (include null):

Hi Yuya

If you want to change other grapheme functions with respect to error handling, 
then it requires a separate RFC.
I think consistency between the functions is important.

> 
> ```
> function grapheme_str_split(string $string, int $length = 1): array|null {}
> ```
> 
> For now, I change signature to `array|null`.

Most others functions return false on error, so I think it should be 
array|false, and the implementation should use RETURN_FALSE instead of 
RETURN_NULL.

> 
> Regards
> Yuya
> 
> 

Kind regards
Niels


Re: [PHP-DEV] [Discussion] grapheme cluster for str_split function

2024-03-05 Thread youkidearitai
2024年3月6日(水) 9:22 youkidearitai :
>
> Hi, Larry
> Hi, Niels
>
> 2024年3月6日(水) 6:47 Niels Dossche :
> >
> > Hi Larry
> > Hi Yuya
> >
> > So first of all, I meant the error handling in cases like these: 
> > https://github.com/php/php-src/pull/13580/files#diff-b8fe038d9d7539694593978bea5605f38dde4bcb6a016865130590e45e23202eR852-R860
> > The implementation still returns NULL here, so the signature is still 
> > incorrect. Either it should return false to match the other functions, or 
> > throw something and not return a value.
> >
> > On 05/03/2024 18:40, Larry Garfield wrote:
> > > On Tue, Mar 5, 2024, at 7:25 AM, youkidearitai wrote:
> > >> 2024年3月5日(火) 5:52 Niels Dossche :
> > >>>
> > >>> Hi Yuya
> > >>>
> > >>> This sounds useful.
> > >>>
> > >>> I do have a question about the function signature:
> > >>> function grapheme_str_split(string $string, int $length = 1): array {}
> > >>>
> > >>> This always returns an array.
> > >>> However, looking at your PR it seems you return NULL on failure, but 
> > >>> the return type in the signature isn't nullable.
> > >>> Also, from a quick look, it seems other functions return false instead 
> > >>> of null on failure. So perhaps the return type should be array|false.
> > >>>
> > >>> What do you think? :)
> > >>>
> > >>> Kind regards
> > >>> Niels
> > >>>
> > >>> On 03/03/2024 00:21, youkidearitai wrote:
> >  Hi, Internals
> > 
> >  I noticed PHP does not have grapheme cluster for str_split function.,
> >  Until now, you had to use the PCRE function's \X.
> > 
> >  Therefore, I try create `grapheme_str_split` function.
> >  https://github.com/php/php-src/pull/13580
> >  It is possible to convert array per emoji and variation selectors 
> >  using ICU.
> > 
> >  If it's fine, I'll create an RFC.
> > 
> >  Regards
> >  Yuya
> > 
> > >>
> > >> Hi, Niels
> > >>
> > >> Thank you for your comment.
> > >> Indeed, returns false is make sense.
> > >>
> > >> Therefore, I changed to returns false when invalid UTF-8 strings.
> > >>
> > >> Regards
> > >> Yuya
> > >
> > > Many legacy functions return false on error, but that is widely regarded 
> > > as bad design.  Please do not continue bad design.
> >
> > I agree that returning false on error isn't ideal for exceptional cases, 
> > that's what exceptions are for.
> > Looking at the other grapheme functions makes me wonder though how 
> > consistent this would be, especially w.r.t. intl_get_error_*() and 
> > intl_error_name().
> >
> > >
> > > Right now, the best "standard" error handling mechanism available is 
> > > exceptions.  false (or null) can very easily lead to incorrectly using 
> > > that value as though it were valid, when it's not, which will sometimes 
> > > cause a fatal error and sometimes cause a security leak.
> > >
> > > If the input value cannot be logically processed, that's an exception.  
> > > (Or Error, perhaps.)
> > >
> > > --Larry Garfield
> >
> > Kind regards
> > Niels
>
> Thank you so much for advice.
> Indeed, This current grapheme* functions seems inconsistent.
>
> Therefore, it's one thing when returns null, throws any exception.
> Shall we do so just for the grapheme_str_split function?
>
> Regards
> Yuya
>
> --
> ---
> Yuya Hamada (tekimen)
> - https://tekitoh-memdhoi.info
> - https://github.com/youkidearitai
> -

Ah, If throws exception when intl_error*, is required other an RFC?
If we make grapheme_str_split's signature is below (include null):

```
function grapheme_str_split(string $string, int $length = 1): array|null {}
```

For now, I change signature to `array|null`.

Regards
Yuya


-- 
---
Yuya Hamada (tekimen)
- https://tekitoh-memdhoi.info
- https://github.com/youkidearitai
-


Re: [PHP-DEV] [Discussion] grapheme cluster for str_split function

2024-03-05 Thread youkidearitai
Hi, Larry
Hi, Niels

2024年3月6日(水) 6:47 Niels Dossche :
>
> Hi Larry
> Hi Yuya
>
> So first of all, I meant the error handling in cases like these: 
> https://github.com/php/php-src/pull/13580/files#diff-b8fe038d9d7539694593978bea5605f38dde4bcb6a016865130590e45e23202eR852-R860
> The implementation still returns NULL here, so the signature is still 
> incorrect. Either it should return false to match the other functions, or 
> throw something and not return a value.
>
> On 05/03/2024 18:40, Larry Garfield wrote:
> > On Tue, Mar 5, 2024, at 7:25 AM, youkidearitai wrote:
> >> 2024年3月5日(火) 5:52 Niels Dossche :
> >>>
> >>> Hi Yuya
> >>>
> >>> This sounds useful.
> >>>
> >>> I do have a question about the function signature:
> >>> function grapheme_str_split(string $string, int $length = 1): array {}
> >>>
> >>> This always returns an array.
> >>> However, looking at your PR it seems you return NULL on failure, but the 
> >>> return type in the signature isn't nullable.
> >>> Also, from a quick look, it seems other functions return false instead of 
> >>> null on failure. So perhaps the return type should be array|false.
> >>>
> >>> What do you think? :)
> >>>
> >>> Kind regards
> >>> Niels
> >>>
> >>> On 03/03/2024 00:21, youkidearitai wrote:
>  Hi, Internals
> 
>  I noticed PHP does not have grapheme cluster for str_split function.,
>  Until now, you had to use the PCRE function's \X.
> 
>  Therefore, I try create `grapheme_str_split` function.
>  https://github.com/php/php-src/pull/13580
>  It is possible to convert array per emoji and variation selectors using 
>  ICU.
> 
>  If it's fine, I'll create an RFC.
> 
>  Regards
>  Yuya
> 
> >>
> >> Hi, Niels
> >>
> >> Thank you for your comment.
> >> Indeed, returns false is make sense.
> >>
> >> Therefore, I changed to returns false when invalid UTF-8 strings.
> >>
> >> Regards
> >> Yuya
> >
> > Many legacy functions return false on error, but that is widely regarded as 
> > bad design.  Please do not continue bad design.
>
> I agree that returning false on error isn't ideal for exceptional cases, 
> that's what exceptions are for.
> Looking at the other grapheme functions makes me wonder though how consistent 
> this would be, especially w.r.t. intl_get_error_*() and intl_error_name().
>
> >
> > Right now, the best "standard" error handling mechanism available is 
> > exceptions.  false (or null) can very easily lead to incorrectly using that 
> > value as though it were valid, when it's not, which will sometimes cause a 
> > fatal error and sometimes cause a security leak.
> >
> > If the input value cannot be logically processed, that's an exception.  (Or 
> > Error, perhaps.)
> >
> > --Larry Garfield
>
> Kind regards
> Niels

Thank you so much for advice.
Indeed, This current grapheme* functions seems inconsistent.

Therefore, it's one thing when returns null, throws any exception.
Shall we do so just for the grapheme_str_split function?

Regards
Yuya

-- 
---
Yuya Hamada (tekimen)
- https://tekitoh-memdhoi.info
- https://github.com/youkidearitai
-


Re: [PHP-DEV] [Discussion] grapheme cluster for str_split function

2024-03-05 Thread Niels Dossche
Hi Larry
Hi Yuya

So first of all, I meant the error handling in cases like these: 
https://github.com/php/php-src/pull/13580/files#diff-b8fe038d9d7539694593978bea5605f38dde4bcb6a016865130590e45e23202eR852-R860
The implementation still returns NULL here, so the signature is still 
incorrect. Either it should return false to match the other functions, or throw 
something and not return a value.

On 05/03/2024 18:40, Larry Garfield wrote:
> On Tue, Mar 5, 2024, at 7:25 AM, youkidearitai wrote:
>> 2024年3月5日(火) 5:52 Niels Dossche :
>>>
>>> Hi Yuya
>>>
>>> This sounds useful.
>>>
>>> I do have a question about the function signature:
>>> function grapheme_str_split(string $string, int $length = 1): array {}
>>>
>>> This always returns an array.
>>> However, looking at your PR it seems you return NULL on failure, but the 
>>> return type in the signature isn't nullable.
>>> Also, from a quick look, it seems other functions return false instead of 
>>> null on failure. So perhaps the return type should be array|false.
>>>
>>> What do you think? :)
>>>
>>> Kind regards
>>> Niels
>>>
>>> On 03/03/2024 00:21, youkidearitai wrote:
 Hi, Internals

 I noticed PHP does not have grapheme cluster for str_split function.,
 Until now, you had to use the PCRE function's \X.

 Therefore, I try create `grapheme_str_split` function.
 https://github.com/php/php-src/pull/13580
 It is possible to convert array per emoji and variation selectors using 
 ICU.

 If it's fine, I'll create an RFC.

 Regards
 Yuya

>>
>> Hi, Niels
>>
>> Thank you for your comment.
>> Indeed, returns false is make sense.
>>
>> Therefore, I changed to returns false when invalid UTF-8 strings.
>>
>> Regards
>> Yuya
> 
> Many legacy functions return false on error, but that is widely regarded as 
> bad design.  Please do not continue bad design.

I agree that returning false on error isn't ideal for exceptional cases, that's 
what exceptions are for.
Looking at the other grapheme functions makes me wonder though how consistent 
this would be, especially w.r.t. intl_get_error_*() and intl_error_name().

> 
> Right now, the best "standard" error handling mechanism available is 
> exceptions.  false (or null) can very easily lead to incorrectly using that 
> value as though it were valid, when it's not, which will sometimes cause a 
> fatal error and sometimes cause a security leak.  
> 
> If the input value cannot be logically processed, that's an exception.  (Or 
> Error, perhaps.)
> 
> --Larry Garfield

Kind regards
Niels


Re: [PHP-DEV] [Discussion] grapheme cluster for str_split function

2024-03-05 Thread Larry Garfield
On Tue, Mar 5, 2024, at 7:25 AM, youkidearitai wrote:
> 2024年3月5日(火) 5:52 Niels Dossche :
>>
>> Hi Yuya
>>
>> This sounds useful.
>>
>> I do have a question about the function signature:
>> function grapheme_str_split(string $string, int $length = 1): array {}
>>
>> This always returns an array.
>> However, looking at your PR it seems you return NULL on failure, but the 
>> return type in the signature isn't nullable.
>> Also, from a quick look, it seems other functions return false instead of 
>> null on failure. So perhaps the return type should be array|false.
>>
>> What do you think? :)
>>
>> Kind regards
>> Niels
>>
>> On 03/03/2024 00:21, youkidearitai wrote:
>> > Hi, Internals
>> >
>> > I noticed PHP does not have grapheme cluster for str_split function.,
>> > Until now, you had to use the PCRE function's \X.
>> >
>> > Therefore, I try create `grapheme_str_split` function.
>> > https://github.com/php/php-src/pull/13580
>> > It is possible to convert array per emoji and variation selectors using 
>> > ICU.
>> >
>> > If it's fine, I'll create an RFC.
>> >
>> > Regards
>> > Yuya
>> >
>
> Hi, Niels
>
> Thank you for your comment.
> Indeed, returns false is make sense.
>
> Therefore, I changed to returns false when invalid UTF-8 strings.
>
> Regards
> Yuya

Many legacy functions return false on error, but that is widely regarded as bad 
design.  Please do not continue bad design.

Right now, the best "standard" error handling mechanism available is 
exceptions.  false (or null) can very easily lead to incorrectly using that 
value as though it were valid, when it's not, which will sometimes cause a 
fatal error and sometimes cause a security leak.  

If the input value cannot be logically processed, that's an exception.  (Or 
Error, perhaps.)

--Larry Garfield


Re: [PHP-DEV] [Discussion] grapheme cluster for str_split function

2024-03-05 Thread youkidearitai
>
> Hi, Niels
>
> Thank you for your comment.
> Indeed, returns false is make sense.
>
> Therefore, I changed to returns false when invalid UTF-8 strings.
>
> Regards
> Yuya
>
> --
> ---
> Yuya Hamada (tekimen)
> - https://tekitoh-memdhoi.info
> - https://github.com/youkidearitai
> -

Sorry, again.
I checked behavior of mb_str_split function. So Illegal byte sequences
are returned as is.

```
sapi/cli/php -r 'var_dump(mb_str_split("あ\xc2\xf4\x80あ"));'
array(4) {
  [0]=>
  string(3) "あ"
  [1]=>
  string(2) "��"
  [2]=>
  string(1) "�"
  [3]=>
  string(3) "あ"
}
```

And, I reading ICU document about utext_openUTF8 (below is link):
https://unicode-org.github.io/icu-docs/apidoc/dev/icu4c/utext_8h.html#a130e7cba201c4b38799b432eb269f6d5

> Any invalid UTF-8 in the input will be handled in this way: a sequence of 
> bytes that has the form of a truncated, but otherwise valid, UTF-8 sequence 
> will be replaced by a single unicode replacement character, \uFFFD. Any other 
> illegal bytes will each be replaced by a \uFFFD.

Therefore, I think encoding check is not need.
Returns only arrays together with mb_str_split.

Regards
Yuya


-- 
---
Yuya Hamada (tekimen)
- https://tekitoh-memdhoi.info
- https://github.com/youkidearitai
-


Re: [PHP-DEV] [Discussion] grapheme cluster for str_split function

2024-03-04 Thread youkidearitai
2024年3月5日(火) 5:52 Niels Dossche :
>
> Hi Yuya
>
> This sounds useful.
>
> I do have a question about the function signature:
> function grapheme_str_split(string $string, int $length = 1): array {}
>
> This always returns an array.
> However, looking at your PR it seems you return NULL on failure, but the 
> return type in the signature isn't nullable.
> Also, from a quick look, it seems other functions return false instead of 
> null on failure. So perhaps the return type should be array|false.
>
> What do you think? :)
>
> Kind regards
> Niels
>
> On 03/03/2024 00:21, youkidearitai wrote:
> > Hi, Internals
> >
> > I noticed PHP does not have grapheme cluster for str_split function.,
> > Until now, you had to use the PCRE function's \X.
> >
> > Therefore, I try create `grapheme_str_split` function.
> > https://github.com/php/php-src/pull/13580
> > It is possible to convert array per emoji and variation selectors using ICU.
> >
> > If it's fine, I'll create an RFC.
> >
> > Regards
> > Yuya
> >

Hi, Niels

Thank you for your comment.
Indeed, returns false is make sense.

Therefore, I changed to returns false when invalid UTF-8 strings.

Regards
Yuya

-- 
---
Yuya Hamada (tekimen)
- https://tekitoh-memdhoi.info
- https://github.com/youkidearitai
-


Re: [PHP-DEV] [Discussion] grapheme cluster for str_split function

2024-03-04 Thread Niels Dossche
Hi Yuya

This sounds useful.

I do have a question about the function signature:
function grapheme_str_split(string $string, int $length = 1): array {}

This always returns an array.
However, looking at your PR it seems you return NULL on failure, but the return 
type in the signature isn't nullable.
Also, from a quick look, it seems other functions return false instead of null 
on failure. So perhaps the return type should be array|false.

What do you think? :)

Kind regards
Niels

On 03/03/2024 00:21, youkidearitai wrote:
> Hi, Internals
> 
> I noticed PHP does not have grapheme cluster for str_split function.,
> Until now, you had to use the PCRE function's \X.
> 
> Therefore, I try create `grapheme_str_split` function.
> https://github.com/php/php-src/pull/13580
> It is possible to convert array per emoji and variation selectors using ICU.
> 
> If it's fine, I'll create an RFC.
> 
> Regards
> Yuya
>