Re: RFR: 8197594 - String and character repeat

James Laskey Sun, 18 Feb 2018 01:39:09 -0800

Didn’t I hear someone mentioning “\U1D11A” at some point?

Sent from my iPhone


> On Feb 18, 2018, at 1:10 AM, Stuart Marks <stuart.ma...@oracle.com> wrote:
> 
> Fair enough. I'll be less unhappy if there is a way to convert from a code 
> point to a String, as requested by JDK-4993841. This will reduce
> 
>    new String(Character.toChars(codepoint)).repeat(count)
> 
> to
> 
>    Character.toString(codepoint).repeat(count)
> 
> But this is still fairly roundabout. Since most cases are constants, the 
> advice is to use a string literal instead of a char literal. This works for 
> BMP characters, e.g. "-".repeat(10) or "\u2501".repeat(15). But if I want a 
> non-BMP character as a string literal, I have to encode it into a surrogate pair 
> myself. For example, a string literal containing the character U+1D11A 
> MUSICAL SYMBOL FIVE-LINE STAFF would be "\uD834\uDD1A". Ugh! Or, I could just 
> call a function and live with it not being a constant. It would be nice if 
> there were an escape sequence that allowed any Unicode code point, including 
> supplementary characters, to be put into a string literal.
> 
> s'marks
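
[Editor's sketch, not part of the original message.] The surrogate-pair encoding described above can be checked with API that predates this thread. The class name `SurrogatePairDemo` and the helper `fromCodePoint` are hypothetical names chosen for illustration; `Character.toChars`, `Character.highSurrogate`, and `Character.lowSurrogate` are existing JDK methods.

```java
final class SurrogatePairDemo {
    // Hypothetical helper: builds a String from any code point via a char array.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        int staff = 0x1D11A;             // MUSICAL SYMBOL FIVE-LINE STAFF
        String literal = "\uD834\uDD1A"; // surrogate pair encoded by hand

        // The hand-encoded literal and the computed string are the same two chars.
        System.out.println(literal.equals(fromCodePoint(staff)));

        // The JDK can compute the two halves, so nobody has to do the bit math by hand.
        System.out.printf("%04X %04X%n",
                (int) Character.highSurrogate(staff),
                (int) Character.lowSurrogate(staff));
    }
}
```

The computed form is not a compile-time constant, which is the trade-off the message points out.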
> 
>> On Feb 16, 2018, at 18:02, Brian Goetz <brian.go...@oracle.com> wrote:
>> 
>> Disagree.  
>> 
>> On #3, most of the time the char being repeated is already a literal.  So 
>> just make it a string.  
>> 
>> On #2, better to aim for String.ofCodePoint(int) and compose with repeat().  
>> 
>> Down to one method again :)
>> 
>> Sent from my MacBook Wheel
>> 
>>> On Feb 16, 2018, at 5:13 PM, Stuart Marks <stuart.ma...@oracle.com> wrote:
>>> 
>>> Let me put in an argument for handling code points:
>>> 
>>>> 3. public static String repeat(final int codepoint, final int count)
>>> 
>>> Most of the String and Character API handles code points on an equal 
>>> footing with chars. I think this is important, as over time Unicode is 
>>> continuing to add supplementary characters -- those that can't be 
>>> represented in a Java char value. Examples abound of how such characters 
>>> are mishandled. Therefore, I believe Java APIs should have full support for 
>>> code points.
>>> 
>>> This is a small thing, and some might consider it a rare case -- how often 
>>> does one need to repeat something like an emoji? The issue however isn't 
>>> that particular use case. Instead what's required is the ability to handle 
>>> *any Unicode character* uniformly, regardless of whether or not it's a 
>>> supplementary character. The way to do that is to deal with code points, so 
>>> any Java API that deals with character data must also handle code points.
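
[Editor's sketch, not part of the original message.] One common way supplementary characters get mishandled is narrowing a code point to `char`. The class and method names below (`TruncationDemo`, `naive`, `correct`) are hypothetical and only illustrate the point; the JDK calls are existing ones.

```java
final class TruncationDemo {
    // Naive conversion: the cast keeps only the low 16 bits of the code point.
    static String naive(int codePoint) {
        return String.valueOf((char) codePoint);
    }

    // Code-point-aware conversion: yields a surrogate pair when one is needed.
    static String correct(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        int staff = 0x1D11A; // a supplementary character
        System.out.println(Integer.toHexString(naive(staff).codePointAt(0)));   // d11a
        System.out.println(Integer.toHexString(correct(staff).codePointAt(0))); // 1d11a
    }
}
```

For BMP characters the two helpers agree, which is why the bug tends to go unnoticed until a supplementary character shows up.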
>>> 
>>> If we were to add just one method:
>>> 
>>>> 1. public String repeat(final int count)
>>> 
>>> the workaround is to take the character, turn it into a string, and call 
>>> the repeat() method on it. For a 'char' value, this isn't too bad, but I'd 
>>> argue it isn't pretty either:
>>> 
>>>  Character.toString(charVal).repeat(n)
>>> 
>>> But this only handles BMP characters, not supplementary characters. 
>>> Unfortunately, there's no direct way to turn a code point into a string -- 
>>> you have to turn it into a char array first! Thus, to get a string from a 
>>> code point and repeat it, you have to do this:
>>> 
>>>  new String(Character.toChars(codepoint)).repeat(count)
>>> 
>>> This is enough indirection that it's hard to discover, and I suspect that 
>>> most people won't put in the effort to do this correctly, resulting in more 
>>> code that mishandles supplementary characters.
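
[Editor's sketch, not part of the original message.] A minimal illustration of what a code-point repeat could do, written against API that already existed when this was posted. `CodePointRepeat.repeatCodePoint` is a hypothetical helper, not the method under review and not its proposed implementation; it leans on the existing `StringBuilder.appendCodePoint`.

```java
final class CodePointRepeat {
    // Hypothetical helper: repeats one code point `count` times.
    static String repeatCodePoint(int codePoint, int count) {
        if (count < 0) {
            throw new IllegalArgumentException("count is negative: " + count);
        }
        // charCount is 1 for BMP code points and 2 for supplementary ones.
        StringBuilder sb = new StringBuilder(Character.charCount(codePoint) * count);
        for (int i = 0; i < count; i++) {
            // appendCodePoint throws IllegalArgumentException for invalid code points.
            sb.appendCodePoint(codePoint);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String staves = repeatCodePoint(0x1D11A, 3);
        System.out.println(staves.length());                            // 6 chars
        System.out.println(staves.codePointCount(0, staves.length()));  // 3 code points
    }
}
```

The caller never sees surrogates here; the same call works for `'-'` and for U+1D11A.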
>>> 
>>> Thus, I think we need to add API #3 that performs the repeat function on 
>>> code points.
>>> 
>>> (Hm, the lack of Character.toString(codepoint) is covered by JDK-4993841, 
>>> which is closed. I think I'll reopen it.)
>>> 
>>>> 2. public static String repeat(final char ch, final int count)
>>> 
>>> I can see that this API is not as important as one that handles code 
>>> points, and it seems to be less frequently used according to Louis W's 
>>> analysis. But if you have char data you want to repeat, not having this 
>>> seems like an omission; it seems backwards to have to create a string from 
>>> the char, only for repeat() to extract that char from that String in order 
>>> to repeat it. Thus I vote for inclusion of this method as well.
>>> 
>>> s'marks
>>> 
>>> 
>>>> On 2/16/18 5:10 AM, Jim Laskey wrote:
>>>> We’re going with the one instance method (Louis clinched it.) with 
>>>> recommended enhancements and not touching CharSequence.
>>>> Working it up now.
>>>> — Jim
>>>>> On Feb 16, 2018, at 7:46 AM, Alan Bateman <alan.bate...@oracle.com> wrote:
>>>>> 
>>>>> On 15/02/2018 17:20, Jim Laskey wrote:
>>>>>> This is a pre-CSR code review [1] for String repeat methods 
>>>>>> (Enhancement).
>>>>>> 
>>>>>> The proposal is to introduce four new methods:
>>>>>> 
>>>>>> 1. public String repeat(final int count)
>>>>>> 2. public static String repeat(final char ch, final int count)
>>>>>> 3. public static String repeat(final int codepoint, final int count)
>>>>>> 4. public static String repeat(final CharSequence seq, final int count)
>>>>>> 
>>>>> Just catching up on this thread and it's hard to see where the bidding is 
>>>>> currently at. Are you planning to send an updated proposal? A list of 
>>>>> methods is fine, even if it's just one (implementation can follow 
>>>>> later).
>>>>> 
>>>>> -Alan
>>> 
>> 
>
