Re: Request for Review : CR#6924259: Remove String.count/String.offset

Mike Duigou Fri, 25 May 2012 15:10:06 -0700

On May 25 2012, at 06:57 , Rémi Forax wrote:

> Hi Mike, Hi Alan, Hi all,
> in my opinion, EMPTY_STRING_VALUE is a premature optimization,
> the idea is, I think, to avoid to create an empty char but if you want to do 
> that,
> it's better to do that in StringBuilder.toString() to avoid to create the 
> String at all.


I have removed most of the use of EMPTY_STRING_VALUE. It remains for the empty 
constructor only.

> 
> I'm worried about one pathological case where a lot of non empty String
> will be created and then one empty will be created after the code will be 
> JITed,
> in that case Hotspot will deoptimize all codes that have inlined the 
> constructor call.

Reasonable and confirmed with Valdimir Kozlov.

> String(StringBuilder) and String(StringBuffer) can be simplified
> by using length() and getValue() (AbstractStringBuilder.getValue())
> to extract the info.
>  this.value = Arrays.copyOf(builder.getValue(), builder.length());
> and avoid to create a temporary String.

Good optimization. I had to add a sync around the buffer version.

> 
> String(char value[], boolean share) should be
> String(char[] value, boolean share) and is there a perf reason to
> not use a static method like createSharedString() instead.

none. I will leave for a later change.

> 
> Removing String(int offset, int count, char value[]) will create trouble,
> I've seen several libraries that use it by reflection to avoid to create
> a copy of the array. I think this method should not disappear
> but use Arrays.copyOfRange if the offset is not 0.
> This method should be marked deprecated, and removed in Java 9.

Restored but deprecated.

> 
> Recently, _getChars(char[], int) was replaced by
> getChars(char[],int). It was a stupid change because now
> one can think that getChar(char[], int) and getChar(int,int,char[],int)
> do the same things. but getChat(char[],int) don't do any bounds check.
> So concat() and toCharArray() doesn't do any bound check !

I don't see where either of them need to do any checking.

> toCharArray() should use Arrays.copyOf()

Done.

> Overloads of encode in StringCoding are in my opinion not necessary
> because each method is called once.

Reasonable. I opted not to add the 3rd variant ([] count) because it wasn't 
used. I've reverted to using the 3 param version. 

> So this doesn't really share code
> but just add one level to the depth of the call stack.
> 
> cheers,
> Rémi
> 
> On 05/25/2012 02:08 PM, Alan Bateman wrote:
>> On 24/05/2012 21:45, Mike Duigou wrote:
>>> Hello all;
>>> 
>>> For a long time preparations and planing have been underway to remove the 
>>> offset and count fields from java.lang.String. These two fields enable 
>>> multiple String instances to share the same backing character buffer. 
>>> Shared character buffers were an important optimization for old benchmarks 
>>> but with current real world code and benchmarks it's actually better to not 
>>> share backing buffers. Shared char array backing buffers only "win" with 
>>> very heavy use of String.substring. The negatively impacted situations can 
>>> include parsers and compilers however current testing shows that overall 
>>> this change is beneficial.
>>> 
>>> The current plan is to commit this change to jdk8 before build 40 followed 
>>> by jdk7u6. Removing offset/count is a complicated process because HotSpot 
>>> has special case code for String that depends upon the two fields. Vladimir 
>>> Kozlov has committed the required HotSpot changes already 
>>> (http://hg.openjdk.java.net/jdk8/jdk8/hotspot/rev/8f972594effc).
>>> 
>>> A patch of the changes to java.lang.String for JDK 8 are 
>>> at<http://cr.openjdk.java.net/~mduigou/6924259/0/webrev/>. The changes for 
>>> JDK 7 have only superficial differences (line offsets in the patch).
>>> 
>>> Comments are welcome.
>>> 
>>> Mike
>> I went through the latest webrev and don't see anything obviously wrong. 
>> EMPTY_STRING_VALUE is new, at least to me, as I don't think it was there 
>> when removing these fields was prototyped a few years ago.
>> 
>> A minor point in the String(char[]) constructor is that you could move the 
>> "int len = .." to the top of the method to be consistent with other places 
>> where a local holds the value.length.
>> 
>> The "use name version for caching" comment in StringCoding.java might be 
>> confusing to readers, maybe "use charset name as this is faster due to 
>> caching".
>> 
>> A non-material comment is that there are a couple of style changes that I 
>> found annoying. Everyone is an expert on such matters, I'm not, but the one 
>> that bugged me a bit was adding the space after the cast, eg:
>> (toffset > (long) value.length - len)
>> I found myself needing to re-read it to see if the cast applied to 
>> value.length or value.length-len. It's a minor point but you know what I 
>> mean.
>> 
>> -Alan.
>

Re: Request for Review : CR#6924259: Remove String.count/String.offset

Reply via email to