On May 25 2012, at 06:57 , Rémi Forax wrote: > Hi Mike, Hi Alan, Hi all, > in my opinion, EMPTY_STRING_VALUE is a premature optimization, > the idea is, I think, to avoid to create an empty char but if you want to do > that, > it's better to do that in StringBuilder.toString() to avoid to create the > String at all.
I have removed most of the use of EMPTY_STRING_VALUE. It remains for the empty constructor only. > > I'm worried about one pathological case where a lot of non empty String > will be created and then one empty will be created after the code will be > JITed, > in that case Hotspot will deoptimize all codes that have inlined the > constructor call. Reasonable and confirmed with Valdimir Kozlov. > String(StringBuilder) and String(StringBuffer) can be simplified > by using length() and getValue() (AbstractStringBuilder.getValue()) > to extract the info. > this.value = Arrays.copyOf(builder.getValue(), builder.length()); > and avoid to create a temporary String. Good optimization. I had to add a sync around the buffer version. > > String(char value[], boolean share) should be > String(char[] value, boolean share) and is there a perf reason to > not use a static method like createSharedString() instead. none. I will leave for a later change. > > Removing String(int offset, int count, char value[]) will create trouble, > I've seen several libraries that use it by reflection to avoid to create > a copy of the array. I think this method should not disappear > but use Arrays.copyOfRange if the offset is not 0. > This method should be marked deprecated, and removed in Java 9. Restored but deprecated. > > Recently, _getChars(char[], int) was replaced by > getChars(char[],int). It was a stupid change because now > one can think that getChar(char[], int) and getChar(int,int,char[],int) > do the same things. but getChat(char[],int) don't do any bounds check. > So concat() and toCharArray() doesn't do any bound check ! I don't see where either of them need to do any checking. > toCharArray() should use Arrays.copyOf() Done. > Overloads of encode in StringCoding are in my opinion not necessary > because each method is called once. Reasonable. I opted not to add the 3rd variant ([] count) because it wasn't used. I've reverted to using the 3 param version. > So this doesn't really share code > but just add one level to the depth of the call stack. > > cheers, > Rémi > > On 05/25/2012 02:08 PM, Alan Bateman wrote: >> On 24/05/2012 21:45, Mike Duigou wrote: >>> Hello all; >>> >>> For a long time preparations and planing have been underway to remove the >>> offset and count fields from java.lang.String. These two fields enable >>> multiple String instances to share the same backing character buffer. >>> Shared character buffers were an important optimization for old benchmarks >>> but with current real world code and benchmarks it's actually better to not >>> share backing buffers. Shared char array backing buffers only "win" with >>> very heavy use of String.substring. The negatively impacted situations can >>> include parsers and compilers however current testing shows that overall >>> this change is beneficial. >>> >>> The current plan is to commit this change to jdk8 before build 40 followed >>> by jdk7u6. Removing offset/count is a complicated process because HotSpot >>> has special case code for String that depends upon the two fields. Vladimir >>> Kozlov has committed the required HotSpot changes already >>> (http://hg.openjdk.java.net/jdk8/jdk8/hotspot/rev/8f972594effc). >>> >>> A patch of the changes to java.lang.String for JDK 8 are >>> at<http://cr.openjdk.java.net/~mduigou/6924259/0/webrev/>. The changes for >>> JDK 7 have only superficial differences (line offsets in the patch). >>> >>> Comments are welcome. >>> >>> Mike >> I went through the latest webrev and don't see anything obviously wrong. >> EMPTY_STRING_VALUE is new, at least to me, as I don't think it was there >> when removing these fields was prototyped a few years ago. >> >> A minor point in the String(char[]) constructor is that you could move the >> "int len = .." to the top of the method to be consistent with other places >> where a local holds the value.length. >> >> The "use name version for caching" comment in StringCoding.java might be >> confusing to readers, maybe "use charset name as this is faster due to >> caching". >> >> A non-material comment is that there are a couple of style changes that I >> found annoying. Everyone is an expert on such matters, I'm not, but the one >> that bugged me a bit was adding the space after the cast, eg: >> (toffset > (long) value.length - len) >> I found myself needing to re-read it to see if the cast applied to >> value.length or value.length-len. It's a minor point but you know what I >> mean. >> >> -Alan. >