On Jun 12, 2017 10:47 AM, "arunvinudss" <g...@git.apache.org> wrote:
> Github user arunvinudss commented on a diff in the pull request:
>
> I am a bit biased towards using String instead of CharSequence. Yes, CharSequence allows us to pass StringBuffers, StringBuilders and other types as input, potentially increasing the scope of the function, but considering the nature of the work we do in this particular method it may not necessarily be a good idea. My basic contention is that the minute we call toString() on a CharSequence to do any sort of manipulation it becomes a costly operation and we may lose performance.

True, if the particular CharSequence is not in fact an instance of String. String::toString returns this.

The bigger problem is that too many methods use String as a parameter or return type when CharSequence would serve just as well. Calling those does indeed require invoking Object::toString.

For methods that use String as the return type, changing the result to CharSequence is source and binary incompatible, and properly so (since at some point the user may actually need a String).

A generic method with a type parameter bounded by CharSequence (T extends CharSequence) can sometimes be useful, and can be added in addition to methods taking String arguments, but can't replace them.

There are some places in javac that have special treatment for String, for example the + operator, but jdk9 reduces that particular win by indyfying concat.

If a method doesn't intrinsically require a String, then I prefer CharSequence. It's probable that sooner or later something is going to demand a String, but that's not a good reason to be "that guy" :-)

Note: Strings can be an incredible waste of memory; 40 + 8 × ⌈length/4⌉ bytes (reduced to a mere 40 + 8 × ⌈length/8⌉ bytes in jdk9 when compact strings can be used). This is incredibly painful if you have a vast number of small "strings", which may not all need to be materialized simultaneously. See e.g. [1] (~50MiB of UTF-8 chars becomes ~250MiB of Strings.
And since there's no individual humongous object, they all get to make the journey from TLAB to Old Space the hard way. Note this predates jdk9, but illustrates some of the win from compact strings.)

Storing the character data in a shared byte array is a huge win. Someone should tell the jdk implementors to look at applications that do this. Like, um, javac :-)

Materializing these strings as possibly transient CharSequences is really convenient... until some method just has to have a String.

Also, wouldn't some sort of low-space-overhead string storage be a good fit for text?

Simon

[1] Spero, S. (2015). Time And Relative Dimensions In Semantics: Is OWL Bigger On The Inside? OWLED 2015. Available at http://cgi.csc.liv.ac.uk/~valli/OWLED2015/OWLED_2015_paper_12.pdf
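
P.S. In case it helps the discussion, here is a rough sketch of the three points: String::toString returning this, a CharSequence parameter that never needs toString(), and a bounded generic method. All class and method names are made up for illustration and are not from the pull request. The size estimate assumes a 64-bit JVM with compressed references (24-byte String object, 16-byte array header, payload padded to 8 bytes); other configurations will give different numbers.

```java
// Hypothetical names throughout; a sketch, not code from the pull request.
final class CharSequenceSketch {

    // Accepts any CharSequence (String, StringBuilder, CharBuffer, ...)
    // and only uses length() and charAt(), so toString() is never called.
    static int countChar(CharSequence text, char wanted) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == wanted) {
                count++;
            }
        }
        return count;
    }

    // Bounded generic: the caller gets back the same type it passed in.
    static <T extends CharSequence> T longer(T a, T b) {
        return a.length() >= b.length() ? a : b;
    }

    // Estimated footprint of one String, under the assumptions stated above.
    // compact == true models a jdk9 Latin-1 string (one byte per char).
    static long estimatedStringBytes(int length, boolean compact) {
        int charsPerWord = compact ? 8 : 4;
        long words = (length + charsPerWord - 1) / charsPerWord; // ceiling
        return 40 + 8 * words;
    }

    public static void main(String[] args) {
        String s = "a.b.c";
        StringBuilder sb = new StringBuilder("x.y");

        System.out.println(s.toString() == s);   // String::toString returns this
        System.out.println(countChar(s, '.'));   // works on a String
        System.out.println(countChar(sb, '.'));  // and on a StringBuilder, no copy

        StringBuilder picked = longer(sb, new StringBuilder("longer"));
        System.out.println(picked);

        System.out.println(estimatedStringBytes(10, false));
        System.out.println(estimatedStringBytes(10, true));
    }
}
```

The String-taking overloads can stay alongside methods like these; the point is only that nothing in countChar intrinsically needs a String.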