On 06/01/2015 03:54 PM, Vitaly Davidovich wrote:
> While it's true that the denser format will require fewer cachelines, my
> experience is that most strings are smaller than a single cacheline
> worth of storage, maybe two lines in some cases; there's just a ton of
> them in the heap. So the heap footprint should be substantially
> reduced, but I'm not sure the cache pollution will be significantly reduced.

This calculation assumes object allocations are granular to the cache lines. They are not: if a String takes less space within the cache line, it allows *more* object data to be squeezed in there. In other words, with compact Strings, the entire dataset can take fewer cache lines, thus improving performance.

> There's currently no vectorization of char[] scanning (or any
> vectorization other than memcpy for that matter) - are you referring to
> the recent Intel contributions here or there's a plan to further improve
> vectorization in time for this JEP? Just curious.

String methods are intensely intrinsified (and vectorized in those implementations). String::equals, String::compareTo, and some encoding/decoding come to mind. I really, really invite you to read the collateral materials from the JEP, where we explored quite a few performance characteristics already.

Thanks,
-Aleksey.
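
To put rough numbers on the cache-line argument above, here is a back-of-the-envelope sketch. The layout constants are assumptions, not measurements: a 64-bit HotSpot JVM with compressed references, a 24-byte String object, a 16-byte array header, 8-byte object alignment, and 64-byte cache lines. The class and method names are hypothetical, and the model ignores everything else sharing the heap, so treat the output as an estimate only.

```java
// Rough estimate of how many cache lines a set of densely packed Strings spans.
// All layout constants below are ASSUMPTIONS, not values measured on a real JVM.
class StringFootprint {
    static final int CACHE_LINE = 64;     // assumed cache line size, bytes
    static final int STRING_OBJECT = 24;  // assumed String object size, bytes
    static final int ARRAY_HEADER = 16;   // assumed array header size, bytes
    static final int ALIGNMENT = 8;       // assumed object alignment, bytes

    // Round a size up to the next multiple of the object alignment.
    static int align(int bytes) {
        return (bytes + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    // String object plus its backing array, for a given per-character width.
    static int footprint(int length, int bytesPerChar) {
        return STRING_OBJECT + align(ARRAY_HEADER + length * bytesPerChar);
    }

    // Objects are packed back to back, not padded out to cache-line boundaries,
    // so the dataset spans total bytes / line size, not one line per String.
    static double cacheLines(int count, int length, int bytesPerChar) {
        return (double) count * footprint(length, bytesPerChar) / CACHE_LINE;
    }

    public static void main(String[] args) {
        int count = 1000;  // hypothetical number of Strings
        int length = 20;   // hypothetical characters per String
        System.out.println("char[] backed: " + footprint(length, 2)
                + " bytes each, ~" + cacheLines(count, length, 2) + " cache lines");
        System.out.println("byte[] backed: " + footprint(length, 1)
                + " bytes each, ~" + cacheLines(count, length, 1) + " cache lines");
    }
}
```

Under these assumptions each String is individually small, yet the set as a whole occupies fewer lines in the byte[]-backed case, which is the point about allocations not being cache-line granular.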
