Re: More memory-efficient internal representation for Strings: call for more data

Xueming Shen Tue, 02 Dec 2014 16:36:42 -0800

On 12/02/2014 03:24 PM, Douglas Surber wrote:

String construction is a big performance issue for JDBC drivers. Most queries 
return some number of Strings. The overwhelming majority of those Strings will 
be short lived. The cost of constructing these Strings from network bytes is a 
large fraction of total execution time. Any increase in the cost of 
constructing a String will far outweigh any reduction in memory use, at least 
for query results.


All of the proposed compression methods require an additional scan of the 
entire string. That's exactly the wrong direction. Something like the following 
pseudo-code is common inside a driver.

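  {
    // Decode n chars from the wire into a temporary array.
    char[] c = new char[n];
    for (int i = 0; i < n; i++) c[i] = charSource.next();
    // The String constructor copies c, so every char is written twice.
    return new String(c);
  }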
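
For illustration, the additional scan mentioned above might look roughly like
the following Java sketch. The names CompressScanSketch and compressIfLatin1
are hypothetical and not taken from any proposed patch; the sketch only assumes
a scheme that stores one byte per char when every char is at most 0xFF.

```java
final class CompressScanSketch {
    /**
     * Hypothetical helper: returns one byte per char if every char fits in
     * a single byte (<= 0xFF), or null if the data needs two bytes per char.
     */
    static byte[] compressIfLatin1(char[] chars) {
        byte[] out = new byte[chars.length];
        for (int i = 0; i < chars.length; i++) {
            char ch = chars[i];
            if (ch > 0xFF) {
                return null;        // give up: keep the char[] form
            }
            out[i] = (byte) ch;     // narrow to a single byte
        }
        return out;
    }

    public static void main(String[] args) {
        // Latin1-only input compresses; the second input does not.
        System.out.println(compressIfLatin1("SELECT".toCharArray()) != null);
        System.out.println(compressIfLatin1("\u65e5\u672c".toCharArray()) != null);
    }
}
```

In the driver pattern above this pass would run after the char[] has already
been filled, which is the extra cost being objected to.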


In most use cases the char[] is wasted if the final target is a String object
and the input data is a byte[]. An optimization was implemented in
StringCoding in earlier releases to avoid the redundant char[] copy where
possible. For the compressed String project, however, if the final internal
storage is a byte[], it might be ideal to avoid the char[] altogether in the
byte[] -> char[] -> byte[] -> String path. The "extra scan" can then be folded
into the byte[] -> char[] -> byte[] decoding phase.
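
A minimal sketch of folding the scan into decoding, assuming UTF-8 input and a
Latin1 byte[] as the compact form. The names Utf8ToLatin1Sketch and
decodeUtf8ToLatin1 are hypothetical, and this is not the StringCoding code
linked below. The method returns null when the input is malformed or contains
anything outside Latin1, so a caller would fall back to the general decoder.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class Utf8ToLatin1Sketch {
    /**
     * Hypothetical helper: decodes UTF-8 straight into Latin1 bytes in one
     * pass, with no intermediate char[]. Returns null if any code point is
     * above U+00FF or the input is malformed.
     */
    static byte[] decodeUtf8ToLatin1(byte[] src) {
        byte[] dst = new byte[src.length];   // Latin1 never needs more bytes
        int sp = 0;
        int dp = 0;
        while (sp < src.length) {
            int b1 = src[sp++];
            if (b1 >= 0) {
                dst[dp++] = (byte) b1;       // ASCII: copy as-is
            } else if ((b1 == (byte) 0xC2 || b1 == (byte) 0xC3) && sp < src.length) {
                int b2 = src[sp++];
                if ((b2 & 0xC0) != 0x80) {
                    return null;             // bad continuation byte
                }
                // Two-byte sequence encoding U+0080..U+00FF.
                dst[dp++] = (byte) (((b1 & 0x03) << 6) | (b2 & 0x3F));
            } else {
                return null;                 // outside Latin1, or malformed
            }
        }
        return dp == dst.length ? dst : Arrays.copyOf(dst, dp);
    }

    public static void main(String[] args) {
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = decodeUtf8ToLatin1(utf8);
        System.out.println(Arrays.equals(latin1,
                "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1)));
    }
}
```

The Latin1 check happens on each byte as it is decoded, so there is no
separate pass over the data.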

I do have some optimizations in StringCoding for ASCII, 8859-1 and UTF-8 for
the case where the data is ASCII-only:

http://cr.openjdk.java.net/~sherman/8054307/jdk/src/java.base/share/classes/java/lang/StringCoding.java.html

But the benefit might be limited, and the current implementation is biased
toward the "single byte" scenario, on the assumption that most of the String
objects inside a live VM heap are single-byte strings. More work needs to be
done here, and can be, if the data shows this is indeed a big issue.

-Sherman


The array copy inside the String constructor is a significant fraction of JDBC 
driver execution time. Adding an additional scan on top of it is making things 
worse regardless of the transient benefit of more compact storage. In the case 
of a query result the String will likely never be promoted out of new space; 
the benefit of compression would be minimal.
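
The copy in question comes from String being immutable: new String(char[])
cannot keep a reference to the caller's array. A small Java sketch of that
behaviour (DefensiveCopySketch and buildThenReuse are hypothetical names used
only for illustration):

```java
import java.util.Arrays;

final class DefensiveCopySketch {
    /**
     * Builds a String from the buffer, then overwrites the buffer the way a
     * driver might when it reuses it for the next column value.
     */
    static String buildThenReuse(char[] buffer) {
        String s = new String(buffer);   // copies the chars out of buffer
        Arrays.fill(buffer, 'x');        // later writes do not affect s
        return s;
    }

    public static void main(String[] args) {
        char[] buffer = {'a', 'b', 'c'};
        System.out.println(buildThenReuse(buffer));   // prints "abc"
        System.out.println(new String(buffer));       // prints "xxx"
    }
}
```

Because the String keeps its own copy, each char read from the wire is written
once into the driver's array and once more into the String.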

I don't dispute that Strings occupy a significant fraction of the heap or that 
a lot of those bytes are zero. And I certainly agree that reducing memory 
footprint is valuable, but any worsening of String construction time will 
likely be a problem.

Douglas

At 02:13 PM 12/2/2014, core-libs-dev-requ...@openjdk.java.net wrote:

Date: Wed, 03 Dec 2014 00:59:10 +0300
From: Aleksey Shipilev <aleksey.shipi...@oracle.com>
To: Java Core Libs <core-libs-dev@openjdk.java.net>
Cc: charlie hunt <charlie.h...@oracle.com>
Subject: More memory-efficient internal representation for Strings: call for more data
Message-ID: <547e362e.5010...@oracle.com>
Content-Type: text/plain; charset=utf-8

Hi,

As you may already know, we are looking into more memory efficient
representation for Strings:
 https://bugs.openjdk.java.net/browse/JDK-8054307

As part of the preliminary performance work for this JEP, we have to collect
empirical data on the usual characteristics of the Strings and char[]-s that
normal applications have, and work out early estimates of the improvements
based on that data. What we have so far is written up here:

http://cr.openjdk.java.net/~shade/density/string-density-report.pdf

We would appreciate it if people who are interested in this JEP could provide
additional data on their applications. It is doubly interesting to have data
for applications that process String data outside the Latin1 plane. Our
current data says these cases are rather rare. Please read the current report
draft, and try to process your own heap dumps using the instructions in the
Appendix.

Thanks,
-Aleksey.
