Re: More memory-efficient internal representation for Strings: call for more data

Xueming Shen Tue, 02 Dec 2014 16:54:04 -0800

On 12/02/2014 04:42 PM, Douglas Surber wrote:

The most common operation on most Strings in query results is to do nothing. 
Just construct the String, hold onto it while the rest of the transaction 
completes, then drop it on the floor. Probably the next most common is to 
encode the chars to write them to an OutputStream or send them back to the 
database. I'd be curious how a compact representation would help those 
operations.


It depends on what is inside those "query results". If most of them are ASCII
and only a small portion is double-byte user data (as is true for most "utf8"
XML files, for example), you might be able to save CPU time and improve
throughput by copying only half as many bytes around during their life cycle,
especially if "copying around" is the only operation performed on them.

-Sherman
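
A minimal sketch of the "half the bytes" point above, not the representation
under discussion: it assumes a one-byte-per-char encoding for strings whose
chars all fit in 8 bits, and the class and method names (CompactStringSketch,
tryCompress, inflate) are hypothetical.

```java
public class CompactStringSketch {
    /**
     * Packs the chars into one byte each if every char is <= 0xFF.
     * Returns null when any char needs two bytes (assumed fallback: keep the char[]).
     */
    public static byte[] tryCompress(char[] chars) {
        byte[] packed = new byte[chars.length];
        for (int i = 0; i < chars.length; i++) {
            char ch = chars[i];
            if (ch > 0xFF) {
                return null; // not compressible under this assumed encoding
            }
            packed[i] = (byte) ch;
        }
        return packed;
    }

    /** Expands a packed array back to chars; & 0xFF undoes byte sign extension. */
    public static char[] inflate(byte[] packed) {
        char[] chars = new char[packed.length];
        for (int i = 0; i < packed.length; i++) {
            chars[i] = (char) (packed[i] & 0xFF);
        }
        return chars;
    }

    public static void main(String[] args) {
        char[] ascii = "SELECT name FROM t".toCharArray();
        byte[] packed = tryCompress(ascii);
        // A char is 2 bytes, so the payload shrinks from 2*n to n bytes.
        System.out.println((ascii.length * 2) + " bytes as char[], "
                + packed.length + " bytes packed");
        // A string containing a two-byte char is left uncompressed.
        System.out.println(tryCompress("\u4e2d\u6587".toCharArray()) == null);
    }
}
```

Note that tryCompress itself is a full pass over the input, so any saving from
copying fewer bytes later has to be weighed against that scan.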

SPECjEnterprise is a widely used standard benchmark. It probably uses mostly 
(or even entirely) ASCII characters so it's not representative of many 
customers.

My definition of "sane limits" might be different than yours. As far as I'm 
concerned String construction is already too slow and should be made faster by 
eliminating the char[] copy when possible.

Douglas

At 03:47 PM 12/2/2014, Aleksey Shipilev wrote:

Hi Douglas,

On 12/03/2014 02:24 AM, Douglas Surber wrote:
> String construction is a big performance issue for JDBC drivers. Most
> queries return some number of Strings. The overwhelming majority of
> those Strings will be short lived. The cost of constructing these
> Strings from network bytes is a large fraction of total execution time.
> Any increase in the cost of constructing a String will far outweigh any
> reduction in memory use, at least for query results.

You will also have to take into account that shorter (compressed)
Strings allow for more efficient operations on them. Not to mention
that GC costs are usually "hidden" from naive performance estimates:
even though you may see the mutator spending more time doing work,
that might be offset by an easier job for the GC.

> All of the proposed compression methods require an additional scan of
> the entire string. That's exactly the wrong direction. Something like
> the following pseudo-code is common inside a driver.
>
>   {
>     char[] c = new char[n];
>     for (int i = 0; i < n; i++) c[i] = charSource.next();
>     return new String(c);
>   }

Good to know. We will be assessing the String(char[]) construction
performance in the course of this performance work. What would you say
is a characteristic high-level benchmark for the scenario you are
describing?
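
The pseudo-code quoted above can be made concrete roughly as follows. This is
only a sketch: CharSource is a hypothetical stand-in for whatever the driver
uses to decode network bytes, and DriverStringSketch and readString are made-up
names. It also shows that String(char[]) copies its argument, since mutating
the array afterwards leaves the String unchanged.

```java
public class DriverStringSketch {
    /** Hypothetical stand-in for the driver's decoded character stream. */
    public interface CharSource {
        char next();
    }

    /** Fills a temporary char[] from the source, then builds the String. */
    public static String readString(CharSource charSource, int n) {
        char[] c = new char[n];
        for (int i = 0; i < n; i++) {
            c[i] = charSource.next();
        }
        // The constructor copies c into the String's own storage,
        // so the characters are written twice on this path.
        return new String(c);
    }

    public static void main(String[] args) {
        char[] buf = {'a', 'b', 'c'};
        String s = new String(buf);
        buf[0] = 'x';          // mutate the caller's array after construction
        System.out.println(s); // prints "abc": the String holds its own copy
    }
}
```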

> The array copy inside the String constructor is a significant fraction
> of JDBC driver execution time. Adding an additional scan on top of it is
> making things worse regardless of the transient benefit of more compact
> storage. In the case of a query result the String will likely never
> be promoted out of new space; the benefit of compression would be minimal.

It's hard to say at this point. We want to understand what footprint
improvements we are talking about. I agree that if the cost-benefit analysis
shows performance degrading beyond sane limits, even if we are happy with
the memory savings, there is little reason to push this into
the general JDK.

Thanks,
-Aleksey
