On 04/07/2015 11:10 PM, Louis Wasserman wrote:
Could that actually be provided immutability-safely? I suppose an append-only, fixed-length builder would be potentially safe.

I think the presented implementation *is* thread and immutability safe. It only allows mutations from the constructing thread and disables them as soon as toString() is called.


Part of the trickiness there is with primitive parameters, where presizing and doing the actual append both require calculating the size of the primitive when converted to a string. The current approach just uses an upper bound for the primitive type as a whole, which is fine and allocates a small constant excess in most cases, but I'm not sure how we could avoid duplicating that computation in a public API.

I've been thinking of static methods parallel to:

String.valueOf(int) - String.lengthOf(int)
String.valueOf(long) - String.lengthOf(long)
...
...


The integral variants could be quick to compute (the implementations are already there - they are just not public), the floating point maybe not so. But the floating point conversion is already taking much more overhead and estimating an upper bound would be fine in case floating point is involved. If one wants to optimize the case of very long concatenation of exactly sized Strings, integrals and other Objects with just one or few floating point values, then he can always do it explicitly:

"....long string...." + String.valueOf(Math.PI);

Regards, Peter



On Tue, Apr 7, 2015 at 2:07 PM Peter Levart <peter.lev...@gmail.com <mailto:peter.lev...@gmail.com>> wrote:

    Hi Louis,

    This is nice. Amd could be even nicer. In case the estimated
    initial StringBuilder capacity is exactly the final String length,
    then constructing the String could skip the char[] copying step as
    the StringBuilder instance does not escape. But in order to be
    safe, this would have to be a special kind of StringBuilder. Like
    the following:

    http://cr.openjdk.java.net/~plevart/misc/ThreadLocalStringBuilder/webrev.01/
    
<http://cr.openjdk.java.net/%7Eplevart/misc/ThreadLocalStringBuilder/webrev.01/>

    Such class would be useful for direct API use too.


    Regards, Peter


    On 03/13/2015 10:40 PM, Louis Wasserman wrote:
    Got it.  I think the only cases we have to worry about, then, are
    buffer size overflows resulting in NegativeArraySizeException, or
    possibly an explicitly thrown OutOfMemoryError (which is
    StringBuilder's response when the buffer size tries to exceed
    Integer.MAX_VALUE).  I think we might conceivably deal with this
    by rewriting the bytecode to -- I think we can improve on this
    with jump hackery to avoid repetition, but essentially --

    int length = 3; // sum of the constant strings; we can check that
    this won't overflow at compile time but I think it couldn't no
    matter what because of method length constraints
    String mStr = m().toString();
    length += mStr.length();
    if (length < 0) {
      throw new OutOfMemoryError();
    }
    String nStr = n().toString();
    length += nStr.length();
    if (length < 0) {
      throw new OutOfMemoryError();
    }

    This continues to expand the bytecode, but probably manageably --
    we don't actually need a local for length; if (int < 0) is easy
    in bytecode, and we can have only one OOME site that all the ifs
    jump to?

    On Fri, Mar 13, 2015 at 12:59 PM Alex Buckley
    <alex.buck...@oracle.com <mailto:alex.buck...@oracle.com>> wrote:

        I do recognize that the proposed implementation doesn't
        reorder the
        evaluation of subexpressions.

        When discussing the proposed implementation of '+' -- whose
        key element
        is calling append(String) with a pre-computed value -- I was
        careful to
        set aside asynchronous OOMEs, but I see that even synchronous
        OOMEs are
        sidetracking us. My concern is not heap pressure; my concern is
        arbitrary unchecked exceptions arising from the <init>(int) and
        append(String) calls.

        For sake of argument, I'll simplify "unchecked exceptions" to
        just
        RuntimeExceptions, not Errors. If you can guarantee that no
        RuntimeExceptions are thrown synchronously during the
        execution of those
        method bodies on the JVM, then '+' cannot fail and the timing of
        subexpression evaluation is unobservable (ordering is still
        observable,
        as required). I think this guarantee is just a matter of
        reviewing the
        method bodies.

        Alex

        On 3/12/2015 6:01 PM, Louis Wasserman wrote:
        > I confess I'm not sure how that "quality" goal would be
        achievable in
        > bytecode without deliberately allocating arrays we then
        discard.
        >
        > For what it's worth, delaying or avoiding OOMEs seems a
        desirable goal
        > in general, and up to a constant multiplicative factor, this
        > implementation seems to allocate the same amount in the
        same order.
        > That is, we're still computing m().toString() before
        n().toString(), and
        > up to a constant multiplicative factor, m().toString()
        allocates the
        > same number of bytes as the StringBuilder the status quo
        generates.  So
        > if m() does something like allocate a
        char[Integer.MAX_VALUE], we still
        > OOM at the appropriate time.
        >
        > Other notes: this implementation would tend to decrease maximum
        > allocation, so it'd reduce OOMEs.  Also, since the
        StringBuilder will
        > never need further expansion and we're only using the
        String and
        > primitive overloads of append, the only way for append to
        OOME would be
        > in append(float) and append(double), which allocate a
        FloatingDecimal
        > (which may, in turn, allocate a new thread-local char[26]
        if one isn't
        > already there).
        >
        > On Thu, Mar 12, 2015 at 4:28 PM Alex Buckley
        <alex.buck...@oracle.com <mailto:alex.buck...@oracle.com>
        > <mailto:alex.buck...@oracle.com
        <mailto:alex.buck...@oracle.com>>> wrote:
        >
        >     More abstract presentation. Given the expression:
        >
        >         "foo" + m() + n()
        >
        >     you must not evaluate n() if evaluation of "foo" + m()
        completes
        >     abruptly. The proposed implementation evaluates n()
        regardless.
        >
        >     All is not lost. In the proposed implementation, the
        abrupt completion
        >     of "foo" + m() could occur because an append call fails
        or (thanks to
        >     Jon for pointing this out) the StringBuilder ctor
        fails. The
        >     quality-of-implementation issue is thus: if the
        proposed implementation
        >     is of sufficiently high quality to guarantee that the
        ctor and the first
        >     append both succeed, then the evaluation of "foo" + m()
        will always
        >     complete normally, and it would be an unobservable
        (thus acceptable)
        >     implementation detail to evaluate n() early.
        >
        >     Alex
        >
        >     On 3/11/2015 10:26 PM, Jeremy Manson wrote:
        >      > Isn't Louis's proposed behavior equivalent to saying
        "the rightmost
        >      > concatenation threw an OOME" instead of "some
        concatenation in the
        >      > middle threw an OOME"?
        >      >
        >      > It's true that the intermediate String
        concatenations haven't
        >     occurred
        >      > at that point, but in the JDK's current
        implementation, that's true,
        >      > too: the concatenations that have occurred at that
        point are
        >      > StringBuilder ones, not String ones. If any of the
        append operations
        >      > throws an OOME, no Strings have been created at all,
        either in
        >     Louis's
        >      > implementation or in the JDK's.
        >      >
        >      > Ultimately, isn't this a quality of implementation
        issue?  And if so,
        >      > isn't it a quality of implementation issue that
        doesn't provide any
        >      > additional quality?  I can't imagine code whose
        semantics relies on
        >      > this, and if they do, they are relying on something
        >      > implementation-dependent.
        >      >
        >      > Jeremy
        >      >
        >      > On Wed, Mar 11, 2015 at 6:13 PM, Alex Buckley
        >     <alex.buck...@oracle.com
        <mailto:alex.buck...@oracle.com>
        <mailto:alex.buck...@oracle.com <mailto:alex.buck...@oracle.com>>
        >      > <mailto:alex.buckley@oracle.
        <mailto:alex.buckley@oracle.>__com
        >     <mailto:alex.buck...@oracle.com
        <mailto:alex.buck...@oracle.com>>>> wrote:
        >      >
        >      >     On 3/11/2015 2:01 PM, Louis Wasserman wrote:
        >      >
        >      >         So for example, "foo" + myInt + myString +
        "bar" + myObj
        >     would be
        >      >         compiled to the equivalent of
        >      >
        >      >         int myIntTmp = myInt;
        >      >         String myStringTmp =
        String.valueOf(myString); // defend
        >     against
        >      >         null
        >      >         String myObjTmp =
        >     String.valueOf(String.valueOf(____myObj)); // defend
        >      >         against evil toString implementations
        returning null
        >      >
        >      >         return new StringBuilder(
        >      >                17 // length of "foo" (3) + max
        length of myInt (11) +
        >      >         length of
        >      >         "bar" (3)
        >      >                + myStringTmp.length()
        >      >                + myObjTmp.length())
        >      >              .append("foo")
        >      >              .append(myIntTmp)
        >      >              .append(myStringTmp)
        >      >              .append("bar")
        >      >              .append(myObjTmp)
        >      >              .toString();
        >      >
        >      >         As far as language constraints go, the JLS
        is (apparently
        >      >         deliberately)
        >      >         vague about how string concatenation is
        implemented.  "An
        >      >         implementation
        >      >         may choose to perform conversion and
        concatenation in one
        >     step
        >      >         to avoid
        >      >         creating and then discarding an intermediate
        String
        >     object. To
        >      >         increase
        >      >         the performance of repeated string
        concatenation, a Java
        >      >         compiler may
        >      >         use the StringBuffer class or a similar
        technique to
        >     reduce the
        >      >         number
        >      >         of intermediate String objects that are
        created by
        >     evaluation of an
        >      >         expression."  We see no reason this approach
        would not
        >     qualify as a
        >      >         "similar technique."
        >      >
        >      >
        >      >     The really key property of the string
        concatenation operator is
        >      >     left-associativity. Later subexpressions must not be
        >     evaluated until
        >      >     earlier subexpressions have been successfully
        evaluated AND
        >      >     concatenated. Consider this expression:
        >      >
        >      >        "foo" + m() + n()
        >      >
        >      >     which JLS8 15.8 specifies to mean:
        >      >
        >      >        ("foo" + m()) + n()
        >      >
        >      >     We know from JLS8 15.6 that if m() throws, then
        foo+m()
        >     throws, and
        >      >     n() will never be evaluated.
        >      >
        >      >     Happily, your translation doesn't appear to
        catch and swallow
        >      >     exceptions when eagerly evaluating each
        subexpression in
        >     turn, so I
        >      >     believe you won't evaluate n() if m() already threw.
        >      >
        >      >     Unhappily, a call to append(..) can in general
        fail with
        >      >     OutOfMemoryError. (I'm not talking about
        asynchronous
        >     exceptions in
        >      >     general, but rather the sense that append(..)
        manipulates the
        >     heap
        >      >     so an OOME is at least plausible.) In the OpenJDK
        >     implementation, if
        >      >     blah.append(m()) fails with OOME, then n()
        hasn't been
        >     evaluated yet
        >      >     -- that's "real" left-associativity. In the proposed
        >     implementation,
        >      >     it's possible that more memory is available when
        evaluating
        >     m() and
        >      >     n() upfront than at the time of an append call,
        so n() is
        >     evaluated
        >      >     even if append(<<tmp result of m()>>) fails --
        that's not
        >      >     left-associative.
        >      >
        >      >     Perhaps you can set my mind at ease that
        append(..) can't
        >     fail with
        >      >     OOME?
        >      >
        >      >     Alex
        >      >
        >      >
        >



Reply via email to