Re: Lower overhead String encoding/decoding

Xueming Shen Thu, 25 Sep 2014 09:25:56 -0700

Hi Richard, couple comments after a quick scan.

(1) #474: the IOBE is probably no longer needed in the case of a ByteBuffer parameter.

(2) For methods that take a ByteBuffer as input, it would be desirable to
     specify clearly that the bytes are read from "position" to "limit", and
     whether or not the "position" will be advanced after decoding/encoding
     (important).
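
To illustrate the distinction in (2), here is a minimal sketch using only the existing Charset.decode(ByteBuffer) API, not the methods in the webrev. The class and method names are made up for the example. One variant advances the caller's position to the limit, the other decodes through a duplicate and leaves it alone; either behaviour is fine as long as the spec says which one applies.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class PositionDemo {
    // Decodes the bytes between position and limit.
    // The caller's buffer ends up with position == limit.
    public static String decodeAdvancing(ByteBuffer in) {
        return StandardCharsets.UTF_8.decode(in).toString();
    }

    // Decodes the same region through a duplicate, which shares the
    // content but has its own position, so the caller's position is kept.
    public static String decodeNonAdvancing(ByteBuffer in) {
        return StandardCharsets.UTF_8.decode(in.duplicate()).toString();
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.wrap("hello world".getBytes(StandardCharsets.UTF_8));
        buf.position(6); // only "world" lies between position and limit

        System.out.println(decodeNonAdvancing(buf) + " / position=" + buf.position());
        System.out.println(decodeAdvancing(buf) + " / position=" + buf.position());
    }
}
```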

(3) getBytes(byte[], offset, charset)

     While I understand it might be useful to have such a method in certain
     circumstances, it is usually complicated to make it right and easy for
     the user to actually use. Consider the fact that the returned number of
     bytes encoded has no straightforward connection to how many underlying
     chars have been encoded (so the user of this method really has no idea
     how many underlying "chars" have been encoded into the dest buffer. Is
     that enough? How big does the buffer need to be to encode the whole
     string? ...), especially given the possibility that the last couple of
     bytes of space might be too short to encode a "char". This is unlike
     getChars(), which has an easy, clear and direct link between the
     outgoing chars and the underlying chars. I would suggest it might be
     better to leave it out.
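
The concern in (3) can be reproduced with the existing CharsetEncoder API. The sketch below is only an illustration; PartialEncodeDemo and encodeInto are hypothetical names, not part of the patch. It encodes a six-char string whose UTF-8 form is nine bytes into a buffer that is one byte too small, and reports how many chars were consumed versus how many bytes were written.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

public class PartialEncodeDemo {
    // Encodes s as UTF-8 into a buffer of the given capacity and returns
    // { chars consumed, bytes written, 1 if the encoder overflowed else 0 }.
    public static int[] encodeInto(String s, int capacity) {
        CharsetEncoder enc = StandardCharsets.UTF_8.newEncoder();
        CharBuffer in = CharBuffer.wrap(s);
        ByteBuffer out = ByteBuffer.allocate(capacity);
        CoderResult result = enc.encode(in, out, true);
        return new int[] { in.position(), out.position(), result.isOverflow() ? 1 : 0 };
    }

    public static void main(String[] args) {
        // 'h' (1 byte), e-acute (2), 'l', 'l', 'o' (1 each), euro sign (3): 9 bytes
        String s = "h\u00e9llo\u20ac";

        // 8-byte buffer: 5 chars consumed, 6 bytes written, overflow;
        // the last 2 bytes stay unused because the euro sign needs 3.
        int[] r = encodeInto(s, 8);
        System.out.println(r[0] + " chars, " + r[1] + " bytes, overflow=" + (r[2] == 1));
    }
}
```

A caller that only sees the byte count (6) cannot tell that 5 chars were encoded, nor that a 9-byte buffer would have been enough for the whole string.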

(4) StringCoding.decode() #239: should "remaining()" be used to return limit - position?

(5) In the "untrusted" case, it might be more straightforward to get all the
     "bytes" out of the buffer first (you are allocating a byte buffer here
     anyway; I don't see an obvious benefit to getting a direct buffer, btw)
     and then pass them to the original/existing byte[]->char[] decoding
     implementation. We will probably take a deep look at the implementation
     later, once the public API is settled.
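
What (5) suggests could look roughly like the following. This is a sketch against the public API only; the real change would live inside StringCoding, and CopyThenDecode is a made-up name. The remaining bytes are copied into a private byte[] and handed to the existing byte[] decoding path, here via new String(byte[], Charset).

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CopyThenDecode {
    // Copies the bytes between position and limit into a fresh array,
    // then decodes the array. The private copy means a charset
    // implementation never sees the caller's buffer.
    public static String decode(ByteBuffer in, Charset cs) {
        byte[] bytes = new byte[in.remaining()];
        in.get(bytes); // bulk get; advances position to limit
        return new String(bytes, cs);
    }

    public static void main(String[] args) {
        // Works the same for heap and direct buffers.
        ByteBuffer direct = ByteBuffer.allocateDirect(16);
        direct.put("abcde".getBytes(StandardCharsets.US_ASCII));
        direct.flip();
        System.out.println(decode(direct, StandardCharsets.US_ASCII));
    }
}
```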

-Sherman


Richard Warburton wrote:

Hi Alan,

Thanks for the feedback.

The direction seems reasonable but I wonder about the offset/length (and
destOffset) parameters as this isn't consistent with how ByteBuffers were
originally intended to be used. That is, when you read the bytes from the
wire into a ByteBuffer and flip it then the position and limit will delimit
the bytes to be decoded.

If the constructor is changed to String(ByteBuffer in, Charset cs) and
decodes the remaining bytes in the buffer to a String using the specified
Charset then I think it would be more consistent. Also I think this would give
you a solution to the underflow case.

I've updated the webrev to reflect this, removing the offset and length
parameters and using position() and limit() instead.
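
For anyone following along, the read-then-flip pattern described above looks roughly like this. It is only a sketch: newString is a hypothetical stand-in for the proposed String(ByteBuffer, Charset) constructor, built on the existing Charset.decode, and the channel read is simulated with put().

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class FlipDemo {
    // Hypothetical stand-in for the proposed String(ByteBuffer, Charset):
    // decodes whatever lies between position and limit.
    public static String newString(ByteBuffer in, Charset cs) {
        return cs.decode(in).toString();
    }

    // wireBytes must fit in the 64-byte buffer used here.
    public static String receive(byte[] wireBytes) {
        ByteBuffer buf = ByteBuffer.allocate(64);
        buf.put(wireBytes); // simulated read: position = wireBytes.length, limit = 64
        buf.flip();         // limit = old position, position = 0
        return newString(buf, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(receive("ping".getBytes(StandardCharsets.UTF_8)));
    }
}
```

No offset or length is passed anywhere; the flipped buffer already carries both.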

http://cr.openjdk.java.net/~rwarburton/string-patch-webrev-6/

Similarly if getBytes is replaced with a getBytes or
encode(ByteBuffer, Charset cs) then it would encode as many characters
as possible into the output buffer and I think would be more consistent and
also help with overflow case.

I've also applied this to the getBytes() method. I chose the getBytes()
method name for consistency with the existing getBytes() method that
returns a byte[]. To my mind encode() is a more natural name for the
method, which you mention in your email, do people have a preference here?

regards,

   Richard Warburton

   http://insightfullogic.com
   @RichardWarburto <http://twitter.com/richardwarburto>
