    public String(ByteBuffer bytes, Charset cs);
    public String(ByteBuffer bytes, String csname);

I think these constructors make good sense. They avoid an extra copy to an intermediate byte[].
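
For illustration, here's a sketch of the status quo versus the proposal (the buffer name buf is mine, and the ByteBuffer constructor is of course the proposed API, not an existing one):

    // Today: decoding a ByteBuffer's contents means copying them
    // into an intermediate byte[] first.
    byte[] tmp = new byte[buf.remaining()];
    buf.get(tmp);
    String s1 = new String(tmp, StandardCharsets.UTF_8);

    // Proposed: decode the buffer's remaining bytes directly, no copy.
    String s2 = new String(buf, StandardCharsets.UTF_8);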

One issue (also mentioned by Stephen Colebourne) is whether we need the csname overload at all. Arguably it's redundant given the Charset overload, and it throws UnsupportedEncodingException, which is checked. On the other hand, the csname overload is apparently faster, since the decoder can be cached, and it's unclear when this can be remedied for the Charset case.

I could go either way on this one.

**

I'd also suggest adding a CharBuffer constructor:

    public String(CharBuffer cbuf)

This would be semantically equivalent to

    public String(char[] value, int offset, int count)

except that it would use the chars from the CharBuffer between the buffer's position and its limit.
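
To make the semantics concrete, here's a rough equivalence (a hypothetical sketch; whether the constructor would actually consume the buffer's position is a detail the spec would need to pin down):

    // new String(cbuf) would behave roughly like:
    char[] a = new char[cbuf.remaining()];
    cbuf.duplicate().get(a);   // chars from position to limit, leaving
                               // cbuf's own position untouched
    String s = new String(a, 0, a.length);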

**

Regarding the getBytes() overloads:

    public int getBytes(byte[] dst, int offset, Charset cs);
    public int getBytes(byte[] dst, int offset, String csname);
    public int getBytes(ByteBuffer bytes, Charset cs);
    public int getBytes(ByteBuffer bytes, String csname);

On 2/13/18, 12:41 AM, Alan Bateman wrote:
> These four methods encode as many characters as possible into the destination byte[] or buffer but don't give any indication that the destination didn't have enough space to encode the entire string. I thus worry they could be a hazard and result in buggy code. If there is insufficient space then the user of the API doesn't know how many characters were encoded so it's not easy to substring and call getBytes again to encode the remaining characters. There is also the issue of how to size the destination. What would you think about having them fail when there is insufficient space? If they do fail then there is a side effect that they will have written to the destination so that would need to be documented too.

I share Alan's concern here.

If the intent is to reuse a byte[] or a ByteBuffer, then there needs to be an effective way to handle the case where the provided array/buffer doesn't have enough room to receive the encoded string. A variety of ways of dealing with this have been mentioned: throwing an exception; returning a negative value to indicate failure, possibly also encoding the number of bytes written; or even allocating a fresh array or buffer of the proper size and returning that instead. In each case the caller would have to check the return value and take care to handle all the cases properly. This is likely to be fairly error-prone.
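
To illustrate the hazard, here's what caller code might look like under one of the suggested conventions, say "negative return value means insufficient space" (entirely hypothetical; none of this is settled API):

    int n = s.getBytes(dst, 0, StandardCharsets.UTF_8);
    if (n < 0) {
        // Destination too small -- but how many bytes were written? How
        // much input was consumed? How big should the retry buffer be?
        // Every caller has to repeat this guesswork.
        dst = new byte[dst.length * 2];   // a guess
        n = s.getBytes(dst, 0, StandardCharsets.UTF_8);
    }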

This also raises the question in my mind of what these getBytes() methods are intended for.

On the one hand, they might be useful for a caller that manages its own memory allocation and reuses arrays/buffers. If so, then intermediate results from partial processing need to be handled properly. If the destination fills up, there needs to be a way to report how much of the input was consumed, so that a subsequent operation can pick up where the previous one left off. (This was one of David Lloyd's points.) If there is sufficient room in the destination, there needs to be a way to report that, along with how much space remains in the destination. One could contemplate adding all this information to the API. This eventually leads to

    CharsetEncoder.encode(CharBuffer in, ByteBuffer out, boolean endOfInput)

which has all the necessary partial progress state in the buffers.
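
For comparison, here's a minimal encode loop over the existing CharsetEncoder API; the buffers' positions carry all of the partial-progress state, and consume() is a hypothetical sink for each filled buffer:

    // Uses java.nio.{ByteBuffer,CharBuffer} and java.nio.charset.*.
    static void encodeAll(String s, ByteBuffer out)
            throws CharacterCodingException {
        CharsetEncoder enc = StandardCharsets.UTF_8.newEncoder();
        CharBuffer in = CharBuffer.wrap(s);
        CoderResult cr;
        do {                          // encode, draining 'out' as it fills
            cr = enc.encode(in, out, true);
            if (cr.isOverflow()) { out.flip(); consume(out); out.clear(); }
            else if (cr.isError()) cr.throwException();
        } while (!cr.isUnderflow());
        do {                          // flush any internal encoder state
            cr = enc.flush(out);
            if (cr.isOverflow()) { out.flip(); consume(out); out.clear(); }
        } while (!cr.isUnderflow());
        out.flip();
        consume(out);                 // hand off the final batch of bytes
    }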

On the other hand, maybe these APIs are intended as conveniences. I'd observe that String already has this method:

    public byte[] getBytes(Charset charset)

which returns the encoded bytes in a newly allocated array of exactly the right size. This is pretty convenient. It doesn't let the caller reuse a destination array or buffer, but that's precisely what brings in all the partial-result edge cases.
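
For the pure convenience case, that's hard to beat:

    byte[] b = s.getBytes(StandardCharsets.UTF_8);   // exact size, no edge cases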

The bottom line is that I'm not entirely sure what these new getBytes() overloads are for. Maybe I've missed a use case where they work well; if so, perhaps somebody can describe it.

s'marks

