On Tue, Feb 13, 2018 at 2:41 AM, Alan Bateman <alan.bate...@oracle.com> wrote:
> On 13/02/2018 06:24, Xueming Shen wrote:
>>
>> Hi,
>>
>> Please help review the proposal to add following constructors and methods
>> in String
>> class to take ByteBuffer as the input and output data buffer.
>>
>>     public String(ByteBuffer bytes, Charset cs);
>>     public String(ByteBuffer bytes, String csname);
>
> These constructors looks good (for the parameter names then I assume you
> meant "src" rather than "bytes" here).
>
>>     public int getBytes(byte[] dst, int offset, Charset cs);
>>     public int getBytes(byte[] dst, int offset, String csname);
>>     public int getBytes(ByteBuffer bytes, Charset cs);
>>     public int getBytes(ByteBuffer bytes, Charset csn);
>
> These four methods encode as many characters as possible into the
> destination byte[] or buffer but don't give any indication that the
> destination didn't have enough space to encode the entire string. I thus
> worry they could be a hazard and result in buggy code. If there is
> insufficient space then the user of the API doesn't know how many characters
> were encoded so it's not easy to substring and call getBytes again to encode
> the remaining characters. There is also the issue of how to size the
> destination. What would you think about having them fail when there is
> insufficient space? If they do fail then there is a side effect that they
> will have written to the destination so that would need to be documented
> too.

The ones that output to a ByteBuffer have more flexibility, in that the
buffer position can be moved according to the number of bytes written, so
the method _could_ return the number of _chars_ actually encoded. But this
is not particularly useful without variants which accept an offset into the
string, unless it can be shown that s.substring(coffs).getBytes(xxx) is
reasonably efficient.

It might be better to shuffle this around a little and instead have a
Charset[Encoder].getBytes(int codePoint, byte[] b, int offs, int len) /
.getBytes(int codePoint, ByteBuffer buf) kind of thing which returns the
number of bytes written, or e.g. -1 or -count if there isn't enough space
in the target. Then it would be less onerous for users to write simple
for-each-codepoint loops which encode as far as is reasonable but no
farther, without too many error-handling gymnastics.

--
- DML
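
To make the for-each-codepoint idea concrete, here is a rough sketch of what
such a loop might look like. It does not use any proposed API: putCodePoint
and encodePrefix are hypothetical helper names invented for illustration,
layered on the existing CharsetEncoder.encode(CharBuffer, ByteBuffer, boolean).
It assumes a stateless charset such as UTF-8, and it resets the encoder for
every code point, so it says nothing about how efficient a real
implementation could be.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

class CodePointEncodeSketch {

    // Hypothetical helper (not a JDK method): encodes one code point into dst.
    // Returns the number of bytes written, or -1 if dst has too little room,
    // in which case dst's position is left where it was.
    static int putCodePoint(CharsetEncoder enc, int codePoint, ByteBuffer dst) {
        int start = dst.position();
        CharBuffer in = CharBuffer.wrap(Character.toChars(codePoint));
        enc.reset();
        CoderResult r = enc.encode(in, dst, true);
        if (r.isUnderflow()) {
            r = enc.flush(dst); // no-op for stateless encoders such as UTF-8
        }
        if (r.isOverflow()) {
            dst.position(start); // undo any partial write
            return -1;
        }
        if (r.isError()) {
            throw new IllegalArgumentException("cannot encode U+"
                    + Integer.toHexString(codePoint).toUpperCase());
        }
        return dst.position() - start;
    }

    // Hypothetical loop: encodes as many whole code points of s as fit in dst
    // and returns the number of chars consumed, so a caller can resume at
    // that offset with a fresh buffer.
    static int encodePrefix(String s, CharsetEncoder enc, ByteBuffer dst) {
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            if (putCodePoint(enc, cp, dst) < 0) {
                break; // not enough space for this code point
            }
            i += Character.charCount(cp);
        }
        return i;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(4);
        CharsetEncoder enc = StandardCharsets.UTF_8.newEncoder();
        int consumed = encodePrefix("h\u00e9llo", enc, buf);
        System.out.println("chars consumed: " + consumed
                + ", bytes written: " + buf.position());
    }
}
```

The caller gets back a char offset rather than a byte count, since the
buffer position already tells it how many bytes were written. Whether a
per-code-point entry point like this can be made fast enough is exactly the
open question.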