Various Codec methods need to encode and decode bytes/Strings. Not all byte sequences can be decoded into Strings, and not all Strings can be encoded into bytes.
So a decision has to be made as to what to do when an invalid sequence is detected.

At present the encoding/decoding is done by the String class.

The Javadoc for the methods that take a Charset says:

"This method always replaces malformed-input and unmappable-character sequences with this charset's default replacement" (byte array or String, depending on direction)

However, the Javadoc for the methods that take the charset name as a String says:

"The behavior of this method when this string cannot be encoded in the given charset is unspecified"

It looks as though the "unspecified" behaviour is in practice to replace invalid sequences, but this cannot be guaranteed across all JVMs. That can easily be fixed by ensuring that the code only ever uses the methods that take a Charset.

However, it's not obvious that replacement is the correct policy. See for example:

CODEC-228 URLCodec.decode does not throw DecoderException with invalid UTF-8

It seems to me it would be better to report errors. At present, a round-trip encode/decode sequence may not reproduce the original input, because conversions may be silently 'adjusted'. That seems wrong for Codec, which IMO should be able to accurately encode and decode its input.
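
As a rough sketch of the difference between replacing and reporting, assuming UTF-8: the class name StrictDecodeSketch and the helper decodeStrict below are hypothetical and are not part of Commons Codec. The helper uses the standard java.nio.charset.CharsetDecoder with CodingErrorAction.REPORT, which is one possible way to get an exception instead of a silent replacement.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class StrictDecodeSketch {

    // Hypothetical helper (not a Commons Codec API): decodes UTF-8 and
    // throws instead of substituting the charset's replacement value.
    static String decodeStrict(byte[] bytes) throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    public static void main(String[] args) {
        // 0xFF can never appear in well-formed UTF-8.
        byte[] invalid = { 'a', (byte) 0xFF, 'b' };

        // The String constructor that takes a Charset silently replaces
        // the bad byte, so the round trip does not reproduce the input.
        String lenient = new String(invalid, StandardCharsets.UTF_8);
        byte[] roundTrip = lenient.getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(invalid, roundTrip)); // prints false

        // The strict decoder reports the problem instead.
        try {
            decodeStrict(invalid);
            System.out.println("decoded without error");
        } catch (CharacterCodingException e) {
            System.out.println("rejected: " + e);
        }
    }
}
```

Whether Codec should wrap such a failure in a DecoderException or EncoderException, or expose the policy as an option, is a separate design question that this sketch does not try to answer.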