aherbert commented on issue #46:
URL: https://github.com/apache/commons-codec/pull/46#issuecomment-617406675


   My aim was not to dishearten, it was to try and maintain a common interface 
and functionality in the different instances of BaseNCodec.
   
   The 72 character line wrap was an example. This is an option specified in 
the constructor for Base32/64. The default for URL safe encoding in Base64 is 
76 chars. The default is 0 for no line wrapping.
   
   I agree that whitespace support is not required. But it is also not 
unexpected given how Base32 and Base64 function. The use of a decode table to 
translate the bytes will have a similar runtime to your use of Hex. Each 
character must be checked to see it is valid. At the business end Hex does this 
on each character:
   
   ```java
   final int digit = Character.digit(ch, 16);
   if (digit == -1) {
       throw new DecoderException("Illegal hexadecimal character " + ch + " at 
index " + index);
   }
   return digit;
   ```
   
   I am suggesting we skip conversion of input byte[] to char[] and the use of 
Hex. Just do the decoding direct on the byte[] using a decode table. 
   
   Regarding line wrapping it is an expected feature of Base64 encoded data. As 
as you say there are standard formats used where this is done. The support has 
been carried through to Base32. I do not know if this has precedence or it was 
done because it could be. I was suggesting doing the same for Base16. If you 
encode data and write it somewhere it may be nice to line wrap this for 
convenience when viewing it. Supporting a line wrap when encoding is thus for 
human readable consumption. I find it more likely to use line wrapping on the 
encoded output than to use a character set that is anything other than UTF-8 
(or more specifically US_ASCII).
   
   But what about decoding? You have to support it if you are expecting the 
input to be line wrapped. But perhaps you are not. So what to do when the input 
is not a valid hex character during decoding is a separate issue. There is a 
case for adding support to BaseNCodec to throw an exception when the input byte 
is not a valid character in the alphabet. Currently they are ignored to allow 
whitespace padding. But this does of course allow you to put in junk and this 
is also ignored.
   
   Then we have 3 cases for a non-alphabet character during decoding:
   
   - Ignore
   - Detect if whitespace/padding and ignore; otherwise throw an exception
   - Throw an exception
   
   Currently Base16 would throw an exception. If you passed the exact same data 
to Base32/64 they would decode it as it is a valid alphabet (although they may 
throw an exception if using strict decoding depending on the trailing 
characters). This difference of behaviour is odd. The simplest option for 
Base16 is to do what Base32/64 do and ignore the invalid character.
   
   Cleaning up exception functionality should be part of a separate fix. To me 
it seems out of place to allow whitespace when decoding Base32/64 but not 
Base16. In addition you can encode to Base32/64 with whitespace but not with 
Base16.
   
   I would suggest fixing Base16 to work as per Base32/64 in this PR. Then 
fixing BaseNCodec to have an option flag to throw an exception when decoding if 
an invalid character is encountered. This functionality would be off by default 
for backward compatibility in Base32/64. The flag could be carried through to 
the BaseNCodecInput/OutputStream allowing you to optionally throw exceptions 
during streaming decoding when invalid characters are present.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to