Re: [fpc-devel] Base64 decoding stream in the FCL

Bram Kuijvenhoven Mon, 12 Feb 2007 03:08:29 -0800

Hi Michael,

Thanks for your reply!

I have done a little more research; details are below.


Michael Van Canneyt wrote:

On Mon, 12 Feb 2007, Bram Kuijvenhoven wrote:

The Base64 decoding stream in the Base64 unit in the FCL appears to be broken.
In particular
- it does not handle whitespace (or other characters not from the bas64
alphabet), and


It was not designed to do this.


There is code trying to deal with bytes decoded as 99 (a sentinel value), but 
it is not really functional.

Looking more detailed at section 2.3 of RFC3548 
(http://www.ietf.org/rfc/rfc3548.txt), it is indeed stated that

  'Implementations MUST reject the encoding if it contains characters
  outside the base alphabet when interpreting base encoded data, unless
  the specification referring to this document explicitly states
  otherwise.  Such specifications may, as MIME does, instead state that
  characters outside the base encoding alphabet should simply be
  ignored when interpreting data ("be liberal in what you accept").
  Note that this means that any CRLF constitute "non alphabet
  characters" and are ignored.'

So the current design is not wrong (from the perspective of RFC3548 alone); 
only in the current implementation it not actually rejects these chars (e.g. it 
does not raise an exception).

RFC2045, about MIME, mentioned at a comment at the top in base64.pp, states 
however that these chars must be ignored instead of rejected. I recommend to 
support this 'mode' as well, as it allows line breaks, which is quite useful in 
textual environments.

- Read does not return the correct number of bytes read when at the end of the
stream, and
- the meaning of the EOF is a little bit unclear

These bugs can be circumvented by filtering out whitespace first and calling
Size [which works!] to determine the actual stream size, but this is of course
ugly, slower and not working on non-seekable streams.


The non-seekable streams should remain supported.


The point is that it currently only supports seekable streams (when calling 
GetSize). For Read, it doesn't need seeking of course, and this will remain so.

There is not only choice in ignoring/rejecting non-base alphabet characters, 
but also in dealing with '=' pad characters that are not at the end of the 
string. (Up to two of such '=' pad characters have to be used at the end to 
complete the last 4-byte sequence; this last sequence will encode  3 - nr. of 
'='s at the end  bytes instead of  3. (Normally, in Base64 encoding, sequences 
of 4 bytes are used to encode 3 bytes from the input.))

In particular, the current 'getsize' calculation of TBase64DecodeStream (NB 
this is done in the Seek method) seeks to the end of the input stream, reads 
the last two characters, and determines from whether these are '=' or '==' by 
how much the 'div 4 * 3' calculated stream size should be decreased. If we 
signal an EOF at the first occurence of a '=' (besides the end of the input 
stream), this calculation can go very wrong. The best behavior would be to 
raise an exception here I think (when following RFC3548.

If I understand it correctly, input strings should also always have a byte 
length that is a multiple of 4.

I propose to let TBase64DecodeStream have two 'modes':
- 'strict mode': follows RFC3548, rejects any characters outside of base64 
alphabet, only accepts up to two '=' characters at the end, and requires the 
input to have a Size being a multiple of 4; otherwise raises an 
EBase64DecodeException

- 'MIME mode': follows RFC2045, ignores any characters outside of base64 alphabet, takes any '=' as end of string, handles apparently truncated input streams gracefully

Also, I'd tend to make MIME mode the default, but I leave that up to you (core 
devs) to decide.

In code:

 TBase64DecodeMode = (bdmStrict, bdmMIME);
 ...
 TBase64DecodeStream = class(TSteam)
   ...
   property Mode:TBase64DecodeMode ...

Note that in MIME mode, GetSize will Read the entire stream, whereas Strict 
mode allows the calculation currently done in Seek.

If you test it, please try adding tests in fpcunit format.


What would be a good starting point to learn about how to use fpcunit? (I 
haven't used it before)

Regards,

Bram
_______________________________________________
fpc-devel maillist  -  [email protected]
http://lists.freepascal.org/mailman/listinfo/fpc-devel

Re: [fpc-devel] Base64 decoding stream in the FCL

Reply via email to