Stephen's recent emails have made me think about something that has bugged
me for some time: long lines.

Specifically, email lines that are longer than 1000 characters in violation
of all of the relevant email standards.  I wish they didn't exist, but the
reality is that they do and we have to somehow deal with them.

I know we don't really deal with them in our message parser, but I want
to put that one aside for now and just focus on the network portion,
specifically in the netsec layer.  Most of netsec doesn't care, with
the exception of netsec_readline().

The reason that netsec_readline() exists is that a lot of things care about
"lines", mostly because on the wire we get "lines" that end with CR LF but
we write out Unix-formatted files which just end in LF.  So I made a
convenience function to just grab a "line" rather than duplicate a ton
of code.  I limited this to 64k, which I thought was way larger than
it could ever possibly need to be.  Turns out that's not large enough!

Possible solutions:

- Just suck it up and always allocate enough memory to deal with whatever
  line we encounter.  I am sure this will fall apart when someone receives
  a message with a several hundred megabyte line; I am sure such an email
  exists somewhere today.

- If we can't grab a whole line, send the too-short line upstream and let
  the upper layer deal.

I am wondering if there's a hybrid solution here.

Looking at the code, there are basically two places today where the
function netsec_readline() is used:

- In the SASL parsing code, which for most mechanisms has to pass a full
  buffer INTO the SASL functions.  I am not aware of any SASL mechanisms
  that send large tokens, but those might crop up eventually.

- The code which reads a message, line by line, and then writes out those
  lines to the local storage; this code just takes a line and then writes
  it out with a LF at the end.

It occurs to me that the semantics here in the above usages are very
different!  For the first case we're going to have to pass in a full
"line" into the SASL function, so there's no benefit in reading this
partially; if we get a 500 megabyte SASL token we're going to have to
pass the whole thing into the SASL library.  But in the second case,
if we only get a partial line then we could simply omit the trailing
LF and write out what we have.

This suggests to me that netsec_readline() could be modified in two
ways:

- Add a flag that says, "Always read a full line, no matter what, and
  allocate whatever memory is required to do so".

- Return a flag that indicates whether or not you got a full line.

In the former case, things would proceed as usual during SASL negotiation.
In the latter case, we could simply omit writing out the trailing LF
if we got an incomplete line.
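
To make the two changes concrete, here is a rough sketch in C.  It is not
the real netsec code: struct stream is an in-memory stand-in for the network
connection, and sketch_readline(), copy_lines() and the RL_FULL flag are
names I made up for illustration.  It only shows the shape of the idea, an
input flag that forces a full line and an output flag that reports whether
the chunk ended a line.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the network connection: an in-memory buffer. */
struct stream {
    const char *data;
    size_t len;
    size_t pos;
};

#define RL_FULL 0x1     /* hypothetical flag: always return a whole line */

/*
 * Return the next chunk as a malloc'd, NUL-terminated buffer (caller frees),
 * or NULL at end of input or if malloc fails.  *len is the chunk length.
 * *complete is 1 if the chunk ends a line (the CR LF or LF is consumed and
 * stripped) and 0 if the line continues in the next chunk.  Without RL_FULL,
 * a chunk never holds more than maxchunk bytes.
 */
char *
sketch_readline(struct stream *s, size_t maxchunk, int flags,
                size_t *len, int *complete)
{
    size_t start = s->pos, end = s->pos, n;
    char *out;

    assert(maxchunk > 0);
    *len = 0;
    *complete = 0;
    if (start >= s->len)
        return NULL;

    /* Scan for LF; stop early at maxchunk bytes unless RL_FULL is set. */
    while (end < s->len && s->data[end] != '\n') {
        if (!(flags & RL_FULL) && end - start == maxchunk)
            break;
        end++;
    }

    n = end - start;
    if (end < s->len && s->data[end] == '\n') {
        *complete = 1;
        s->pos = end + 1;               /* consume the LF */
        if (n > 0 && s->data[end - 1] == '\r')
            n--;                        /* strip the CR */
    } else {
        s->pos = end;                   /* partial line, or EOF without LF */
    }

    if ((out = malloc(n + 1)) == NULL)
        return NULL;
    memcpy(out, s->data + start, n);
    out[n] = '\0';
    *len = n;
    return out;
}

/* Message-copy case: write each chunk, adding LF only after complete lines. */
int
copy_lines(struct stream *s, size_t maxchunk, FILE *fp)
{
    char *chunk;
    size_t len;
    int complete;

    while ((chunk = sketch_readline(s, maxchunk, 0, &len, &complete)) != NULL) {
        fwrite(chunk, 1, len, fp);
        if (complete)
            putc('\n', fp);
        free(chunk);
    }
    return ferror(fp) ? -1 : 0;
}
```

The SASL code would pass RL_FULL and accept whatever allocation that takes,
while the message-copy loop never holds more than maxchunk bytes of a line
at once.  A real version would also have to cope with a CR LF pair split
across two network reads, which this in-memory sketch sidesteps.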

Thoughts?  This would be the most complete solution, at the cost of
modifying some of the layers in the current POP code.

--Ken
