Stephen's recent emails have made me think about something that has bugged me for some time: long lines.
Specifically, email lines that are longer than 1000 characters, in violation of all of the relevant email standards. I wish they didn't exist, but the reality is that they do and we have to deal with them somehow. I know we don't really deal with them in our message parser, but I want to put that aside for now and just focus on the network portion, specifically the netsec layer.

Most of netsec doesn't care, with the exception of netsec_readline(). The reason that netsec_readline() exists is that a lot of things care about "lines", mostly because on the wire we get "lines" that end with CR LF but we write out Unix-formatted files whose lines end in just LF. So I made a convenience function to grab a "line" rather than duplicate a ton of code. I limited this to 64k, which I thought was way larger than it could ever possibly need to be. Turns out that's not large enough!

Possible solutions:

- Just suck it up and always allocate enough memory to deal with whatever line we encounter. I am sure this will fall apart when someone receives a message with a several hundred megabyte line; I am sure such an email exists somewhere today.

- If we can't grab a whole line, send the partial line upstream and let the upper layer deal with it.

I am wondering if there's a hybrid solution here. Looking at the code, there are basically two places today where netsec_readline() is used:

- In the SASL parsing code, which for most mechanisms has to pass a full buffer INTO the SASL functions. I am not aware of any SASL mechanisms that send large tokens, but those might crop up eventually.

- In the code which reads a message, line by line, and writes those lines out to local storage; this code just takes a line and writes it out with a LF at the end.

It occurs to me that the semantics of these two usages are very different!
For the first case we're going to have to pass a full "line" into the SASL function, so there's no benefit in reading it partially; if we get a 500 megabyte SASL token we're going to have to pass the whole thing into the SASL library. But in the second case, if we only get a partial line then we could simply omit the trailing LF and write out what we have.

This suggests to me that netsec_readline() could be modified in two ways:

- Add a flag that says, "Always read a full line, no matter what, and allocate whatever memory is required to do so".

- Return a flag that indicates whether or not you got a full line.

In the former case, things would proceed as usual during SASL negotiation. In the latter case, we could simply omit writing out the trailing LF if we got an incomplete line.

Thoughts? This would be the most complete solution, at the cost of modifying some of the layers of the current POP code.

--Ken
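
To make the two proposed changes a bit more concrete, here is a rough sketch in C. It is not the actual netsec code: every name in it (sketch_readline(), sketch_copy_message(), SKETCH_FULL_LINE, SKETCH_CHUNK, struct sketch_src) is made up for illustration, the "network" is just an in-memory buffer, and the limit is 16 bytes instead of 64k so the partial-line path is easy to see.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names throughout; none of this is the real netsec API. */
#define SKETCH_CHUNK     16   /* tiny stand-in for the 64k limit */
#define SKETCH_FULL_LINE 0x1  /* caller needs the whole line (the SASL case) */

struct sketch_src {           /* in-memory stand-in for the network buffer */
    const char *data;
    size_t len;
    size_t pos;
};

/*
 * Return a malloc'd, NUL-terminated piece of the next line with the line
 * terminator (LF or CR LF) stripped, or NULL at end of input or if malloc
 * fails.  *complete is set to 1 if the piece ends at a line terminator and
 * to 0 if the line continues in the next piece.
 */
static char *
sketch_readline(struct sketch_src *s, int flags, size_t *outlen, int *complete)
{
    size_t start = s->pos, i, n, newpos;
    char *out;

    assert(outlen != NULL && complete != NULL);
    if (start >= s->len)
        return NULL;

    /* Scan for LF; stop at the limit unless a full line was requested. */
    for (i = start; i < s->len && s->data[i] != '\n'; i++) {
        if (!(flags & SKETCH_FULL_LINE) && i - start >= SKETCH_CHUNK)
            break;
    }

    n = i - start;
    if (i < s->len && s->data[i] == '\n') {
        *complete = 1;
        newpos = i + 1;                     /* consume the LF */
        if (n > 0 && s->data[i - 1] == '\r')
            n--;                            /* drop the CR of CR LF */
    } else {
        *complete = 0;                      /* hit the limit or ran out of input */
        newpos = i;
    }

    if ((out = malloc(n + 1)) == NULL)
        return NULL;
    memcpy(out, s->data + start, n);
    out[n] = '\0';
    s->pos = newpos;
    *outlen = n;
    return out;
}

/*
 * Message-copy caller: write each piece as it arrives and append LF only
 * after a complete line.  Returns the number of bytes stored in dst.
 */
static size_t
sketch_copy_message(struct sketch_src *s, char *dst, size_t dstsz)
{
    size_t used = 0, n;
    int complete;
    char *piece;

    while ((piece = sketch_readline(s, 0, &n, &complete)) != NULL) {
        if (used + n + 1 >= dstsz) {        /* no room for piece + LF + NUL */
            free(piece);
            break;
        }
        memcpy(dst + used, piece, n);
        used += n;
        if (complete)
            dst[used++] = '\n';
        free(piece);
    }
    dst[used] = '\0';
    return used;
}
```

A SASL-style caller would pass SKETCH_FULL_LINE and could treat complete == 0 as a truncated token, while the message-copy caller passes no flags and never holds more than one chunk of a line in memory at a time.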
