Alexis Marrero wrote:
The next test that I will run this against will be with an obscene
amount of data for which this improvement helps a lot!
The wasteful part is the checking for boundaries.
I'm using HTTP "chunked" encoding to access a raw tape device through
HTTP with Python (it GETs or POSTs the raw data as the body; each chunk
corresponds to a tape block). It blazes the data through at full
network speed with hardly any CPU usage, whereas this HTTP upload code
uses 100% CPU while running on my 3 GHz box.
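The reason chunked transfer is so cheap is that every chunk carries its own length up front, so the receiver never has to scan the payload. A minimal sketch of the framing (the function names are my own illustration, not the poster's code):

```python
def encode_chunk(block: bytes) -> bytes:
    """Frame one tape block as a single HTTP/1.1 chunk:
    hex length, CRLF, payload, CRLF."""
    return b"%x\r\n%s\r\n" % (len(block), block)

def encode_stream(blocks):
    """Yield the chunked body for an iterable of blocks,
    terminated by the mandatory zero-length chunk."""
    for block in blocks:
        if block:  # zero-length chunks would end the stream early
            yield encode_chunk(block)
    yield b"0\r\n\r\n"
```

Because the receiver reads the hex length first, it can slurp exactly that many payload bytes in one go, with no per-byte or per-line scanning.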
Scanning for line ends and MIME boundaries is very inefficient
compared to that. They ought to have put a Content-Length into every
MIME part header, and we wouldn't have had this problem in the first place.
I think the only realistic way to improve performance is to read the
client input in binary chunks and then look for '\r\n--boundary'
strings in each chunk using standard string functions. Most of the CPU
time is currently spent in the readline() call.
This also means revising all the MIME body parsing to cope with that,
and I doubt that will be worth the effort for anyone.