#36520: Performance Regression in parse_header_params
---------------------------------+------------------------------------
     Reporter:  David Smith      |                    Owner:  (none)
         Type:  Bug              |                   Status:  new
    Component:  HTTP handling    |                  Version:  dev
     Severity:  Release blocker  |               Resolution:
     Keywords:                   |             Triage Stage:  Accepted
    Has patch:  0                |      Needs documentation:  0
  Needs tests:  0                |  Patch needs improvement:  0
Easy pickings:  0                |                    UI/UX:  0
---------------------------------+------------------------------------
Comment (by Natalia Bidart):

 So! I followed the suggestion from the Python Discourse post, where Barry
 Scott suggested benchmarking *bytes* header parsing instead of unicode,
 since:

 > When I worked on code to parse HTTP headers we worked in bytes to save
 the cost of decoding to unicode.

 And I was surprised to see that, for complex headers on Python 3.13, the
 slowdown actually turns into a speed gain. See these comparison tables:

 ||= Python 3.11 (real `cgi`, email.message gets `bytes`) =||= `cgi` =||= `email.message` =||= ratio =||
 || `text/plain`                                           || 0.316   || 1.618             || ~5.1x   ||
 || `text/html; charset=UTF-8; boundary=something`         || 1.362   || 4.364             || ~3.2x   ||
 || `application/x-stuff; title*=...; foo=bar; foo2*=...`  || 1.822   || 2.405             || ~1.3x   ||


 ||= Python 3.13 (`cgi` shim, email.message gets `bytes`) =||= `cgi` =||= `email.message` =||= ratio =||
 || `text/plain`                                           || 0.312   || 1.676             || ~5.4x   ||
 || `text/html; charset=UTF-8; boundary=something`         || 1.188   || 3.887             || ~3.3x   ||
 || `application/x-stuff; title*=...; foo=bar; foo2*=...`  || 4.358   || 2.175             || ~0.5x   ||

 So maybe we could indeed short-circuit the parsing when no `;` is found in
 the header line (to avoid paying the ~5x slowdown), but then it is unclear
 to me how we could get back to the header bytes when what we get in the
 WSGI `environ` is `str`.
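
 To make the idea concrete, here is a rough sketch, not a proposed patch.
 The names `parse_header_parameters_sketch` and `header_bytes_from_environ`
 are hypothetical and are not Django's implementation. The fast path
 assumes that a header line without `;` carries no parameters. The bytes
 recovery assumes the server follows PEP 3333, under which header values in
 `environ` are `str` decoded as ISO-8859-1, so re-encoding with that codec
 should give back the original bytes (at the cost of an extra encode).

```python
from email.message import Message
from email.utils import collapse_rfc2231_value


def header_bytes_from_environ(value: str) -> bytes:
    # Assumption: a PEP 3333 server decoded the raw header as ISO-8859-1,
    # so encoding with the same codec round-trips to the original bytes.
    return value.encode("iso-8859-1")


def parse_header_parameters_sketch(line: str) -> tuple[str, dict[str, str]]:
    # Hypothetical helper for illustration only.
    # Fast path: no ";" means there are no parameters to parse.
    if ";" not in line:
        return line.strip().lower(), {}

    # Slow path: let email.message handle quoting and RFC 2231 values.
    msg = Message()
    msg["content-type"] = line
    params = msg.get_params()
    key = params.pop(0)[0].lower()
    parsed = {}
    for name, value in params:
        if not name:
            continue
        if isinstance(value, tuple):
            # RFC 2231 values arrive as (charset, language, value).
            value = collapse_rfc2231_value(value)
        parsed[name] = value
    return key, parsed
```

 Whether the fast path is worth it would still need measuring, since the
 `;` check and the extra encode are not free either.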

 Carlton, Jake, David, any ideas?
-- 
Ticket URL: <https://code.djangoproject.com/ticket/36520#comment:9>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
