On Jan 2, 2006, at 3:41 PM, Justin Erenkrantz wrote:

+static apr_status_t process_request_line(request_rec *r, char *line,
+                                         int skip_first)
+{
+    if (!skip_first && (r->the_request == NULL)) {
+        /* This is the first line of the request */
+        if ((line == NULL) || (*line == '\0')) {
+ /* We skip empty lines because browsers have to tack a CRLF on to the end + * of POSTs to support old CERN webservers.
+             */
+            int max_blank_lines = r->server->limit_req_fields;
+            if (max_blank_lines <= 0) {
+                max_blank_lines = DEFAULT_LIMIT_REQUEST_FIELDS;
+            }
+            r->num_blank_lines++;
+            if (r->num_blank_lines < max_blank_lines) {
+                return APR_SUCCESS;
+            }
+        }
+        set_the_request(r, line);
+    }

This will cause a segfault if line is null and we are at or above max_blank_lines. Perhaps you meant for an else clause here?

Yes, thanks for catching that.


+    else {
+ /* We've already read the first line of the request. This is either + * a header field or the blank line terminating the header
+         */
+        if ((line == NULL) || (*line == '\0')) {
+            if (r->pending_header_line != NULL) {
+                apr_status_t rv = set_mime_header(r,
r->pending_header_line); +                if (rv != APR_SUCCESS) {
+                    return rv;
+                }
+                r->pending_header_line = NULL;
+            }
+            if (r->status == HTTP_REQUEST_TIME_OUT) {
+                apr_table_compress(r->headers_in,
APR_OVERLAP_TABLES_MERGE);
+                r->status = HTTP_OK;

Say what?  ...looks at rest of file...

Is this because r->status is unset and we're saying that's it's 'okay' to proceed with the request. If so, this *really* needs a comment to that effect. It makes no sense whatsoever otherwise. (We should probably remove the hack to set it to HTTP_REQUEST_TIME_OUT initially as part of these changes.)

Yeah, setting r->status to HTTP_OK is done here solely to make it work
with the existing logic about HTTP_REQUEST_TIME_OUT meaning "still
reading the request header."

+1 for of removing the HTTP_REQUEST_TIME_OUT hack.  I was trying
to be conservative by preserving that part of the original logic, but now that you mention it, we might as well replace it with something less subtle
in 2.3.  This will break any 3rd party modules that depend upon the
HTTP_REQUEST_TIME_OUT convention, but for a major release
like 2.4 or 3.0 that's a defensible choice.

+            }
+        }
+        else {
+            if ((*line == ' ') || (*line == '\t')) {
+ /* This line is a continuation of the previous one */
+                if (r->pending_header_line == NULL) {
+                    r->pending_header_line = line;
+                    r->pending_header_size = 0;
+                }
+                else {
+                    apr_size_t pending_len =
strlen(r->pending_header_line);
+                    apr_size_t fold_len = strlen(line);

This seems really expensive. You shouldn't need to recount pending_len each time.

Good point; I'll add something to keep track of the length.

+            }
+            break;

If I understand your direction, it'd bail out here if it ever got EAGAIN?

Yes indeed.  That's what the version on the async-read-dev branch does.

+request_rec *ap_read_request(conn_rec *conn)
+{
+    request_rec *r;
+    apr_status_t rv;
+
+    r = init_request(conn);
+
+    rv = read_partial_request(r);
+    /* TODO poll if EAGAIN */
+    if (r->status != HTTP_OK) {
+        ap_send_error_response(r, 0);
+        ap_update_child_status(conn->sbh, SERVER_BUSY_LOG, r);
+        ap_run_log_transaction(r);
+        return NULL;
+    }

Obviously, this can't be the case if you want to do real polling. This would be the wrong place to poll. You have to exit out of ap_read_request and go back up to an 'event thread' that waits until there's data to read on any incoming sockets. Then, you'd have to call ap_read_request again to 'restart' the parser whenever there is more data to read.

In my opinion, this code isn't even close to being async.

I'm relieved to hear that it's not async, since you're looking at the blocking version. :-) See ap_read_async_request() (still on the async-read- dev branch).

So, I wonder why it was even merged to trunk right now. You'd have to deal with partial lines and the state the header parser is in when it gets the next chunk of data - which this code doesn't even try to do. The current code is just going to bail when it doesn't have a 'complete' line.

My hunch is that you plan to build up pending_header_line and delay parsing until you have the line terminators; but that's not going to work really well with non-blocking sockets as you have to store the data you just read non-blocking'ly if you actually read multiple MIME header lines or, icky, even part of the request body or the next request. This just lends more credence to the argument that request_rec is the wrong place for the buffer. It's clearly connection or connection-filter oriented...

The async impl addesses this problem by only grabbing a line at a time from the input filter chain. The filters are responsible for buffering anything that's been read from the socket but not consumed by the current request. If you have a chance to review the async code (the version on the branch), I'm eager to get
some feedback on that part of the design.

(IIRC, another point that'll kill httpd is that httpd mis-uses EAGAIN; serf has a definition that EAGAIN means some data is read *and* you need to ask for more. httpd's filters don't have that. One more issue that httpd got wrong...)

To me, it sounds like we need a way for the MPM (or whatever) to redefine the accept logic instead of relying upon the core. Oh, yah, we do that already (heh) by allowing them to call something other than ap_process_connection: such as with ap_process_http_async_connection(). So, why are we even touching ap_read_request() at all? This is going to necessitate an ap_async_read_request() anyway...

Agreed on the need for async_read_request(); I purposely created that as a separate function, following the pattern set by ap_process_http_async_connection(), so that things depending on the semantics of the original ap_read_request() wouldn't break. The bulk of the refactoring that I've recently checked into the trunk is designed to allow the sync and async versions of ap_read_request() to share code. Given that the mime-folding code has been a source of security vulnerabilities in the past, I'm trying to stick with a single impl of the header parsing
code that the sync and async code can both use.

Brian

Reply via email to