jcoglan opened a new issue, #4870:
URL: https://github.com/apache/couchdb/issues/4870

   We (@neighboorhoodie, @janl, myself) have observed some incorrect state 
conditions in `ibrowse` that lead to
   incorrect parsing of some HTTP responses. This is most likely to occur when a
   connection has been interrupted or a response not fully consumed, and it can
   lead to an `ibrowse` process retaining headers from previous responses and
   therefore incorrectly parsing later ones. In particular it can lead to
   non-chunked responses being parsed as chunk ones, which leads to an error 
where
   
[`hexlist_to_integer`](https://github.com/cmullaparthi/ibrowse/blob/v4.4.2/src/ibrowse_http_client.erl#L2008-L2014)
   and
   
[`to_ascii`](https://github.com/cmullaparthi/ibrowse/blob/v4.4.2/src/ibrowse_http_client.erl#L2016-L2037)
   are called with bad input.
   
   ## Description
   
   We observed this when the following error showed up in customer logs:
   
       [error] 2023-11-20T00:14:06.509562Z nonode@nohost <0.3929.5131> --------
       gen_server <0.3929.5131> terminated with reason: no function clause 
matching ibrowse_http_client:to_ascii(114)(line:1794)
       at
            ibrowse_http_client:hexlist_to_integer/3(line:1790)
         <= ibrowse_http_client:parse_11_response/2(line:1218)
         <= ibrowse_http_client:handle_sock_data/2(line:320)
         <= gen_server:try_dispatch/4(line:616)
         <= gen_server:handle_msg/6(line:686)
         <= proc_lib:init_p_do_apply/3(line:247)
   
   `ibrowse_http_client:to_ascii` is being called with the integer `114`, which
   denotes the letter `r` in ASCII, which is not a valid hex digit, so the call
   fails.
   
   The `gen_server` error includes the last message received by the process:
   
       {tcp,#Port<0.17503260>,
           <<"HTTP/1.1 500 Internal Server Error\r\n
              Cache-Control: must-revalidate\r\n
              Content-Length: 65\r\n
              Content-Type: application/json\r\n
              Date: Mon, 20 Nov 2023 00:14:06 GMT\r\n
              Server: CouchDB/2.3.1 (Erlang OTP/20)\r\n
              X-Couch-Request-ID: 5cc908776b\r\n
              X-Couch-Stack-Hash: 3377714225\r\n
              X-CouchDB-Body-Time: 0\r\n
              \r\n
              
{\"error\":\"badmatch\",\"reason\":\"{'EXIT',noproc}\",\"ref\":3377714225}\n">>}
   
   The error stacktrace passes through a [clause of
   
`parse_11_response`](https://github.com/cmullaparthi/ibrowse/blob/v4.4.2/src/ibrowse_http_client.erl#L1392-L1417)
   that handles chunked encoding, but the above response message does not have a
   `Transfer-Encoding` header. This is being invoked because it's reacting to 
the
   `gen_server` _state_, which in this case included the following header list:
   
       [
         {"Content-Type","multipart/mixed; 
boundary=\"7f6cd44f44a5d757879834cf6e8530f7\""},
         {"Date","Mon, 20 Nov 2023 00:13:34 GMT"},
         {"Server","CouchDB/2.3.1 (Erlang OTP/20)"},
         {"Transfer-Encoding","chunked"}
       ]
   
   (The full state object is quite large, but includes some other values 
indicating
   chunked encoding. We can provide the full state with client info redacted if
   needed.)
   
   These headers do not match those of the incoming message. An HTTP client is
   expected to handle streamed data, and so needs to keep some state to record
   headers it's seen so far. However the incoming message is a complete 
response,
   not a chunk of a stream, and so it should be parsed from a clean starting 
state.
   
   Our client has reported that they're seeing a lot of partial/interrupted
   responses, and our best theory for why this is happening is that an
   `ibrowse_http_client` instance is partially processing a response, which is
   interrupted, and the process is then reused for another request without being
   cleaned up first. Two main scenarios seem likely causes for this:
   
   - A code path that retries requests: a client makes one request, whose 
response
     is partially received, and the client then retries the request using the 
same
     `ibrowse_http_client` process, leading to confused state. For example,
     
[`couch_replicator_api_wrap:open_doc_revs`](https://github.com/apache/couchdb/blob/3.3.2/src/couch_replicator/src/couch_replicator_api_wrap.erl#L279-L381)
     performs retries, and the above headers indicating a chunked multipart
     response are consistent with fetching a doc with `open_revs=all`. The
     `gen_server` state also indicates this is the kind of request being made.
   
   - Reuse of a client from the pool without resetting its state. The
     
[`release`](https://github.com/apache/couchdb/blob/3.3.2/src/couch_replicator/src/couch_replicator_connection.erl#L172-L184)
     handler in `couch_replicator_connection` demonitors the worker and sets its
     `mref` to `undefined`, but otherwise sends no signal to the worker to reset
     its state. This could lead to a client being reused when it still contains
     headers from a previous response.
   
   ## Steps to Reproduce
   
   This is tricky to reproduce mechanically because it requires getting 
responses interrupted in order to trigger the bad state. We have only observed 
its effects in customer logs and then diagnosed possible causes by code review.
   
   ## Expected Behaviour
   
   When reusing an `ibrowse` process either immediately or by releasing it back 
to the pool, its state should be reset so that it does not conflate the state 
from multiple responses. If this cannot be done then the connection should not 
be reused.
   
   ## Your Environment
   
   * CouchDB version used: observed in production on 2.3.1, but code review 
makes us believe this is also possible in 3.3.2
   * Browser name and version: N/A
   * Operating system and version: Ubuntu 20.04


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to