Re: Should CURLOPT_READFUNCTION need EOF twice?

Jiří Hruška via curl-library Wed, 15 Nov 2023 19:46:40 -0800

> I don't think this behavior was done on purpose and I think it would benefit
> users if we made it not do this.I think it should be sufficient for the
> callback to signal the end of the read *once*.


Thanks for the clarification. Actually I've found even other hints that this
was the case since sending the message, for example:

    tests/libtest/lib670.c:
        fprintf(stderr, "Read callback called after EOF\n");


> Fixing the code probably requires a better separation between end-of-input
> and end-of-upload than what that function does right now.

Yeah. And it should probably happen already in the inner Curl_fillreadbuffer(),
in order to be able to give a strong guarantee to users of CURLOPT_READFUNCTION
in general.

What do you think about a new BIT(fread_func_eof) in RequestState and
something like this?

    if(!data->req.fread_func_eof) {
      Curl_set_in_callback(data, true);
      nread = data->state.fread_func(data->req.upload_fromhere, 1,
                                     buffersize, data->state.in);
      Curl_set_in_callback(data, false);

      /* make sure the callback is not called again after EOF */
      data->req.fread_func_eof = (nread == 0);
    }
    else
      nread = 0;


> Writing up a test case to reproduce this scenario might take some creativity.
> Do you have any special setup that makes this happen more likely than others?

I ran the whole test suite with the code above and got no hits of the `else`
branch (with some extra code to silently create a marker file for me to see).


Then I found that it is this part of readwrite_upload() which limits the
exposure of this bug:

827:    if(0 != k->upload_present &&
828:       k->upload_present < curl_upload_refill_watermark(data) &&
829:       !k->upload_chunky &&/*(variable sized chunked header;
append not safe)*/
830:       !k->upload_done && /*!(k->upload_done once k->upload_present sent)*/
831:       !(k->writebytecount + k->upload_present - k->pendingheader ==
832:       data->state.infilesize)) {
833:      offset = k->upload_present;
834:    }

So it will happen only if:
#1: uploading something with an unknown size (lines 832-832)
#2: using HTTP/2 (because otherwise no size implies TE: chunked, line 829)
#3: the buffer contains between 1..(¹/₃₂ · upload_buffer_size) at the end of
    the upload, i.e. 3.125% chance for an arbitrary length (line 828)

With this knowledge, I can reproduce it at will:

$ cat tests/http/gen/apache/docs/data-10k | \
    CURL_SMALLSENDS=2000 ./src/curl --upload-file - --http2 \
    --insecure --url https://127.0.0.1:45779/curltest/put
...
readwrite_upload(): k->upload_present=0, offset=0
CURLOPT_READFUNCTION called, fillcount=10240
readwrite_upload(): k->upload_present=8240, offset=0
readwrite_upload(): k->upload_present=6240, offset=0
readwrite_upload(): k->upload_present=4240, offset=0
readwrite_upload(): k->upload_present=2240, offset=0
readwrite_upload(): k->upload_present=240, offset=240
CURLOPT_READFUNCTION called, fillcount=0
readwrite_upload(): k->upload_present=0, offset=0
CURLOPT_READFUNCTION called, fillcount=0
...

It has a piped `cat` and `--upload-file -` to address #1, `--http2` to address
#2, and the library must be compiled with CURLDEBUG to be able to do
CURL_SMALLSENDS=2000 to address #3. (UPLOADBUFFER_DEFAULT is 65536, divided by
32 gives 2048, so using 2000 makes sure that the condition always applies.)


Doing the same in tests is tricky, though. I got no hits for the `else` branch
in the whole test suite with CURL_SMALLSENDS=2000 either.

What tests/http/test_07_upload.py does comes the closest, but the tests upload
bigger data from files, so they do not satisfy the unknown upload size
condition #1.

When I wrote a new test that used `-T -` and a pipe instead, the source data
sizes like 100k or 10M could not hit #3 without CURL_SMALLSENDS either. That
seems to be the hardest condition to satisfy in general, because it really
depends on how the I/O behaves. If the network can take up to 64k bytes at
a time, getting the right amount of data to remain in the buffer at EOF is
very tricky.

Do you have any more creative ideas to try?
Or would a manual test with CURL_SMALLSEND be good enough?
-- 
Unsubscribe: https://lists.haxx.se/mailman/listinfo/curl-library
Etiquette:   https://curl.se/mail/etiquette.html

Re: Should CURLOPT_READFUNCTION need EOF twice?

Reply via email to