William S Fulton created XERCESC-2250:
-----------------------------------------

             Summary: Curl NetAccessor buffer overflow resulting in NetAcc_InternalError
                 Key: XERCESC-2250
                 URL: https://issues.apache.org/jira/browse/XERCESC-2250
             Project: Xerces-C++
          Issue Type: Bug
          Components: Utilities
    Affects Versions: 3.2.5
            Reporter: William S Fulton


The Curl NetAccessor has a buffer overflow bug.

It can easily be replicated when the curl NetAccessor is enabled at configure time:

{{./configure --with-icu --with-curl}}

and then invoking the NetAccessorTest executable on one of the larger files in the source tree; I get:

{{~/xerces-c $ ./tests/NetAccessorTest file://$(pwd)/doc/program-dom.xml}}
{{Exception during test:}}
{{    internal error in NetAccessor}}

The problem is in CurlURLInputStream::writeCallback, which returns a value smaller than the number of bytes the function is expected to consume (cnt != totalConsumed). According to [https://curl.se/libcurl/c/CURLOPT_WRITEFUNCTION.html] the callback then effectively signals CURL_WRITEFUNC_ERROR. CURL_WRITEFUNC_ERROR is not handled in the switch statement in CurlURLInputStream::readMore, hence XMLExcepts::NetAcc_InternalError is thrown.
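
To demonstrate the callback contract outside of Xerces, here is a small standalone program (my own demo, not Xerces code; it uses the easy interface for brevity and the URL is just an example) showing that a write callback which reports consuming fewer bytes than libcurl delivered makes the transfer abort with a write error:

{code:cpp}
// Standalone illustration (not Xerces code): a CURLOPT_WRITEFUNCTION callback
// that reports consuming fewer bytes than libcurl delivered causes the
// transfer to abort with CURLE_WRITE_ERROR.
#include <curl/curl.h>
#include <cstdio>

static size_t shortWriteCallback(char* /*ptr*/, size_t size, size_t nmemb, void* /*userdata*/)
{
    size_t delivered = size * nmemb;
    // Pretend we only had room for half of what libcurl delivered.
    return delivered / 2;
}

int main()
{
    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL* easy = curl_easy_init();
    // Any URL that returns a body will do; this one is just an example.
    curl_easy_setopt(easy, CURLOPT_URL, "https://curl.se/");
    curl_easy_setopt(easy, CURLOPT_WRITEFUNCTION, shortWriteCallback);
    CURLcode rc = curl_easy_perform(easy);
    std::printf("curl_easy_perform: %d (%s)\n", rc, curl_easy_strerror(rc));
    // Expected: rc == CURLE_WRITE_ERROR.
    curl_easy_cleanup(easy);
    curl_global_cleanup();
    return 0;
}
{code}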

 

I can see that the logic error in CurlURLInputStream::writeCallback comes down to the assumption that libcurl will call this callback just once before Xerces is able to clear the buffer in CurlURLInputStream::readBytes. Perhaps this is because the Curl docs linked above are not clear that up to CURL_MAX_WRITE_SIZE bytes could be provided on each of multiple invocations of the callback. Xerces assumes that libcurl will only provide CURL_MAX_WRITE_SIZE bytes in total for each call to curl_multi_info_read. On close inspection of the libcurl code at [https://github.com/curl/curl/blob/160f0233590d0a02422594104ae805e1aa08d3db/lib/cw-out.c#L218] this assumption is not correct: there is a loop that will call the callback multiple times, and each invocation can expect up to CURL_MAX_WRITE_SIZE bytes to be consumed. However, Xerces can only handle CURL_MAX_WRITE_SIZE bytes in total across multiple invocations of the callback, due to the buffer definition:

{{XMLByte CurlURLInputStream::fBuffer[CURL_MAX_WRITE_SIZE];}}
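
To make the failure mode concrete, here is a simplified, self-contained model of the situation (the names below are illustrative, not the actual Xerces members): a single fixed CURL_MAX_WRITE_SIZE buffer that is not drained between two back-to-back deliveries can only partially consume the second one, so a short count has to be returned.

{code:cpp}
// Simplified model (illustrative names, not the actual Xerces members):
// a fixed-size buffer of CURL_MAX_WRITE_SIZE bytes that is not drained
// between two back-to-back callback invocations.
#include <curl/curl.h>   // for CURL_MAX_WRITE_SIZE
#include <cstring>
#include <cstdio>

struct FixedBuffer
{
    unsigned char data[CURL_MAX_WRITE_SIZE];
    size_t        used = 0;

    // Mimics the essence of writeCallback: consume what fits, report how much.
    size_t write(const unsigned char* src, size_t cnt)
    {
        size_t room = sizeof(data) - used;
        size_t consumed = (cnt < room) ? cnt : room;
        std::memcpy(data + used, src, consumed);
        used += consumed;
        return consumed;   // a short count here is what libcurl treats as an error
    }
};

int main()
{
    static unsigned char chunk[CURL_MAX_WRITE_SIZE] = {0};
    FixedBuffer buf;

    // First delivery fills the buffer completely.
    std::printf("first:  %zu of %d consumed\n", buf.write(chunk, sizeof(chunk)), CURL_MAX_WRITE_SIZE);
    // Second delivery arrives before readBytes has drained the buffer:
    // nothing fits, so a short (zero) count is returned and the transfer aborts.
    std::printf("second: %zu of %d consumed\n", buf.write(chunk, sizeof(chunk)), CURL_MAX_WRITE_SIZE);
    return 0;
}
{code}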

 

Regarding solutions, one option would be the following.

If the number of bytes to consume in CurlURLInputStream::writeCallback would exceed the size of CurlURLInputStream::fBuffer, we could return CURL_WRITEFUNC_PAUSE from the callback to defer consuming the bytes; see [https://curl.se/libcurl/c/CURLOPT_WRITEFUNCTION.html].
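
A rough sketch of what that could look like (the member names are hypothetical and this is only a fragment to illustrate the idea, not a tested patch):

{code:cpp}
// Hypothetical sketch of the pause-based approach (illustrative member names,
// not a tested patch). The callback refuses new data when the buffer is full,
// and readBytes resumes the transfer once the buffer has been drained.
#include <curl/curl.h>
#include <cstring>

struct StreamState
{
    CURL*         easy = nullptr;
    unsigned char buffer[CURL_MAX_WRITE_SIZE];
    size_t        filled = 0;
    bool          paused = false;
};

static size_t writeCallback(char* ptr, size_t size, size_t nmemb, void* userdata)
{
    StreamState* s = static_cast<StreamState*>(userdata);
    size_t cnt = size * nmemb;
    if (s->filled + cnt > sizeof(s->buffer))
    {
        // No room yet: ask libcurl to pause delivery instead of reporting an error.
        // The unconsumed data is redelivered when the transfer is resumed.
        s->paused = true;
        return CURL_WRITEFUNC_PAUSE;
    }
    std::memcpy(s->buffer + s->filled, ptr, cnt);
    s->filled += cnt;
    return cnt;
}

// Would be called from readBytes after the buffer has been handed to the parser.
static void resumeIfPaused(StreamState* s)
{
    s->filled = 0;
    if (s->paused)
    {
        s->paused = false;
        curl_easy_pause(s->easy, CURLPAUSE_CONT);   // resume the paused transfer
    }
}
{code}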

However, pausing the network transfer seems inefficient and an unnecessary performance hit, so it is probably best not to pursue this approach.

 

I think the better solution would be to replace the fixed-size buffer CurlURLInputStream::fBuffer with a dynamically sized buffer. Looking at the Xerces code base, I don't see any kind of queue container, and the STL does not seem to be used. I was thinking of using the ValueVectorOf container with a template type B, say, where B is a struct holding a member XMLByte[CURL_MAX_WRITE_SIZE]. A fixed container size of one would equate to the current implementation; the fix would be to let it grow if needed. In practice a maximum size of 2 or 3 would probably suffice. A rough sketch is below. Thoughts?
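
The sketch of the shape I have in mind (names like ChunkBuffer and ChunkQueue are mine and this is untested; I have not verified that ValueVectorOf is the best fit):

{code:cpp}
// Rough sketch of the dynamically sized buffer idea (my own names, untested):
// a small queue of fixed-size chunks that grows only if libcurl delivers more
// than one CURL_MAX_WRITE_SIZE block before readBytes drains it.
#include <curl/curl.h>
#include <xercesc/util/ValueVectorOf.hpp>
#include <cstring>

XERCES_CPP_NAMESPACE_USE

struct ChunkBuffer
{
    XMLByte   data[CURL_MAX_WRITE_SIZE];
    XMLSize_t length;
};

class ChunkQueue
{
public:
    ChunkQueue() : fChunks(2) { }   // initial capacity for 2 chunks, grows as needed

    // Called from writeCallback: always able to consume cnt bytes (cnt <= CURL_MAX_WRITE_SIZE).
    void append(const XMLByte* src, XMLSize_t cnt)
    {
        ChunkBuffer chunk;
        std::memcpy(chunk.data, src, cnt);
        chunk.length = cnt;
        fChunks.addElement(chunk);
    }

    // Called from readBytes: hand the oldest chunk to the parser and drop it.
    bool popFront(ChunkBuffer& out)
    {
        if (fChunks.size() == 0)
            return false;
        out = fChunks.elementAt(0);
        fChunks.removeElementAt(0);
        return true;
    }

private:
    ValueVectorOf<ChunkBuffer> fChunks;
};
{code}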


