[ https://issues.apache.org/jira/browse/XERCESC-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Scott Cantor updated XERCESC-2250:
----------------------------------
    Labels: CURL  (was: )

Curl NetAccessor mishandles larger data with NetAcc_InternalError
-----------------------------------------------------------------

Key: XERCESC-2250
URL: https://issues.apache.org/jira/browse/XERCESC-2250
Project: Xerces-C++
Issue Type: Bug
Components: Utilities
Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.1.0, 3.1.1, 3.1.2, 3.2.0, 3.1.3, 3.1.4, 3.2.1, 3.2.2, 3.2.3, 3.2.4, 3.2.5
Reporter: William S Fulton
Assignee: Scott Cantor
Priority: Major
Labels: CURL
Fix For: 3.3.0

The Curl NetAccessor has a buffer overflow bug.

It can be easily replicated if the curl NetAccessor is turned on during configure:

{{./configure --with-icu --with-curl}}

and then invoking the NetAccessorTest executable using one of the large files, I get:

{{~/xerces-c $ ./tests/NetAccessorTest file://$(pwd)/doc/program-dom.xml}}
{{Exception during test:}}
{{    internal error in NetAccessor}}

The problem is in CurlURLInputStream::writeCallback, which returns a value less than the number of bytes it is expected to consume when cnt != totalConsume. According to https://curl.se/libcurl/c/CURLOPT_WRITEFUNCTION.html, CURL_WRITEFUNC_ERROR will then be returned. CURL_WRITEFUNC_ERROR is not handled in the switch statement in CurlURLInputStream::readMore, hence XMLExcepts::NetAcc_InternalError is thrown.

I can see that the logic error in CurlURLInputStream::writeCallback is down to the assumption that libcurl will call this callback just once before Xerces is able to clear the buffer in CurlURLInputStream::readBytes. Perhaps this is because the curl docs linked above are not clear that up to CURL_MAX_WRITE_SIZE bytes can be provided on each of multiple invocations of the callback. Xerces assumes that libcurl will only provide CURL_MAX_WRITE_SIZE bytes for each call to curl_multi_info_read. Close inspection of the curl code at https://github.com/curl/curl/blob/160f0233590d0a02422594104ae805e1aa08d3db/lib/cw-out.c#L218 shows this is not correct: there is a loop that will call the callback multiple times, and each invocation of the callback can expect up to CURL_MAX_WRITE_SIZE bytes to be consumed. However, Xerces can only handle CURL_MAX_WRITE_SIZE bytes in total across multiple invocations of the callback, due to the buffer definition:

{{XMLByte CurlURLInputStream::fBuffer[CURL_MAX_WRITE_SIZE];}}
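To make the failure mode concrete, below is a minimal standalone sketch of the pattern (this is not the actual Xerces code; the names kMaxWrite, gBuffer and gFilled and the two back-to-back calls are illustrative assumptions). A write callback that can only copy into one fixed-size buffer has to return a short count as soon as libcurl delivers more data than the buffer can hold before the reader has drained it, and libcurl treats a short return as a write error and aborts the transfer:

{code:cpp}
// Standalone model of the failing pattern (not the actual Xerces code).
// CURL_MAX_WRITE_SIZE defaults to 16384 in libcurl; kMaxWrite stands in for it here.
#include <cstddef>
#include <cstring>
#include <iostream>

static const std::size_t kMaxWrite = 16384;      // stand-in for CURL_MAX_WRITE_SIZE
static unsigned char     gBuffer[kMaxWrite];     // stand-in for the fixed fBuffer
static std::size_t       gFilled = 0;            // bytes buffered but not yet read

// Shaped like a CURLOPT_WRITEFUNCTION callback: it can only copy what still
// fits in the fixed buffer, and libcurl treats any short return as an error.
static std::size_t writeCallback(const char* ptr, std::size_t size, std::size_t nmemb)
{
    const std::size_t total = size * nmemb;
    const std::size_t room  = kMaxWrite - gFilled;
    const std::size_t cnt   = (total < room) ? total : room;
    std::memcpy(gBuffer + gFilled, ptr, cnt);
    gFilled += cnt;
    return cnt;          // cnt < total signals an error and aborts the transfer
}

int main()
{
    // Simulate libcurl invoking the callback twice back-to-back, as the loop
    // in lib/cw-out.c can do, before the consumer drains the buffer.
    char chunk[kMaxWrite];
    std::memset(chunk, 'x', sizeof(chunk));

    const std::size_t first  = writeCallback(chunk, 1, sizeof(chunk)); // fills the buffer
    const std::size_t second = writeCallback(chunk, 1, sizeof(chunk)); // no room left

    std::cout << "first call consumed  " << first  << " of " << sizeof(chunk) << " bytes\n"
              << "second call consumed " << second << " of " << sizeof(chunk) << " bytes\n";
    return 0;
}
{code}

Running this prints a full count for the first call and a short count for the second, which is exactly the condition that ends up surfacing as NetAcc_InternalError via the unhandled error code in readMore.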
Regarding solutions, one solution would be as follows: if the number of bytes to consume in CurlURLInputStream::writeCallback would exceed the size of CurlURLInputStream::fBuffer, we could return CURL_WRITEFUNC_PAUSE from the callback to defer consuming the bytes (see https://curl.se/libcurl/c/CURLOPT_WRITEFUNCTION.html). However, unnecessarily pausing the network transfer seems inefficient and an unnecessary performance hit, so it is probably best not to consider this.

I think the better solution would be to replace the fixed-size buffer CurlURLInputStream::fBuffer with a dynamically sized buffer. Looking at the Xerces code base, I don't see any kind of queue container, and the STL does not seem to be used. I was thinking of using the ValueVectorOf container with a template type B, say, where B is a struct holding an XMLByte[CURL_MAX_WRITE_SIZE] member. A fixed container size of one would equate to the current implementation; the fix would be to get it to grow if needed. Probably just a max size of 2 or 3 would be used in reality. Thoughts?
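To illustrate the growable-buffer direction, here is a minimal standalone sketch (an assumption about the shape of the fix, not a patch against CurlURLInputStream; std::vector is used only to keep the example self-contained, whereas the real fix would be built on ValueVectorOf or a hand-rolled growable buffer since Xerces does not use the STL). The callback always accepts the full chunk, so libcurl never sees a short return, and the reader drains the pending data afterwards:

{code:cpp}
// Standalone sketch of the growable-buffer idea (not a Xerces patch;
// std::vector stands in for a ValueVectorOf-based or hand-rolled buffer).
#include <cstddef>
#include <cstring>
#include <iostream>
#include <vector>

static std::vector<unsigned char> gPending;   // grows as needed, drained by the reader

// With a growable buffer the callback can always accept the full chunk,
// so it never returns a short count and the transfer is never aborted.
static std::size_t writeCallback(const char* ptr, std::size_t size, std::size_t nmemb)
{
    const std::size_t total = size * nmemb;
    gPending.insert(gPending.end(), ptr, ptr + total);
    return total;                             // everything consumed, every time
}

// Stand-in for the readBytes() side: hand back up to 'want' bytes and drop
// them from the front of the pending data.
static std::size_t readBytes(unsigned char* dst, std::size_t want)
{
    const std::size_t got = (want < gPending.size()) ? want : gPending.size();
    if (got == 0)
        return 0;
    std::memcpy(dst, gPending.data(), got);
    gPending.erase(gPending.begin(), gPending.begin() + got);
    return got;
}

int main()
{
    // Two back-to-back 16 KB chunks, as libcurl may deliver them before the
    // reader runs, are now both accepted in full.
    char chunk[16384];
    std::memset(chunk, 'x', sizeof(chunk));
    writeCallback(chunk, 1, sizeof(chunk));
    writeCallback(chunk, 1, sizeof(chunk));

    unsigned char out[4096];
    std::size_t drained = 0, n;
    while ((n = readBytes(out, sizeof(out))) != 0)
        drained += n;
    std::cout << "buffered and drained " << drained << " bytes without error\n";
    return 0;
}
{code}

In Xerces terms the vector would correspond to the proposed container of fixed CURL_MAX_WRITE_SIZE-sized blocks, growing past one block only when the callback is invoked more than once before readBytes empties it.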