[ https://issues.apache.org/jira/browse/XERCESC-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17874338#comment-17874338 ]
Scott Cantor edited comment on XERCESC-2250 at 8/16/24 5:48 PM: ---------------------------------------------------------------- The mystery of my code is explained by the fact that my fork of this particular bit of logic was modified long ago to do a buffer realloc if it gets more data than expected, which prevents it from ever returning the lower count that triggers the unhandled curl result. I could therefore propose that we simply change the code to adopt my fix, which is going to be less invasive than chaining buffers, but that's my opinion. It would require a minor version bump because it's an ABI change altering the size of a class. It's not one-for-one but the commit in question is here: https://git.shibboleth.net/view/?p=cpp-xmltooling.git;a=commitdiff;h=56001cbc1073a07f70dedb7776929fb86f06d6a4 My version doesn't limit the size of the reallocation, because my use case didn't require that, it was merely a theoretical issue and I left a TODO. I'm willing to apply my fix and throw in a hard cap that would result in the same throw as before, which I don't believe (per the above) would be a security risk. The security risk I think comes into play if we tried to handle the curl code by returning a count less than the total data consumed. I'm not sure that buffer chaining would ultimately be better, other than it allocates in smaller chunks, as in the end it's still allocating more memory, but I don't know for sure. I'm just not able to do that work, but a patch to do that instead is fine also. The key in terms of assessment is whether I'm right that this isn't at present a security concern. was (Author: canto...@osu.edu): The mystery of my code is explained by the fact that my fork of this particular bit of logic was modified long ago to do a buffer realloc if it gets more data than expected, which prevents it from every returning the lower count that triggers the unhandled curl result. I could therefore propose that we simply change the code to adopt my fix, which is going to be less invasive than chaining buffers, but that's my opinion. It would require a minor version bump because it's an ABI change altering the size of a class. It's not one-for-one but the commit in question is here: https://git.shibboleth.net/view/?p=cpp-xmltooling.git;a=commitdiff;h=56001cbc1073a07f70dedb7776929fb86f06d6a4 My version doesn't limit the size of the reallocation, because my use case didn't require that, it was merely a theoretical issue and I left a TODO. I'm willing to apply my fix and throw in a hard cap that would result in the same throw as before, which I don't believe (per the above) would be a security risk. The security risk I think comes into play if we tried to handle the curl code by returning a count less than the total data consumed. I'm not sure that buffer chaining would ultimately be better, other than it allocates in smaller chunks, as in the end it's still allocating more memory, but I don't know for sure. I'm just not able to do that work, but a patch to do that instead is fine also. The key in terms of assessment is whether I'm right that this isn't at present a security concern. > Curl NetAccessor overflow buffer resulting in NetAcc_InternalError > ------------------------------------------------------------------ > > Key: XERCESC-2250 > URL: https://issues.apache.org/jira/browse/XERCESC-2250 > Project: Xerces-C++ > Issue Type: Bug > Components: Utilities > Affects Versions: 3.2.5 > Reporter: William S Fulton > Priority: Major > > The Curl NetAccessor has a buffer overflow bug > It can be easily replicated if the curl NetAccessor is turned on during > configure: > {{./configure --with-icu --with-curl}} > and then invoking the NetAccessorTest executable using one of the large > files, I get: > {{~/xerces-c $ ./tests/NetAccessorTest file://$(pwd)/doc/program-dom.xml}} > {{Exception during test:}} > {{ internal error in NetAccessor}} > The problem is in CurlURLInputStream::writeCallback which returns a value > less than the expected value the function should consume as cnt != > totalConsume. According to > [https://curl.se/libcurl/c/CURLOPT_WRITEFUNCTION.html] CURL_WRITEFUNC_ERROR > will then be returned. CURL_WRITEFUNC_ERROR is not handled in the switch > statement in CurlURLInputStream::readMore, hence > XMLExcepts::NetAcc_InternalError is thrown. > > I can see that the logic error in CurlURLInputStream::writeCallback is down > to the assumption that libcurl will call this callback just once before > Xerces is able to clear the buffer in CurlURLInputStream::readBytes. Perhaps > this is because the Curl docs linked to above is not clear that up to > CURL_MAX_WRITE_SIZE bytes could be provided on each or multiple invocations > of the callback. Xerces makes the assumption that libcurl would only provide > CURL_MAX_WRITE_SIZE bytes for each call to curl_multi_info_read. This is not > correct on close inspection of the Xerces code at > [https://github.com/curl/curl/blob/160f0233590d0a02422594104ae805e1aa08d3db/lib/cw-out.c#L218] > where there is a loop that will call the callback multiple times. Each > invocation of the callback could expect up to CURL_MAX_WRITE_SIZE bytes to be > consumed. However, Xerces can only handle CURL_MAX_WRITE_SIZE in total for > multiple invocations of the callback due to the buffer definition: > XMLByte CurlURLInputStream::fBuffer[CURL_MAX_WRITE_SIZE]; > > Regarding solutions, one solution would be... > If the number of bytes to consume in CurlURLInputStream::writeCallback would > exceed the size of CurlURLInputStream::fBuffer then we could return > CURL_WRITEFUNC_PAUSE from the callback to defer consuming the bytes, see [ > https://curl.se/libcurl/c/CURLOPT_WRITEFUNCTION.html|https://curl.se/libcurl/c/CURLOPT_WRITEFUNCTION.html]. > However, unnecessarily pausing the network transfer seems inefficient and an > unnecessary performance hit, so probably best not to consider this. > > I think the better solution would be to replace the fixed size buffer > CurlURLInputStream::fBuffer with a dynamically sized buffer. Looking at the > Xerces code base, I don't see any kind of queue container and the STL does > not seem to be used. I was thinking of using the ValueVectorOf container with > template type B say, where B is a struct buffer with a member > XMLByte[CURL_MAX_SIZE_WRITE_SIZE]. A fixed container size of one would equate > to the current implementation. The fix would be to get it to grow if needed. > Probably just a max size of 2 or 3 would be used in reality. Thoughts? -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: c-dev-unsubscr...@xerces.apache.org For additional commands, e-mail: c-dev-h...@xerces.apache.org