[https://issues.apache.org/jira/browse/XERCESC-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17874303#comment-17874303]

Scott Cantor commented on XERCESC-2250:
---------------------------------------

One question I would have is how big is "big"?

I have similar code handling pretty big files, though "big" is relative if it 
means multiple gigabytes. But that test file is tiny, and I definitely have 
similar code handling many megabytes of data without this issue arising.

I'd just like to know a bit more here before we draw any conclusions.

Perhaps it depends on the curl version or platform in some sense, since it 
really depends on how curl decides to call the function. It could even be 
TLS back-end dependent, so documenting exactly which curl build demonstrates 
the issue would also help.

As for a fix, if there's a serious bug, and a very non-invasive fix is to pause 
the transfer, that's a lot more attractive than a fundamental change requiring 
a lot of testing. Correctness >>> efficiency.

> Curl NetAccessor overflow buffer resulting in NetAcc_InternalError
> ------------------------------------------------------------------
>
>                 Key: XERCESC-2250
>                 URL: https://issues.apache.org/jira/browse/XERCESC-2250
>             Project: Xerces-C++
>          Issue Type: Bug
>          Components: Utilities
>    Affects Versions: 3.2.5
>            Reporter: William S Fulton
>            Priority: Major
>
> The Curl NetAccessor has a buffer overflow bug.
> It can easily be reproduced by enabling the curl NetAccessor during 
> configure:
> {{./configure --with-icu --with-curl}}
> and then running the NetAccessorTest executable on one of the larger 
> files. I get:
> {{~/xerces-c $ ./tests/NetAccessorTest file://$(pwd)/doc/program-dom.xml}}
> {{Exception during test:}}
> {{    internal error in NetAccessor}}
> The problem is in CurlURLInputStream::writeCallback, which returns a value 
> less than the number of bytes it was given to consume (cnt != 
> totalConsume). According to 
> [https://curl.se/libcurl/c/CURLOPT_WRITEFUNCTION.html], libcurl treats such 
> a short count as an error and aborts the transfer with CURLE_WRITE_ERROR. 
> That result is not handled in the switch statement in 
> CurlURLInputStream::readMore, hence XMLExcepts::NetAcc_InternalError is thrown.
>  
> I can see that the logic error in CurlURLInputStream::writeCallback is down 
> to the assumption that libcurl will call this callback just once before 
> Xerces is able to clear the buffer in CurlURLInputStream::readBytes. Perhaps 
> this is because the curl docs linked above are not clear that up to 
> CURL_MAX_WRITE_SIZE bytes may be provided on each of multiple invocations 
> of the callback. Xerces assumes that libcurl will provide at most 
> CURL_MAX_WRITE_SIZE bytes per call to curl_multi_perform. On close inspection 
> of the libcurl code at 
> [https://github.com/curl/curl/blob/160f0233590d0a02422594104ae805e1aa08d3db/lib/cw-out.c#L218]
> this is not correct: there is a loop that can call the callback multiple 
> times, and each invocation may pass up to CURL_MAX_WRITE_SIZE bytes to be 
> consumed. However, Xerces can only handle CURL_MAX_WRITE_SIZE bytes in total 
> across multiple invocations of the callback because of the buffer definition:
> {{XMLByte CurlURLInputStream::fBuffer[CURL_MAX_WRITE_SIZE];}}
>  
> Regarding solutions, one option is as follows.
> If the number of bytes to consume in CurlURLInputStream::writeCallback would 
> exceed the space left in CurlURLInputStream::fBuffer, we could return 
> CURL_WRITEFUNC_PAUSE from the callback to defer consuming the bytes, see 
> [https://curl.se/libcurl/c/CURLOPT_WRITEFUNCTION.html].
> However, pausing the network transfer seems inefficient and an unnecessary 
> performance hit, so it is probably best not to pursue this.
>  
> I think the better solution would be to replace the fixed-size buffer 
> CurlURLInputStream::fBuffer with a dynamically sized buffer. Looking at the 
> Xerces code base, I don't see any kind of queue container, and the STL does 
> not seem to be used. I was thinking of using the ValueVectorOf container with 
> template type B, say, where B is a buffer struct with a member 
> XMLByte[CURL_MAX_WRITE_SIZE]. A container fixed at size one would be 
> equivalent to the current implementation; the fix would be to let it grow 
> when needed. In practice it would probably only ever grow to 2 or 3 entries. 
> Thoughts?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)