Dear Xerces-C Development team, especially Scott,

Bhargava (in CC) and I are working on a fuzzer for xerces-c. The goal of
this project is to integrate xerces-c into OSS-Fuzz so that it is
continuously fuzzed. The test harness we set up has already found a
reachable failing assertion in the xerces-c XML parsing code. Please find
attached a patch that addresses that assertion.
The assertion fails when parsing a malformed XML file; we have attached a
crashing test case. We suggest fixing this assertion, since it opens up
the possibility of Denial of Service attacks via malformed XML files.
We used the code below to parse the file and fuzz xerces-c, where Data
contains the contents of the XML file and Size is the length of that
content. The parser fails with the following assertion:
xercesc/internal/XMLReader.cpp:651: bool
xercesc_3_2::XMLReader::getName(xercesc_3_2::XMLBuffer &, const bool):
Assertion `fCharIndex+1 < fCharsAvail' failed.
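
To illustrate the condition that assertion guards, here is a minimal
standalone sketch. It is not xerces-c code: the helper name
hasCompleteSurrogatePair, the plain char16_t buffer, and the index/avail
parameters are assumptions made for this example, standing in for
fCharBuf, fCharIndex and fCharsAvail. The surrogate ranges are the ones
used in the surrounding code of XMLReader.cpp.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of xerces-c. Returns true only when
// buf[index] is a lead surrogate and a trail surrogate follows it within
// the first `avail` code units of the buffer.
bool hasCompleteSurrogatePair(const char16_t* buf,
                              std::size_t index,
                              std::size_t avail)
{
    // The lead unit itself must lie inside the filled part of the buffer.
    if (index >= avail)
        return false;

    // Lead surrogate range as checked in XMLReader.cpp (0xD800..0xDB7F).
    if (buf[index] < 0xD800 || buf[index] > 0xDB7F)
        return false;

    // Runtime equivalent of assert(index + 1 < avail): refuse to read
    // past the available data instead of aborting.
    if (index + 1 >= avail)
        return false;

    // Trail surrogate range (0xDC00..0xDFFF).
    return buf[index + 1] >= 0xDC00 && buf[index + 1] <= 0xDFFF;
}

// Small usage example: a complete pair is accepted, a lead surrogate at
// the very end of the available data is rejected without reading further.
void checkExamples()
{
    const char16_t completePair[] = { 0xD800, 0xDC00 };
    const char16_t truncatedPair[] = { 0xD800 };

    assert(hasCompleteSurrogatePair(completePair, 0, 2));
    assert(!hasCompleteSurrogatePair(truncatedPair, 0, 1));
}
```

The point of the sketch is that the rejecting condition is the exact
negation of the asserted one, so the case index + 1 == avail is treated
as malformed input rather than being read.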

We would also like to ask you for feedback on our fuzzing harness (code
below). The basic idea is that we want to stress-test xerces-c by
passing different "Data" strings.
Once you are happy with the fuzzing harness, we would send a PR your way
and work on the integration into OSS-Fuzz.

Cheers and thank you,
Bhargava and Vincent

    SAXParser::ValSchemes valScheme = SAXParser::Val_Auto;
    bool doNamespaces = false;
    bool doSchema = false;
    bool schemaFullChecking = false;
    SAXParser *parser = new SAXParser;
    parser->setValidationScheme(valScheme);
    parser->setDoNamespaces(doNamespaces);
    parser->setDoSchema(doSchema);
    parser->setHandleMultipleImports(true);
    parser->setValidationSchemaFullChecking(schemaFullChecking);
    static const char *gMemBufId = "prodInfo";
    MemBufInputSource *memBufIS = new MemBufInputSource(
        (const XMLByte *)Data, Size, gMemBufId, false);
    parser->parse(*memBufIS);
    delete parser;
    delete memBufIS;

On 10/31/19 1:38 PM, Cantor, Scott wrote:
> On 10/29/19, 11:51 AM, "Vincent Ulitzsch" <vincent.ulitz...@gmail.com> wrote:
>
>> We were wondering if an integration to oss-fuzz[1] would be interesting
>> for xerces-c?
> It falls into the category of "don't ask questions you don't want the answers 
> to". It will likely lead to the discovery of many vulnerabilities, some 
> possibly difficult or impossible to fix based on the remaining knowledge of 
> the code and the resources available.
>
>> This would allow parts of xerces' codebase to be continuously fuzzed, which
>> would probably result in the detection of security bugs early on in the
>> development process.
> There is no active development. There have been build process changes of 
> late, but nothing else really of note.
>
>> If you are interested, we would be happy to help with writing the fuzzers.
> s/help/do the work, then I'm happy to see it happen, but I have no knowledge 
> of, nor time to, work on such a thing.
>
> -- Scott
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: c-dev-unsubscr...@xerces.apache.org
> For additional commands, e-mail: c-dev-h...@xerces.apache.org
>
Index: src/xercesc/internal/XMLReader.cpp
===================================================================
--- src/xercesc/internal/XMLReader.cpp	(revision 1869376)
+++ src/xercesc/internal/XMLReader.cpp	(working copy)
@@ -648,7 +648,8 @@
         if ((fCharBuf[fCharIndex] >= 0xD800) && (fCharBuf[fCharIndex] <= 0xDB7F)) {
            // make sure one more char is in the buffer, the transcoder
            // should put only a complete surrogate pair into the buffer
-           assert(fCharIndex+1 < fCharsAvail);
+           if (fCharIndex + 1 >= fCharsAvail)
+               return false;
            if ((fCharBuf[fCharIndex+1] < 0xDC00) || (fCharBuf[fCharIndex+1] > 0xDFFF))
                return false;
 
@@ -677,7 +678,8 @@
             {
                 // make sure one more char is in the buffer, the transcoder
                 // should put only a complete surrogate pair into the buffer
-                assert(fCharIndex+1 < fCharsAvail);
+                if (fCharIndex+1 >= fCharsAvail)
+                    return false;
                 if ( (fCharBuf[fCharIndex+1] < 0xDC00) ||
                         (fCharBuf[fCharIndex+1] > 0xDFFF)  )
                     break;
@@ -723,7 +725,8 @@
     if ((fCharBuf[fCharIndex] >= 0xD800) && (fCharBuf[fCharIndex] <= 0xDB7F)) {
         // make sure one more char is in the buffer, the transcoder
         // should put only a complete surrogate pair into the buffer
-        assert(fCharIndex+1 < fCharsAvail);
+        if (fCharIndex+1 >= fCharsAvail)
+            return false;
         if ((fCharBuf[fCharIndex+1] < 0xDC00) || (fCharBuf[fCharIndex+1] > 0xDFFF))
             return false;
 
