Hi Philipp,
the bug you are seeing is https://issues.apache.org/jira/browse/XERCESC-1858. You can get the latest version of TransService.cpp from http://svn.apache.org/viewvc/xerces/c/branches/xerces-3.0/src/xercesc/util/TransService.cpp?view=co

FYI, the bug is triggered by any URL that gets the "/" path from a server.
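
Until you can rebuild with the fixed file, one possible stopgap is to screen URLs before handing them to the parser. The following is only a minimal sketch: requestsRootPath is a hypothetical helper, not part of Xerces-C++, and it assumes that a URL with no path at all is requested as "/" too (which matches the failing URLs in your list).

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper (not a Xerces-C++ API): returns true when the URL's
// path component is empty or exactly "/", i.e. the HTTP request would ask
// the server for the root path.
bool requestsRootPath(const std::string& url)
{
    const std::string::size_type npos = std::string::npos;

    // Skip the scheme separator ("://") if there is one.
    const std::string::size_type schemeEnd = url.find("://");
    const std::string::size_type authStart = (schemeEnd == npos) ? 0 : schemeEnd + 3;

    // The path ends where the query ('?') or fragment ('#') begins,
    // or at the end of the string if neither is present.
    const std::string::size_type pathEnd =
        std::min(url.find_first_of("?#", authStart), url.size());

    // The path starts at the first '/' after the host (and port) part.
    const std::string::size_type pathStart = url.find('/', authStart);

    // No '/' before the query or fragment: the path is empty.
    if (pathStart == npos || pathStart >= pathEnd)
        return true;

    // Otherwise the request hits the root only if the path is exactly "/".
    return pathEnd - pathStart == 1;
}
```

You could call it on each URL before parser->parse() and skip or report the ones for which it returns true. It is only a workaround; the proper fix is the patched TransService.cpp.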

Alberto

Philipp Maschke wrote:
Hi,

I'm using Xerces-C++ 3.0.1 for development with Visual Studio 2005 on Windows.
I have implemented an importer for RSS feeds that uses a URLInputSource with a
SAX parser.
The importer runs fine on most URLs, but when I test it on a more complex
website it crashes with a heap corruption error (call stack and error message
below).
I tried several different URLs; some caused errors, others did not. A short
list of URLs for each group is below.
Looking at the call stack, I'm quite sure that Xerces causes the error. My
current guess is that Xerces may have problems with JavaScript or something
similar. If so, it would be good to get an exception that I can catch.
Currently I can only hope that the user doesn't type in one of the 'bad' URLs,
since I can do nothing to prevent the crash. As you can imagine, I'm not
really comfortable with this solution. I would greatly appreciate any hints on
how to handle this situation.

While I'm at it: why is NetAccessorException not documented in the API
documentation? It took me some time to find out that such an exception even
exists.

Here is all the information I have collected so far:
---------------
Debug Output
-----------------------------------------
Debug Error!

HEAP CORRUPTION DETECTED: after Normal block (#22425) at 0x017F8858.
CRT detected that the application wrote to memory after end of heap buffer.

        ShapeImport_d.exe!_free_dbg_nolock(void * pUserData=0x017f8858, int nBlockUse=1)  Line 1333 + 0x3b bytes        C++
        ShapeImport_d.exe!_free_dbg(void * pUserData=0x017f8858, int nBlockUse=1)  Line 1220 + 0xd bytes        C++
        ShapeImport_d.exe!operator delete(void * pUserData=0x017f8858)  Line 54 + 0x10 bytes    C++
        ShapeImport_d.exe!xercesc_3_0::MemoryManagerImpl::deallocate()  + 0x16 bytes
        ShapeImport_d.exe!xercesc_3_0::TranscodeToStr::~TranscodeToStr()  + 0x2a bytes
        ShapeImport_d.exe!xercesc_3_0::BinHTTPInputStreamCommon::createHTTPRequest()  + 0x4a6 bytes
        ShapeImport_d.exe!xercesc_3_0::BinHTTPInputStreamCommon::sendRequest()  + 0x66 bytes
        ShapeImport_d.exe!xercesc_3_0::BinHTTPURLInputStream::BinHTTPURLInputStream()  + 0x2e7 bytes
        ShapeImport_d.exe!xercesc_3_0::WinSockNetAccessor::makeNew()  + 0x7a bytes
        ShapeImport_d.exe!xercesc_3_0::XMLURL::makeNewStream()  + 0x359 bytes
        ShapeImport_d.exe!xercesc_3_0::URLInputSource::makeStream()  + 0x12 bytes
        ShapeImport_d.exe!xercesc_3_0::ReaderMgr::createReader()  + 0x3c bytes
        ShapeImport_d.exe!xercesc_3_0::IGXMLScanner::scanReset()  + 0x5de bytes
        ShapeImport_d.exe!xercesc_3_0::IGXMLScanner::scanDocument()  + 0x82 bytes
        ShapeImport_d.exe!xercesc_3_0::SAX2XMLReaderImpl::parse()  + 0xb3 bytes
        ShapeImport_d.exe!GeoDataImport::RssImport::startImport(std::vector<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > > fileURLs=[1]("http://maps.oberbayern.de"))  Line 67 + 0x11 bytes     C++
        ShapeImport_d.exe!main(int argc=1, char * * argv=0x003a3ac0)  Line 129 + 0x22 bytes     C++
        ShapeImport_d.exe!__tmainCRTStartup()  Line 327 + 0x19 bytes    C
        ShapeImport_d.exe!mainCRTStartup()  Line 196    C
        kernel32.dll!7c816fe7()
        [Frames below may be incorrect and/or missing, no symbols loaded for kernel32.dll]


--------------
Release Output
---------------------------------------
HEAP[ShapeImport.exe]: Heap block at 0100CFC8 modified at 0100CFD2 past requested size of 2
Windows has triggered a breakpoint in ShapeImport.exe.

This may be due to a corruption of the heap, and indicates a bug in ShapeImport.exe or any of the DLLs it has loaded.

The output window may have more diagnostic information
HEAP[ShapeImport.exe]: Invalid Address specified to RtlFreeHeap( 00DE0000, 0100CFD0 )
Windows has triggered a breakpoint in ShapeImport.exe.

This may be due to a corruption of the heap, and indicates a bug in ShapeImport.exe or any of the DLLs it has loaded.

The output window may have more diagnostic information


--------------
Source Code (without error handling, simplified)
------------------------------------------------------
using namespace xercesc;

MyImporter::MyImporter()
{
        XMLPlatformUtils::Initialize();
}

void MyImporter::startImport(vector<string> fileURLs)
{
        SAX2XMLReader* parser = XMLReaderFactory::createXMLReader();
        parser->setFeature(XMLUni::fgSAX2CoreValidation, true);
        parser->setFeature(XMLUni::fgSAX2CoreNameSpaces, true);

        MyHandler* handler = new MyHandler();
        parser->setContentHandler(handler);
        parser->setErrorHandler(handler);

        BOOST_FOREACH(string fileURL, fileURLs)
        {
                XMLURL url = XMLURL(fileURL.c_str());
                URLInputSource* urlSource = new URLInputSource(url);

                // Here I catch SAXParseException, NetAccessorException,
                // XMLException, my own exceptions and std::runtime_error.
                parser->parse(*urlSource);
        }
}

---------------
URLs
------------------------------------------------------------
no error:
http://maps.oberbayern.de/RSS.ashx?Thema=Events&MaxCount=100
http://xerces.apache.org/index.html
http://xerces.apache.org/xerces-c/mailing-lists.html
http://scheduleworld.com/sw2/index.html

error:
http://maps.oberbayern.de
http://maps.google.com/
http://www.youtube.com/


Thanks in advance for any help!

Cheers,
Philipp Maschke


PS: I just tested some more and found that my catch clauses do not kick in. I
caused a NetAccessorException and it wasn't caught, although it should have
been :(
    Here's the code:
    try {
                parser->parse(*urlSource);
        }
        catch (const SAXParseException& toCatch) {
                char* message = XMLString::transcode(toCatch.getMessage());
                cout << "SAX parser exception:" << endl << message << endl;
                XMLString::release(&message);
        }
        catch (const NetAccessorException& toCatch)
        {
                char* message = XMLString::transcode(toCatch.getMessage());
                cout << "Network exception: " << endl << message << endl;
                XMLString::release(&message);
        }
        ...

     When I cause a NetAccessorException, I get this debug output:
     First-chance exception at 0x7c812a6b in ShapeImport_d.exe: Microsoft C++ exception: xercesc_3_0::NetAccessorException at memory location 0x0012f404..
     but on stdout just:
     Fatal Error: Unable to open file '...' at line: 0
What am I doing wrong?

