Hi Philipp,
The bug you are seeing is
https://issues.apache.org/jira/browse/XERCESC-1858. You can get the
latest version of TransService.cpp from
http://svn.apache.org/viewvc/xerces/c/branches/xerces-3.0/src/xercesc/util/TransService.cpp?view=co
FYI, the bug is triggered by any URL that requests the "/" path from a server.
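Until a build with the patched TransService.cpp is in place, an application could screen out such URLs before handing them to the parser. The following is only a sketch of that idea. The helper name requestsRootPath is hypothetical and not part of Xerces-C++, and it assumes that a URL with no path at all is requested as "/" and that a query string or fragment does not change the path.

```cpp
#include <string>

// Hypothetical helper, not part of Xerces-C++: returns true when `url`
// would make an HTTP client request the root path "/".
// Assumptions: an absent path is requested as "/", and a query string
// or fragment does not change the path itself.
bool requestsRootPath(const std::string& url)
{
    // Skip the scheme separator if there is one.
    const std::string::size_type schemeEnd = url.find("://");
    const std::string::size_type hostStart =
        (schemeEnd == std::string::npos) ? 0 : schemeEnd + 3;

    // The path ends at the first '?' or '#', if any.
    const std::string::size_type pathEnd = url.find_first_of("?#", hostStart);

    // The path starts at the first '/' after the host.
    const std::string::size_type pathStart = url.find('/', hostStart);

    // No '/' before the query or fragment means there is no path at all,
    // e.g. "http://maps.oberbayern.de".
    if (pathStart == std::string::npos ||
        (pathEnd != std::string::npos && pathStart > pathEnd))
        return true;

    const std::string path = url.substr(
        pathStart,
        pathEnd == std::string::npos ? std::string::npos : pathEnd - pathStart);
    return path == "/";
}
```

Applied to the URL lists in the quoted message, this check flags the three URLs in the "error" group and none of the URLs in the "no error" group.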
Alberto
Philipp Maschke wrote:
Hi,
I'm using Xerces-C++ 3.0.1 for development with Visual Studio 2005 on Windows. I
have implemented an importer for RSS feeds using a URLInputSource with a
SAXParser.
The importer runs fine on most URLs, but when I test it on a more complex
website it crashes with a heap corruption error (call stack and error message
attached).
I tried several different URLs; some caused errors, others did not. I attached
a small list of URLs for each group.
Looking at the call stack, I'm fairly sure that Xerces causes the error. My
current guess is that Xerces has problems with JavaScript or something
similar. If so, it would be good to have an exception to catch.
Currently I can only hope that the user doesn't type in one of the 'bad URLs',
since I can do nothing to prevent the crash. As you can imagine, I'm not
really comfortable with this solution. I would greatly appreciate any hints on
how I can handle this situation.
While I'm at it: why is NetAccessorException not documented in the API
documentation? It took me some time to find out that such an exception even exists.
Here is all the information I have collected so far:
---------------
Debug Output
-----------------------------------------
Debug Error!
HEAP CORRUPTION DETECTED: after Normal block (#22425) at 0x017F8858.
CRT detected that the application wrote to memory after end of heap buffer.
ShapeImport_d.exe!_free_dbg_nolock(void * pUserData=0x017f8858, int nBlockUse=1) Line 1333 + 0x3b bytes C++
ShapeImport_d.exe!_free_dbg(void * pUserData=0x017f8858, int nBlockUse=1) Line 1220 + 0xd bytes C++
ShapeImport_d.exe!operator delete(void * pUserData=0x017f8858) Line 54 + 0x10 bytes C++
ShapeImport_d.exe!xercesc_3_0::MemoryManagerImpl::deallocate() + 0x16 bytes
ShapeImport_d.exe!xercesc_3_0::TranscodeToStr::~TranscodeToStr() + 0x2a bytes
ShapeImport_d.exe!xercesc_3_0::BinHTTPInputStreamCommon::createHTTPRequest() + 0x4a6 bytes
ShapeImport_d.exe!xercesc_3_0::BinHTTPInputStreamCommon::sendRequest() + 0x66 bytes
ShapeImport_d.exe!xercesc_3_0::BinHTTPURLInputStream::BinHTTPURLInputStream() + 0x2e7 bytes
ShapeImport_d.exe!xercesc_3_0::WinSockNetAccessor::makeNew() + 0x7a bytes
ShapeImport_d.exe!xercesc_3_0::XMLURL::makeNewStream() + 0x359 bytes
ShapeImport_d.exe!xercesc_3_0::URLInputSource::makeStream() + 0x12 bytes
ShapeImport_d.exe!xercesc_3_0::ReaderMgr::createReader() + 0x3c bytes
ShapeImport_d.exe!xercesc_3_0::IGXMLScanner::scanReset() + 0x5de bytes
ShapeImport_d.exe!xercesc_3_0::IGXMLScanner::scanDocument() + 0x82 bytes
ShapeImport_d.exe!xercesc_3_0::SAX2XMLReaderImpl::parse() + 0xb3 bytes
ShapeImport_d.exe!GeoDataImport::RssImport::startImport(std::vector<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > > fileURLs=[1]("http://maps.oberbayern.de")) Line 67 + 0x11 bytes C++
ShapeImport_d.exe!main(int argc=1, char * * argv=0x003a3ac0) Line 129 + 0x22 bytes C++
ShapeImport_d.exe!__tmainCRTStartup() Line 327 + 0x19 bytes C
ShapeImport_d.exe!mainCRTStartup() Line 196 C
kernel32.dll!7c816fe7()
[Frames below may be incorrect and/or missing, no symbols loaded for kernel32.dll]
--------------
Release Output
---------------------------------------
HEAP[ShapeImport.exe]: Heap block at 0100CFC8 modified at 0100CFD2 past requested size of 2
Windows has triggered a breakpoint in ShapeImport.exe.
This may be due to a corruption of the heap, and indicates a bug in
ShapeImport.exe or any of the DLLs it has loaded.
The output window may have more diagnostic information
HEAP[ShapeImport.exe]: Invalid Address specified to RtlFreeHeap( 00DE0000, 0100CFD0 )
Windows has triggered a breakpoint in ShapeImport.exe.
This may be due to a corruption of the heap, and indicates a bug in
ShapeImport.exe or any of the DLLs it has loaded.
The output window may have more diagnostic information
--------------
Source Code (without error handling, simplified)
------------------------------------------------------
using namespace xercesc;

MyImporter::MyImporter()
{
    XMLPlatformUtils::Initialize();
}

void MyImporter::startImport(vector<string> fileURLs)
{
    SAX2XMLReader* parser = XMLReaderFactory::createXMLReader();
    parser->setFeature(XMLUni::fgSAX2CoreValidation, true);
    parser->setFeature(XMLUni::fgSAX2CoreNameSpaces, true);
    MyHandler* handler = new MyHandler();
    parser->setContentHandler(handler);
    parser->setErrorHandler(handler);
    BOOST_FOREACH(string fileURL, fileURLs)
    {
        XMLURL url = XMLURL(fileURL.c_str());
        URLInputSource* urlSource = new URLInputSource(url);
        // Here I catch SAXParseException, NetAccessorException, XMLException,
        // my own exceptions and std::runtime_error.
        parser->parse(*urlSource);
    }
}
---------------
URLs
------------------------------------------------------------
no error:
http://maps.oberbayern.de/RSS.ashx?Thema=Events&MaxCount=100
http://xerces.apache.org/index.html
http://xerces.apache.org/xerces-c/mailing-lists.html
http://scheduleworld.com/sw2/index.html
error:
http://maps.oberbayern.de
http://maps.google.com/
http://www.youtube.com/
Thanks in advance for any help!
Cheers,
Philipp Maschke
PS: I just tested some more and found that my catch clauses do not kick in... I
caused a NetAccessorException and it wasn't caught, although it should have been
:(
Here's the code:
try {
    parser->parse(*urlSource);
}
catch (const SAXParseException& toCatch) {
    char* message = XMLString::transcode(toCatch.getMessage());
    cout << "SAX parser exception:" << endl << message << endl;
    XMLString::release(&message);
}
catch (const NetAccessorException& toCatch) {
    char* message = XMLString::transcode(toCatch.getMessage());
    cout << "Network exception: " << endl << message << endl;
    XMLString::release(&message);
}
...
When I cause a NetAccessorException I get as debug output:
First-chance exception at 0x7c812a6b in ShapeImport_d.exe: Microsoft C++
exception: xercesc_3_0::NetAccessorException at memory location 0x0012f404..
but on stdout just:
Fatal Error: Unable to open file '...' at line: 0
What am I doing wrong?
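
One possible reading of the output above, offered as an assumption that nothing in this thread confirms: a "first-chance exception" line in the debugger only says that an exception was thrown, not that it went unhandled, and the "Fatal Error:" line looks like output from an error handler callback. That pattern would fit a library that catches the exception internally and reports it through the registered handler, so the caller's catch clauses never run unless the handler itself throws. The sketch below models only that mechanism. Every name in it (NetError, Handler, PrintingHandler, ThrowingHandler, parse, whoSawTheError) is a hypothetical stand-in and not a Xerces-C++ class or function.

```cpp
#include <iostream>
#include <stdexcept>
#include <string>

// All names below are hypothetical stand-ins, not Xerces-C++ classes.
struct NetError : std::runtime_error {
    explicit NetError(const std::string& m) : std::runtime_error(m) {}
};

struct Handler {
    virtual ~Handler() {}
    virtual void fatalError(const std::string& message) = 0;
};

// Prints and returns: the caller of parse() never sees an exception.
struct PrintingHandler : Handler {
    void fatalError(const std::string& message) {
        std::cout << "Fatal Error: " << message << std::endl;
    }
};

// Throws from the callback: the failure reaches the caller's catch clause.
struct ThrowingHandler : Handler {
    void fatalError(const std::string& message) {
        throw std::runtime_error(message);
    }
};

// Stand-in parser that catches its own network error and reports it
// through the handler instead of letting it propagate.
void parse(const std::string& url, Handler& handler) {
    try {
        throw NetError("Unable to open " + url);
    } catch (const NetError& e) {
        handler.fatalError(e.what());
    }
}

// Returns which side observed the failure: "handler" or "catch".
std::string whoSawTheError(Handler& handler) {
    try {
        parse("http://example.invalid/", handler);
        return "handler";
    } catch (const std::exception&) {
        return "catch";
    }
}
```

In this model, a handler that only prints leaves the caller's catch clauses idle, while a handler that throws from fatalError makes the failure visible to them. Whether the real parser behaves this way would need to be checked against the Xerces-C++ sources or documentation.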