Sorry, I don't know exactly how much memory is used. I only looked at the maximum memory usage shown in the Task Manager (Windows XP). Whether or not I use a DOMLSParserFilter, the process DOMPrint.exe uses the same amount of memory. The DATA elements that I want to reject have very large values, and I assumed that rejecting these nodes would also remove them from memory. Does "be marked for recycling" mean that these DATA nodes remain in memory?

Mirko

-------- Original Message --------
> Date: Mon, 07 Sep 2009 09:26:05 +0200
> From: Alberto Massari <[email protected]>
> To: [email protected]
> Subject: Re: method startElement() from class DOMLSParserFilter
>
> Mirko Braun wrote:
> > Hi Alberto,
> >
> > thank you very much for your help. I integrated the patch in
> > 3.0.1 and it worked. There is no exception any more.
> > But there is still one problem. The memory usage is still
> > the same. I think if a node is rejected from the tree
> > the memory usage should also decrease. Is my conclusion
> > correct?
> >
>
> Yes, if a node is rejected it should be marked for recycling; how much
> memory are you seeing being used?
>
> Alberto
>
> > Mirko
> >
> > -------- Original Message --------
> >> Date: Fri, 04 Sep 2009 16:12:16 +0200
> >> From: Alberto Massari <[email protected]>
> >> To: [email protected]
> >> Subject: Re: method startElement() from class DOMLSParserFilter
> >>
> >> In effect I am seeing so many problems with that code that the only
> >> suggestion I have is to get the latest 3.0 from the trunk and work with
> >> what I have just committed (or get the patch from
> >> http://svn.apache.org/viewvc?rev=811420&view=rev and apply it to the 3.0.1
> >> code). This version should support your original code.
> >>
> >> Alberto
> >>
> >> Mirko Braun wrote:
> >>> Hi Alberto,
> >>>
> >>> yes, I'm still using the method startElement(). Is it better
> >>> to use the method acceptNode() to reject the DATA node from
> >>> the DOM, or is there any other possibility?
> >>>
> >>> Mirko
> >>>
> >>> -------- Original Message --------
> >>>> Date: Fri, 04 Sep 2009 15:41:54 +0200
> >>>> From: Alberto Massari <[email protected]>
> >>>> To: [email protected]
> >>>> Subject: Re: method startElement() from class DOMLSParserFilter
> >>>>
> >>>> Hi Mirko,
> >>>> are you still using startElement()? That API would mess with the current
> >>>> parent, so it would break the parsing at a certain point.
> >>>>
> >>>> Alberto
> >>>>
> >>>> Mirko Braun wrote:
> >>>>> Hi Alberto,
> >>>>>
> >>>>> yes, I'm sure that DATA is not a root node. I debugged a little bit.
> >>>>> The exception occurs after the sixth time this DATA node was found.
> >>>>>
> >>>>> Mirko
> >>>>>
> >>>>> -------- Original Message --------
> >>>>>> Date: Fri, 04 Sep 2009 14:21:15 +0200
> >>>>>> From: Alberto Massari <[email protected]>
> >>>>>> To: [email protected]
> >>>>>> Subject: Re: method startElement() from class DOMLSParserFilter
> >>>>>>
> >>>>>> Hi Mirko,
> >>>>>> are you sure that your root node isn't one of those DATA elements? In
> >>>>>> this case the document node would see more than one root element.
> >>>>>>
> >>>>>> Alberto
> >>>>>>
> >>>>>> Mirko Braun wrote:
> >>>>>>> Hi Alberto,
> >>>>>>>
> >>>>>>> thank you for your answer. I integrated the changes you
> >>>>>>> suggested, but the result is still the same:
> >>>>>>>
> >>>>>>> DOM Error during parsing:
> >>>>>>> 'C:\Daten\2009-08-07_NewXercesc\3_0_1\xerces-c-3.0.1\Build\Win32\VC6\Debug\MyXML.xml'
> >>>>>>> DOMException code is: 3
> >>>>>>> Message is: attempt is made to insert a node where it is not permitted
> >>>>>>>
> >>>>>>> Best regards,
> >>>>>>> Mirko
> >>>>>>>
> >>>>>>> -------- Original Message --------
> >>>>>>>> Date: Fri, 04 Sep 2009 12:37:10 +0200
> >>>>>>>> From: Alberto Massari <[email protected]>
> >>>>>>>> To: [email protected]
> >>>>>>>> Subject: Re: method startElement() from class DOMLSParserFilter
> >>>>>>>>
> >>>>>>>> Hi Mirko,
> >>>>>>>> I think the current implementation of the DOMLSParserFilter doesn't work
> >>>>>>>> nicely with your code, as the rejected nodes are not recycled and the
> >>>>>>>> memory will grow to the same level as before.
> >>>>>>>> Anyhow, you should instead override acceptNode like this:
> >>>>>>>>
> >>>>>>>> DOMParserFilter::FilterAction DOMParserFilter::acceptNode(DOMNode* node)
> >>>>>>>> {
> >>>>>>>>     // for element whose name is "DATA", skip it
> >>>>>>>>     if (node->getNodeType()==DOMNode::ELEMENT_NODE &&
> >>>>>>>>         XMLString::compareString(node->getNodeName(), element_data)==0)
> >>>>>>>>         return DOMParserFilter::FILTER_REJECT;
> >>>>>>>>     else
> >>>>>>>>         return DOMParserFilter::FILTER_ACCEPT;
> >>>>>>>> }
> >>>>>>>>
> >>>>>>>> Then, change DOMLSParserImpl::endElement to add a call to
> >>>>>>>> origNode->release() after the call to removeChild().
> >>>>>>>>
> >>>>>>>> Alberto
> >>>>>>>>
> >>>>>>>> Mirko Braun wrote:
> >>>>>>>>> Hello everybody,
> >>>>>>>>>
> >>>>>>>>> I would like to parse a quite large XML file (about 180 MB).
> >>>>>>>>> I used the DOM interface because I need the tree for further
> >>>>>>>>> processing of the data the XML file contains. Of course a lot
> >>>>>>>>> of memory is used while parsing the file, and I got an
> >>>>>>>>> "Out of memory" exception.
> >>>>>>>>>
> >>>>>>>>> I noticed that a class DOMLSParserFilter comes along with Xerces-C++
> >>>>>>>>> 3.0.1 (Win32), which makes it possible to filter the nodes during parsing.
> >>>>>>>>> That is perfect for me because one XML element in my large file
> >>>>>>>>> contains most of the data. This XML element is called DATA and
> >>>>>>>>> appears several times in my XML file.
> >>>>>>>>> So I had the idea to reject this XML element from the DOM tree
> >>>>>>>>> during parsing, to reduce the memory used, by using the method
> >>>>>>>>> startElement() of the DOMLSParserFilter class. After that I would
> >>>>>>>>> use a SAX parser and just get all DATA elements with their values.
> >>>>>>>>>
> >>>>>>>>> But it does not work.
> >>>>>>>>> I integrated my code into the DOMPrint example which comes along
> >>>>>>>>> with Xerces-C++ 3.0.1. The following error message occurred:
> >>>>>>>>>
> >>>>>>>>> DOM Error during parsing:
> >>>>>>>>> 'C:\Daten\2009-08-07_NewXercesc\3_0_1\xerces-c-3.0.1\Build\Win32\VC6\Debug\MyXML.xml'
> >>>>>>>>> DOMException code is: 3
> >>>>>>>>> Message is: attempt is made to insert a node where it is not permitted
> >>>>>>>>>
> >>>>>>>>> Did I misunderstand the functionality of the DOMLSParserFilter class
> >>>>>>>>> and its method startElement?
> >>>>>>>>> Is it possible to realize my idea with the help of this class? Did
> >>>>>>>>> I do something wrong in my code (please have a look below)?
> >>>>>>>>>
> >>>>>>>>> I would be very grateful for any help.
> >>>>>>>>>
> >>>>>>>>> Thanks in advance,
> >>>>>>>>> Mirko
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> DOMPrintFilter.hpp:
> >>>>>>>>> --------------------
> >>>>>>>>>
> >>>>>>>>> class DOMParserFilter : public DOMLSParserFilter {
> >>>>>>>>> public:
> >>>>>>>>>
> >>>>>>>>>     DOMParserFilter(DOMNodeFilter::ShowType whatToShow = DOMNodeFilter::SHOW_ALL);
> >>>>>>>>>     ~DOMParserFilter(){};
> >>>>>>>>>
> >>>>>>>>>     virtual FilterAction startElement(DOMElement* node);
> >>>>>>>>>     virtual FilterAction acceptNode(DOMNode* node){return DOMParserFilter::FILTER_ACCEPT;};
> >>>>>>>>>     virtual DOMNodeFilter::ShowType getWhatToShow() const {return fWhatToShow;};
> >>>>>>>>>
> >>>>>>>>> private:
> >>>>>>>>>     DOMNodeFilter::ShowType fWhatToShow;
> >>>>>>>>> };
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> DOMPrintFilter.cpp:
> >>>>>>>>> --------------------
> >>>>>>>>>
> >>>>>>>>> DOMParserFilter::DOMParserFilter(DOMNodeFilter::ShowType whatToShow)
> >>>>>>>>> :fWhatToShow(whatToShow)
> >>>>>>>>> {}
> >>>>>>>>>
> >>>>>>>>> DOMParserFilter::FilterAction DOMParserFilter::startElement(DOMElement* node)
> >>>>>>>>> {
> >>>>>>>>>     // for element whose name is "DATA", skip it
> >>>>>>>>>     if (XMLString::compareString(node->getNodeName(), element_data)==0)
> >>>>>>>>>         return DOMParserFilter::FILTER_REJECT;
> >>>>>>>>>     else
> >>>>>>>>>         return DOMParserFilter::FILTER_ACCEPT;
> >>>>>>>>> }
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> DOMPrint.cpp:
> >>>>>>>>> ---------------
> >>>>>>>>>
> >>>>>>>>> static const XMLCh gLS[] = { xercesc::chLatin_L, xercesc::chLatin_S, xercesc::chNull };
> >>>>>>>>> xercesc::DOMImplementation *implParser = xercesc::DOMImplementationRegistry::getDOMImplementation(gLS);
> >>>>>>>>> xercesc::DOMLSParser* parser = ((xercesc::DOMImplementationLS*)implParser)->createLSParser(xercesc::DOMImplementationLS::MODE_SYNCHRONOUS, 0);
> >>>>>>>>> DOMTreeErrorReporter *errReporter = new DOMTreeErrorReporter();
> >>>>>>>>> parser->getDomConfig()->setParameter(xercesc::XMLUni::fgDOMErrorHandler, errReporter);
> >>>>>>>>>
> >>>>>>>>> DOMParserFilter * pDOMParserFilter = new DOMParserFilter();
> >>>>>>>>> parser->setFilter(pDOMParserFilter);
> >>>>>>>>>
> >>>>>>>>> //
> >>>>>>>>> // Parse the XML file, catching any XML exceptions that might propagate
> >>>>>>>>> // out of it.
> >>>>>>>>> //
> >>>>>>>>> bool errorsOccured = false;
> >>>>>>>>> DOMDocument *doc = NULL;
> >>>>>>>>>
> >>>>>>>>> try
> >>>>>>>>> {
> >>>>>>>>>     doc = parser->parseURI(gXmlFile);
> >>>>>>>>> }
> >>>>>>>>> catch (const OutOfMemoryException&)
> >>>>>>>>> {
> >>>>>>>>>     XERCES_STD_QUALIFIER cerr << "OutOfMemoryException" << XERCES_STD_QUALIFIER endl;
> >>>>>>>>>     errorsOccured = true;
> >>>>>>>>> }
> >>>>>>>>> catch (const XMLException& e)
> >>>>>>>>> {
> >>>>>>>>>     XERCES_STD_QUALIFIER cerr << "An error occurred during parsing\n Message: "
> >>>>>>>>>         << StrX(e.getMessage()) << XERCES_STD_QUALIFIER endl;
> >>>>>>>>>     errorsOccured = true;
> >>>>>>>>> }
> >>>>>>>>> catch (const DOMException& e)
> >>>>>>>>> {
> >>>>>>>>>     const unsigned int maxChars = 2047;
> >>>>>>>>>     XMLCh errText[maxChars + 1];
> >>>>>>>>>
> >>>>>>>>>     XERCES_STD_QUALIFIER cerr << "\nDOM Error during parsing: '" << gXmlFile << "'\n"
> >>>>>>>>>         << "DOMException code is: " << e.code << XERCES_STD_QUALIFIER endl;
> >>>>>>>>>
> >>>>>>>>>     if (DOMImplementation::loadDOMExceptionMsg(e.code, errText, maxChars))
> >>>>>>>>>         XERCES_STD_QUALIFIER cerr << "Message is: " << StrX(errText) << XERCES_STD_QUALIFIER endl;
> >>>>>>>>>
> >>>>>>>>>     errorsOccured = true;
> >>>>>>>>> }
> >>>>>>>>> catch (...)
> >>>>>>>>> {
> >>>>>>>>>     XERCES_STD_QUALIFIER cerr << "An error occurred during parsing\n " << XERCES_STD_QUALIFIER endl;
> >>>>>>>>>     errorsOccured = true;
> >>>>>>>>> }
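
P.S. On the question at the top of this message, whether "marked for recycling" means the DATA nodes remain in memory: the thread does not say how the patched parser handles this internally, so the following is only a toy model and not Xerces-C code. It assumes that "recycling" means a released node's storage is put on a free list for later reuse instead of being handed back to the operating system. The names NodePool, allocate(), release() and reservedBytes() are hypothetical and chosen only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only: NodePool and its members are hypothetical names,
// not classes or functions from Xerces-C.
class NodePool {
public:
    explicit NodePool(std::size_t blockSize) : fBlockSize(blockSize) {}

    ~NodePool() {
        // Storage goes back to the system only when the pool itself is destroyed.
        for (char* block : fAllBlocks) delete[] block;
    }

    NodePool(const NodePool&) = delete;
    NodePool& operator=(const NodePool&) = delete;

    // Hand out a block, preferring one that was released earlier.
    char* allocate() {
        if (!fFreeList.empty()) {
            char* block = fFreeList.back();
            fFreeList.pop_back();
            return block;
        }
        char* block = new char[fBlockSize];
        fAllBlocks.push_back(block);
        return block;
    }

    // "Recycle" a block: remember it for reuse, but do not free it.
    void release(char* block) { fFreeList.push_back(block); }

    // Bytes the pool has obtained from the system so far.
    std::size_t reservedBytes() const { return fAllBlocks.size() * fBlockSize; }

private:
    std::size_t fBlockSize;
    std::vector<char*> fAllBlocks;  // every block ever obtained
    std::vector<char*> fFreeList;   // blocks currently available for reuse
};
```

In this model, releasing a block never lowers reservedBytes(); it only stops the figure from growing when the next allocation reuses the block. If the parser behaves along these lines, the process size would level off rather than shrink. Separately, if the Task Manager figure being read is a peak value, it is a high-water mark and cannot go down in any case.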
