> Is there any way to transform Mega XML documents, larger than 2GB, through
> XSL using Xalan-C++? To implement this by creating
> XalanSourceTreeDocument or XercesDOMWrapperParsedSource, as demonstrated
> in ParsedSourceWrapper.cpp sample code, will use up impractically large
> amount of memory. Will Xalan-C++ 1.11 SAX2 feature be any help? Can a SAX
> ContentHandler be linked with XalanTransformer? Or, is there some other
> feature in Xalan-C++ (or other libraries) that can be used to process Mega
> XML documents?
>
> Thanks,
>
> Ray Chu
> Document Semantics, Product Engineering

There are some memory tradeoffs you can make, depending on which resource is under pressure.
SAX2 places less of a burden on memory than DOM. Even with SAX2, though, both the parsed source document and the compiled stylesheet must be held entirely in memory as operational node trees.

If your issue is buffer size for read/write operations, you can use callback methods for incremental transfer.

If your issue is the operating system heap, you can supply your own allocator with separate file backing. This avoids unwelcome expansion stress on the system paging files, but the data still has to fit in virtual memory.

If your issue is stylesheet recursion, you may need to partition your transformation into smaller chunks and run multiple transforms over intermediate result sets.

If your issue is the stack, you may need to tell the operating system to use a different boundary between the stack, heap, and code sections.

SHORT ANSWER: There is no easy single solution.

Sincerely,
Steven J. Hathaway
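
On the allocator point above, here is a minimal sketch of what "separate file backing" could look like on a POSIX system. Everything in it is an assumption for illustration: FileBackedArena is a hypothetical class, not part of Xalan-C++ or Xerces-C++, and it is a simple bump allocator that never reuses freed blocks. To use it with Xalan-C++ you would have to wrap it in the library's memory manager interface, which is not shown here and should be checked against the headers for your version.

```cpp
#include <cstddef>
#include <cstring>
#include <new>
#include <stdexcept>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical helper: a bump-pointer arena whose pages are backed by a
// dedicated file rather than by the system paging file.
class FileBackedArena {
public:
    FileBackedArena(const char* path, std::size_t capacity)
        : capacity_(capacity) {
        fd_ = ::open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
        if (fd_ < 0) throw std::runtime_error("open failed");
        ::unlink(path);  // the file disappears once the descriptor is closed
        if (::ftruncate(fd_, static_cast<off_t>(capacity)) != 0) {
            ::close(fd_);
            throw std::runtime_error("ftruncate failed");
        }
        void* p = ::mmap(nullptr, capacity, PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd_, 0);
        if (p == MAP_FAILED) {
            ::close(fd_);
            throw std::runtime_error("mmap failed");
        }
        base_ = static_cast<char*>(p);
    }

    ~FileBackedArena() {
        ::munmap(base_, capacity_);
        ::close(fd_);
    }

    FileBackedArena(const FileBackedArena&) = delete;
    FileBackedArena& operator=(const FileBackedArena&) = delete;

    // Hand out the next suitably aligned block; throws when the arena is full.
    void* allocate(std::size_t size) {
        const std::size_t align = alignof(std::max_align_t);
        const std::size_t start = (used_ + align - 1) / align * align;
        if (start > capacity_ || size > capacity_ - start) throw std::bad_alloc();
        used_ = start + size;
        return base_ + start;
    }

    // Bump allocators do not reclaim individual blocks; all memory is
    // released together when the arena is destroyed.
    void deallocate(void*) {}

    std::size_t used() const { return used_; }

private:
    int fd_ = -1;
    char* base_ = nullptr;
    std::size_t capacity_ = 0;
    std::size_t used_ = 0;
};
```

The design choice is that MAP_SHARED pages are written back to the named file under memory pressure, so the paging file is not asked to grow. The mapping still consumes address space, which is why virtual memory remains a requirement.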