> Is there any way to transform very large XML documents (larger than 2GB)
> through XSL using Xalan-C++? Implementing this by creating a
> XalanSourceTreeDocument or XercesDOMWrapperParsedSource, as demonstrated
> in the ParsedSourceWrapper.cpp sample code, would use an impractically
> large amount of memory. Would the Xalan-C++ 1.11 SAX2 feature be any help?
> Can a SAX ContentHandler be linked with XalanTransformer? Or is there some
> other feature in Xalan-C++ (or other libraries) that can be used to
> process very large XML documents?
>
> Thanks,
>
> Ray Chu
> Document Semantics, Product Engineering
> +1 703.573.2883
>
There are several memory tradeoffs you can make, depending on which
resource is the bottleneck.

SAX2 places less of a burden on memory than DOM.  Even with SAX2,
however, both documents (the parsed source document and the compiled
stylesheet) must be held entirely in memory as operational node trees.

If your issue is buffer size for read/write operations, you can
use callback functions to transfer the data incrementally.
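
As a rough sketch of the callback approach: XalanTransformer has, as far
as I recall from the XalanTransformerCallback sample, a transform()
overload that takes an output handle, an output handler, and an optional
flush handler, and calls the handler once per chunk of serialized
result. The typedefs below are hypothetical stand-ins that mirror the
shapes of CallbackSizeType, XalanOutputHandlerType, and
XalanFlushHandlerType, so the sketch compiles without the Xalan headers.
StreamSink, writeChunk, flushSink, and exerciseHandlers are illustrative
names, not library names.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Assumption: stand-ins for the callback types Xalan-C++ declares in
// XalanTransformerDefinitions.hpp, redeclared under hypothetical names.
typedef unsigned long SketchSizeType;
typedef SketchSizeType (*SketchOutputHandler)(const char*, SketchSizeType, void*);
typedef void (*SketchFlushHandler)(void*);

// Hypothetical sink: forwards each chunk to a stream and counts the bytes.
struct StreamSink {
    std::ostream*      out;
    unsigned long long bytesWritten;
};

// Called once per chunk of output; returns the number of bytes consumed.
SketchSizeType writeChunk(const char* data, SketchSizeType length, void* handle)
{
    StreamSink* sink = static_cast<StreamSink*>(handle);
    sink->out->write(data, static_cast<std::streamsize>(length));
    if (!sink->out->good()) {
        return 0;  // report that nothing was consumed
    }
    sink->bytesWritten += length;
    return length;
}

// Called when the serializer wants buffered output pushed downstream.
void flushSink(void* handle)
{
    static_cast<StreamSink*>(handle)->out->flush();
}

// Stand-alone exercise of the handlers, with an in-memory stream as the sink.
std::string exerciseHandlers()
{
    std::ostringstream buffer;
    StreamSink sink = { &buffer, 0 };
    SketchOutputHandler write = writeChunk;
    SketchFlushHandler  flush = flushSink;
    write("<a/>", 4, &sink);
    write("<b/>", 4, &sink);
    flush(&sink);
    assert(sink.bytesWritten == 8);
    return buffer.str();
}

// With the real library, the handlers would be passed roughly like this
// (file names are placeholders):
//   XalanTransformer transformer;
//   transformer.transform("in.xml", "style.xsl", &sink, writeChunk, flushSink);
```

Note that this only keeps the serialized result from being buffered as a
whole; the source tree is still built in memory.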

If your issue is the operating system heap, you can supply your
own allocator backed by a separate file.  This avoids unwelcome
expansion stress on the system paging files, although virtual
memory (address space) is still required.
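
A minimal sketch of a file-backed allocator, assuming a POSIX system:
the arena below maps a file with mmap() and hands out memory from it, so
dirty pages are written back to that file rather than to the system
paging file. FileBackedArena is a hypothetical class, not part of
Xalan-C++ or Xerces-C++. Its allocate()/deallocate() pair only imitates
the shape of the Xerces-C++ MemoryManager interface; to use it with
Xalan you would have to derive from MemoryManager (which, as far as I
recall, also requires getExceptionMemoryManager()) and pass the instance
where the library accepts a memory manager.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <new>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical bump allocator whose storage is a memory-mapped file.
class FileBackedArena {
public:
    FileBackedArena(const char* path, std::size_t capacity)
        : base_(0), capacity_(capacity), used_(0)
    {
        assert(capacity > 0);
        const int fd = ::open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
        if (fd < 0) throw std::bad_alloc();
        // Size the backing file before mapping it.
        if (::ftruncate(fd, static_cast<off_t>(capacity)) != 0) {
            ::close(fd);
            throw std::bad_alloc();
        }
        void* p = ::mmap(0, capacity, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        ::close(fd);     // the mapping remains valid after close
        ::unlink(path);  // storage is released once the mapping goes away
        if (p == MAP_FAILED) throw std::bad_alloc();
        base_ = static_cast<char*>(p);
    }

    ~FileBackedArena() { ::munmap(base_, capacity_); }

    // Hand out the next block, rounded up to a 16-byte boundary.
    void* allocate(std::size_t size)
    {
        const std::size_t rounded = (size + 15) & ~static_cast<std::size_t>(15);
        if (rounded < size || rounded > capacity_ - used_) throw std::bad_alloc();
        void* p = base_ + used_;
        used_ += rounded;
        return p;
    }

    // A bump arena frees nothing individually; everything is released
    // together when the arena is destroyed.
    void deallocate(void*) {}

    std::size_t used() const { return used_; }

private:
    FileBackedArena(const FileBackedArena&);             // not copyable
    FileBackedArena& operator=(const FileBackedArena&);  // not assignable

    char*       base_;
    std::size_t capacity_;
    std::size_t used_;
};
```

Because deallocate() is a no-op, this design only suits allocations that
all die together, such as the memory for one transformation. The mapped
region still consumes address space in the process.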

If your issue is with stylesheet recursion, you may need to
partition your transformation into smaller chunks, and run
multiple transforms over the intermediate result sets.

If your issue is with the stack, you may need to tell the operating
system to use a different boundary between the stack, heap, and code
sections.

SHORT ANSWER: There is no easy single solution.

Sincerely,
Steven J. Hathaway

