RE: Xerces issues handling recursive schema includes

Alberto Massari Tue, 01 Nov 2005 00:22:15 -0800

Hi Elisha,

At 19.39 31/10/2005 -0800, Elisha Berns wrote:

Neil,


I made a naïve implementation of an EntityResolver that only uses the
absolute paths of the SystemIds it receives, but this doesn't work for
the following reasons:

The main schema file includes ~20 other schema files which are located
in other directories using relative paths, and each one of those 20 files
includes ~20 files (which can be included multiple times), also using
relative paths.  So if I use XMLPlatformUtils::weavePaths() to combine the
base path of the main schema file being parsed with the relative paths
found in the other included schema files, the resulting paths are
invalid.

The issue is that the EntityResolver needs to know what base path to use
when it gets a SystemId in order to correctly resolve it to an absolute
path. And the base path keeps changing as the SAX2XMLReader parses
through the paths it finds in schemaLocation attributes.

Is there any way to get this information (the correct base path to use
per relative path) without having to pre-parse all the schema files for
their schemaLocation attributes?  Surely there must be some simpler way
to prevent the parser from treating two or more relative SystemIds that
refer to the same file as different documents?

To overcome this limitation there is an XMLEntityResolver interface that you should register using SAX2XMLReaderImpl::setXMLEntityResolver (you may have to cast your SAX2XMLReader to the implementation class). In your XMLEntityResolver-derived class you should implement resolveEntity(XMLResourceIdentifier*), resolving the entity using the getSystemId() and getBaseURI() accessors.


Hope this helps,
Alberto

Thanks,

Elisha

> Hi Elisha,
>
> Recursive, or circular, includes are supposed to be handled properly by a
> schema parser.  While I'm not really active anymore on the code base, this
> question does come up periodically, usually in the context of a set of
> schemas that get loaded purely via schemaLocation hints, or via a user's
> EntityResolver which doesn't set system identifiers on the InputSources it
> returns to the parser.  The usual way to get around this is to register a
> custom EntityResolver instance, and take good care that system identifier
> fields are always set to the same value when an InputSource is returned.
> It's best if this is absolute, but I think a relative URI should work too.
> The reason this is important is that the parser uses system identifiers
> internally to figure out whether it's processed a schema document before.
>
> Cheers,
> Neil
> Neil Graham
> Manager, C++ Compiler Front-End and Runtime Development
> IBM Toronto Lab
> Phone:  905-413-3519, T/L 969-3519
> E-mail:  [EMAIL PROTECTED]
>
> "Elisha Berns" <[EMAIL PROTECTED]>
> 10/30/2005 10:30 PM
> Please respond to: c-dev
> To: "Xerces C++ Development" <[email protected]>
> Cc:
> Subject: Xerces issues handling recursive schema includes
>
> Hi,
>
> I'm trying to determine both what Xerces does when it encounters
> recursive schema includes and what to do about it because it causes some
> problems.
>
> It appears that the XercesC schema parser creates multiple XSxxx type
> objects for the same type if the schema files are included recursively.
> In addition it would appear that the load time for a schema is much,
> much slower in the presence of recursive includes.
>
> I get one 'proper' globally defined type object but multiple duplicates
> when the type appears as a contained type (in a complexType definition).
> The only way I know this now is because I get different pointer values
> for the XSxxx object when this situation arises, even though they end up
> pointing to the same type.
>
> Does anybody know firsthand whether there is any internal mechanism to
> prevent this from happening (apparently not), and what can be done, at
> present, to prevent this duplication from occurring?
>
> It has occurred to me that it might be a good idea to create a new type
> of parser warning specifically regarding the issue of 'recursive
> includes'.  This of course only makes sense if there is a strong
> consensus that this is a classic anti-pattern of XML Schema development
> and should be avoided at all costs.  I can see more or less how to
> implement it outside of Xerces by constructing a dependency graph of the
> schema files and testing for back-edges.  So my question about this side
> of things is whether there is any desire to make this test a built-in
> part of the parser to make the parser smarter about these things?
>
> Thanks for some feedback here.
>
> Elisha Berns
> [EMAIL PROTECTED]
> tel. (310) 556 - 8332
> fax (310) 556 - 2839
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>






