Re: recursive includes hang JaxMeXS

Daniel Barclay Fri, 29 Apr 2005 14:01:42 -0700


(Agh!  Mozilla crashed, taking my almost-completed reply with it.
Here's another try.)


Jochen,

Jochen Wiedmann wrote:

Daniel Barclay wrote:

...

XML Schema specification says (part 1, section 4.2.1):

   applications are allowed,
   indeed encouraged, to avoid <include>ing the same schema document more
   than once to forestall the necessity of establishing identity component
   by component.

...

JaxMeXS is unable to "establish identity component by component". You
might consider that as a lack of an important feature, but that's how it
is. If you volunteer for adding that feature, you are welcome.


No, I don't think JaxMeXS needs to establish identity component by
component, since the schema specification gives applications the option
to skip already-included documents.  However, I do think it needs to
check for same-document inclusion just a bit differently than it
currently does.

In any case, to be correct, JaxMeXS needs to do _something_ to avoid
reporting errors when they are not actually errors.

(Of course, whether they actually are errors depends on exactly what
the schema specification really means, which isn't clear yet. I
submitted a question to [EMAIL PROTECTED])

As long as that is missing, you've got to make sure, that the parser knows, that it is including the "same schema document". Currently, this is done by ensuring that the system ID is *the same*. Not the same as in "the same file in the filesystem" or as in "referring to the same URL", but lexically the same.


Why do you think the parser should compare the _unresolved_ URI
reference from an include directive, which might be relative, instead
of comparing the _resolved_, non-relative URI that it is about to use
as a system ID to retrieve a document?


The overall parser (JaxMeXS and/or the underlying, lower-level parser)
already has to keep track of the base URI of the current schema
document being parsed, xml:base attributes, and the location of URI
references relative to xml:base attributes, and also has to combine
that information to resolve any relative references into non-relative
URIs in order to read included or imported schema documents.

Since the parser already has to resolve relative references into
non-relative URIs, shouldn't the parser be comparing resolved,
non-relative URIs instead of unresolved URI references that might still
be relative references?
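
To illustrate the difference, here is a minimal sketch using
java.net.URI.  It is not JaxMeXS code: the class name, the method name,
and the example.org URLs are made up for illustration.  It shows two
lexically different references that resolve to the same document, and
one lexically identical reference that resolves to different documents
depending on the base URI.

```java
import java.net.URI;

class ResolvedUriComparison {
    // Resolve a (possibly relative) schemaLocation against the base URI
    // of the document that contains the include directive.
    static URI resolve(String baseUri, String schemaLocation) {
        return URI.create(baseUri).resolve(schemaLocation).normalize();
    }

    public static void main(String[] args) {
        // Hypothetical layout: two documents in different directories
        // include the same third document.
        URI fromA = resolve("http://example.org/schemas/a.xsd", "common/types.xsd");
        URI fromOther = resolve("http://example.org/schemas/common/other.xsd", "types.xsd");

        // Lexically different references, same resolved URI.
        System.out.println(fromA.equals(fromOther));

        // Lexically identical reference ("types.xsd"), different documents.
        URI fromRoot = resolve("http://example.org/schemas/a.xsd", "types.xsd");
        System.out.println(fromRoot.equals(fromOther));
    }
}
```

A lexical comparison gets both cases wrong: it treats the first pair as
different documents and the second pair as the same one.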

Again, if you don't like this as it is, you are welcome to volunteer for a better solution.


As I wrote, the solution is simply to use the resolved non-relative
URI (from the resolution you _already_ perform) instead of using the
unresolved URI reference directly from the include directive.

Regarding version 0.3.1, I wrote:

  In fact, I used your URI resolution in getInputSource() to get
  resolved, non-relative URIs and used those resolved URIs (and not
  the original URI reference given in the include directive) with the
  includedSchemas Set.  That seems to work correctly.

I moved the call to getInputSource(...) up to before the code that
checked and then set includedSchemas, and used the resolved URI in
the InputSource from getInputSource() (instead of the unresolved
URI reference) when checking and setting includedSchemas.
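
Roughly, the change amounts to something like the sketch below.  This
is not the actual 0.3.1 code: the class and method names are
hypothetical, only the field name includedSchemas comes from the
source, and its real element type may well differ.  The point is only
that the Set is keyed by the resolved URI.

```java
import java.net.URI;
import java.util.HashSet;
import java.util.Set;

class IncludeTracker {
    // Keyed by resolved, non-relative URI (assumption: the real field
    // may hold Strings or something else).
    private final Set<URI> includedSchemas = new HashSet<>();

    // Returns true the first time a document is seen and false when the
    // same resolved URI has already been included.
    boolean markIncluded(String baseUri, String schemaLocation) {
        URI resolved = URI.create(baseUri).resolve(schemaLocation).normalize();
        // Set.add checks and sets in one step.
        return includedSchemas.add(resolved);
    }
}
```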


The current CVS version has been rearranged a bit since then, so it's
harder to tell exactly where the changes would go.

You seem to have moved the check for whether a document has already
been parsed to _after_ you have done lower-level parsing.  Although
that doesn't necessarily hurt anything, reading the document (even
just partially) _before_ checking whether it has already been included
seems strange.  Why was that done?

Given that change, it's hard to be specific about the best place for
a fix for the problem at hand.

However, the first thing is that method getInputSource(...) should
probably be split so that your code to resolve a given URI reference
against the base URI is separate from creating an InputSource for a
given resolved URI.
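
As a sketch of what I mean by the split (the method names
resolveLocation and createInputSource are my own invention, and the
real getInputSource(...) may do more than this, such as consulting an
EntityResolver):

```java
import java.net.URI;
import org.xml.sax.InputSource;

class SchemaLocations {
    // Step 1: resolve the URI reference against the base URI.
    // No I/O happens here, so the result can be checked cheaply.
    static String resolveLocation(String baseUri, String schemaLocation) {
        if (baseUri == null) {
            return schemaLocation; // nothing to resolve against
        }
        return URI.create(baseUri).resolve(schemaLocation).normalize().toString();
    }

    // Step 2: create an InputSource for an already-resolved URI.
    static InputSource createInputSource(String resolvedUri) {
        return new InputSource(resolvedUri);
    }
}
```

With that separation, a caller can resolve first, compare the resolved
URI against what has already been parsed, and create the InputSource
only if the document is new.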

Operations in method parse(XsESchema,String) on field parsedSchemas
should use resolved URIs and not unresolved URI references.  That
probably means one of the following:
- that method needs a baseURI parameter so that it can resolve any
  relative reference in the pSchemaLocation parameter into a
  non-relative URI, or
- callers need to do the resolution before calling that method, or
- the recent code rearrangement needs to be partly undone.


I also think the check for whether to skip a document should occur
_before_ you perform lower-level parsing of the document.

It doesn't make sense to resolve the URI (with getInputSource(...)),
perform low-level parsing, and only then check the URI and ignore what
was parsed.  Shouldn't JaxMeXS resolve the URI and then check the URI
and then skip all parsing?
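
Here is a small, self-contained sketch of that ordering (resolve, then
check, then parse).  It is only a simulation: the in-memory map stands
in for real schema documents and their include directives, and the
class name, the demo() helper, and the example.org URLs are all
hypothetical.  Only the name parsedSchemas comes from the source.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class IncludeWalker {
    // Stand-in for the documents: resolved URI -> schemaLocation values
    // of the include directives found in that document.
    private final Map<String, List<String>> includesByDocument;
    private final Set<URI> parsedSchemas = new HashSet<>();
    private final List<String> parseOrder = new ArrayList<>();

    IncludeWalker(Map<String, List<String>> includesByDocument) {
        this.includesByDocument = includesByDocument;
    }

    void include(URI baseUri, String schemaLocation) {
        // 1. Resolve the reference against the including document's URI.
        URI resolved = baseUri.resolve(schemaLocation).normalize();
        // 2. Check before touching the document at all.
        if (!parsedSchemas.add(resolved)) {
            return; // already included: skip all parsing
        }
        // 3. Only now "parse" the document and follow its includes.
        parseOrder.add(resolved.toString());
        for (String nested : includesByDocument.getOrDefault(resolved.toString(), List.of())) {
            include(resolved, nested);
        }
    }

    List<String> parseOrder() {
        return parseOrder;
    }

    // a.xsd includes sub/b.xsd, which includes ../a.xsd again.
    static List<String> demo() {
        Map<String, List<String>> docs = new HashMap<>();
        docs.put("http://example.org/s/a.xsd", List.of("sub/b.xsd"));
        docs.put("http://example.org/s/sub/b.xsd", List.of("../a.xsd"));
        IncludeWalker walker = new IncludeWalker(docs);
        walker.include(URI.create("http://example.org/s/"), "a.xsd");
        return walker.parseOrder();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this simulation the mutual include terminates, and each document is
read exactly once, because "../a.xsd" resolves to a URI that is already
in parsedSchemas.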

Believe it or not,  but what is currently in the code is sufficient to
achieve what you want, if you are ready to add an EntityResolver and
make sure, that *the lexically same* system ID's are given to the parser.


EntityResolver seems like the wrong tool for the job.

The purpose of an entity resolver is usually to map a requested URI
reference to the _content_ of a document, usually by getting (via an
InputStream or Reader), or at least pointing to (via a URI string),
a cached copy from somewhere other than the specified location.

However, all that is needed here is to map the requested URI reference
to a non-relative URI.
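
For comparison, this is the kind of thing an entity resolver is
normally written for.  The sketch below uses the standard
org.xml.sax.EntityResolver interface; the class name and the in-memory
cache are hypothetical.  It maps a system ID to cached _content_ and
otherwise returns null so that the parser opens the system ID itself.

```java
import java.io.StringReader;
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

class CachedSchemaResolver implements EntityResolver {
    // Hypothetical cache: system ID -> document text.
    private final Map<String, String> cachedContentBySystemId;

    CachedSchemaResolver(Map<String, String> cachedContentBySystemId) {
        this.cachedContentBySystemId = cachedContentBySystemId;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String content = (systemId == null) ? null : cachedContentBySystemId.get(systemId);
        if (content == null) {
            return null; // no cached copy: let the parser fetch it
        }
        // Supply the cached content in place of the remote document.
        InputSource source = new InputSource(new StringReader(content));
        source.setSystemId(systemId);
        return source;
    }
}
```

Note that the lookup here is itself a lexical match on the system ID it
is handed, so it does nothing to turn a relative reference into a
non-relative URI.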


Related to all this, where does XSLogicalParser handle xml:base
attributes?  (Or are they handled in a different class?)


Daniel
