(Agh! Mozilla crashed taking my almost-completed reply with it. Here's another try.)
Jochen,
Jochen Wiedmann wrote:
Daniel Barclay wrote:...
XML Schema specification says (part 1, section 4.2.1):
applications are allowed, indeed encouraged, to avoid <include>ing the same schema document more than once to forestall the necessity of establishing identity component by component.
...
JaxMeXS is unable to "establish identity component by component". You might consider that as a lack of an important feature, but that's how it is. If you volunteer for adding that feature, you are welcome.
No, I don't think JaxMeXS needs to establish identity component by component, since the schema specification gives applications the option to skip already-included documents. However, I do think it needs to check for same-document inclusion just a bit differently than it currently does.
In any case, to be correct, JaxMeXS needs to do _something_ to avoid reporting errors when they are not actually errors.
(Of course, whether they actually are errors depends on exactly what the schema specification really means, which isn't clear yet. I submitted a question to [EMAIL PROTECTED])
As long as that is missing, you've got to make sure, that the parser
knows, that it is including the "same schema document". Currently, this
is done by ensuring that the system ID is *the same*. Not the same as in
"the same file in the filesystem" or as in "referring to the same URL",
but lexically the same.
Why do you think the parser should compare the _unresolved_ URI reference from an include directive, which might be relative, instead of comparing the _resolved_, non-relative URI that it is about to use as a system ID to retrieve a document?
The overall parser (JaxMeXS and/or the underlying, lower-level parser) already has to keep track of the base URI of the current schema document being parsed, xml:base attributes, and the location of URI references relative to xml:base attributes, and also has to combine that information to resolve any relative references into non-relative URIs in order to read included or imported schema documents.
Since the parser already has to resolve relative references into non-relative URIs, shouldn't the parser be comparing resolved, non-relative URIs instead of unresolved URI references that might still be relative references?
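To illustrate the difference, here is a minimal sketch of keying the "already included" set on resolved URIs. The class IncludeTracker and its method names are hypothetical (they are not JaxMeXS code), and the example.org URIs are made up; only java.net.URI from the JDK is used.

```java
import java.net.URI;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper, not part of JaxMeXS: tracks included documents by resolved URI. */
final class IncludeTracker {
    // Holds resolved, non-relative URIs rather than the raw schemaLocation strings.
    private final Set<URI> includedSchemas = new HashSet<>();

    /** Resolves a possibly relative reference against the including document's base URI. */
    static URI resolve(String baseUri, String schemaLocation) {
        return URI.create(baseUri).resolve(schemaLocation).normalize();
    }

    /** Returns true the first time a resolved URI is seen, false on any later inclusion. */
    boolean markIncluded(String baseUri, String schemaLocation) {
        return includedSchemas.add(resolve(baseUri, schemaLocation));
    }
}
```

With this, "../common/types.xsd" included from schemas/a/main.xsd and "common/types.xsd" included from schemas/main.xsd count as the same document, while "types.xsd" included from two different directories counts as two different documents. A lexical comparison gets both of those cases wrong.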
Again, if you don't like this as it is, you are
welcome to volunteer for a better solution.
As I wrote, the solution is simply to use the resolved non-relative URI (from the resolution you _already_ perform) instead of using the unresolved URI reference directly from the include directive. Regarding version 0.3.1, I wrote:
In fact, I used your URI resolution in getInputSource() to get resolved, non-relative URIs and used those resolved URIs (and not the original URI reference given in the include directive) with the includedSchemas Set. That seems to work correctly.
I moved the call to getInputSource(...) up to before the code that checked and then set includedSchemas, and used the resolved URI in the InputSource from getInputSource() (instead of the unresolved URI reference) when checking and setting includedSchemas.
The current CVS version has been rearranged a bit since then, so it's harder to tell exactly where the changes would go.
You seem to have moved the check for whether a document has already been parsed to _after_ you have done lower-level parsing. Although that doesn't necessarily hurt anything, reading the document (even just partially) _before_ checking whether it has already been included seems strange. Why was that done?
Given that change, it's hard to be specific about the best place for a fix for the problem at hand.
However, the first thing is that method getInputSource(...) should probably be split so that your code to resolve a given URI reference against the base URI is separate from creating an InputSource for a given resolved URI.
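Roughly, the split I have in mind looks like the following sketch. The class and method names (SchemaSources, resolveSystemId, openInputSource) are hypothetical; I am not claiming this matches how getInputSource(...) is written in CVS. It uses only java.net.URI and org.xml.sax.InputSource.

```java
import java.net.URI;
import org.xml.sax.InputSource;

/** Hypothetical split of a getInputSource-style method into two separate steps. */
final class SchemaSources {
    /** Step 1: resolve a possibly relative reference into a non-relative system ID. */
    static String resolveSystemId(String baseUri, String schemaLocation) {
        return URI.create(baseUri).resolve(schemaLocation).normalize().toString();
    }

    /** Step 2: create an InputSource for an already-resolved system ID (opens no stream). */
    static InputSource openInputSource(String resolvedSystemId) {
        return new InputSource(resolvedSystemId);
    }
}
```

The point of the split is that the resolved system ID is then available for the already-included check without having to create (or read from) an InputSource first.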
Operations in method parse(XsESchema,String) on field parsedSchemas should use resolved URIs and not unresolved URI references. That probably means that either:
- that method needs a baseURI parameter so that it can resolve any relative reference in the pSchemaLocation parameter into a non-relative URI, or
- callers need to do the resolution before calling that method
- (or the recent code rearrangement needs to be partly undone).
I also think the check for whether to skip a document should occur _before_ you perform lower-level parsing of the document.
It doesn't make sense to resolve the URI (with getInputSource(...)), perform low-level parsing, and only then check the URI and ignore what was parsed. Shouldn't JaxMeXS resolve the URI and then check the URI and then skip all parsing?
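In other words, the order I would expect is resolve, then check, then parse only if the document is new. Here is a minimal sketch of that ordering; SkipBeforeParse and its include(...) method are hypothetical, and the low-level parser is stood in for by a plain callback so the example stays self-contained.

```java
import java.net.URI;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Consumer;

/** Hypothetical sketch of the ordering: resolve, check, and only then parse. */
final class SkipBeforeParse {
    private final Set<String> parsedSchemas = new HashSet<>();
    // Stand-in for the lower-level parser; receives the resolved system ID.
    private final Consumer<String> lowLevelParser;

    SkipBeforeParse(Consumer<String> lowLevelParser) {
        this.lowLevelParser = lowLevelParser;
    }

    /** Returns true if the document was parsed, false if it was skipped as already included. */
    boolean include(String baseUri, String schemaLocation) {
        // 1. Resolve the reference into a non-relative URI.
        String resolved = URI.create(baseUri).resolve(schemaLocation).normalize().toString();
        // 2. Check (and record) the resolved URI before touching the document.
        if (!parsedSchemas.add(resolved)) {
            return false;
        }
        // 3. Only now hand the document to the lower-level parser.
        lowLevelParser.accept(resolved);
        return true;
    }
}
```

With this ordering, a document that has already been included is never opened a second time.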
Believe it or not, but what is currently in the code is sufficient to achieve what you want, if you are ready to add an EntityResolver and make sure, that *the lexically same* system ID's are given to the parser.
EntityResolver seems like the wrong tool for the job.
The purpose of an entity resolver is usually to map a requested URI reference to the _content_ of a document, usually by getting (via an InputStream or Reader), or at least pointing to (via a URI string), a cached copy from somewhere other than the specified location.
However, all that is needed here is to map the requested URI reference to a non-relative URI.
Related to all this, where does XSLogicalParser handle xml:base attributes? (Or are they handled in a different class?)
Daniel
