On Thu, 06 Feb 2014 06:19:16 -0800, Fernandes, Nivaldo <[email protected]> wrote:
> This is a very useful summary. I must admit that I have been somewhat shy
> about using schema validation in MarkLogic ever since I came across this:
> http://developer.marklogic.com/pipermail/general/2012-October/011576.html
>
> In summary, attributes were inadvertently being added to our data on
> ingestion...but perhaps this has changed.

If you have schemas in scope, it is true that we will look at those schemas
when parsing a document in its namespace, and we will recognize and apply
default attributes (we will also determine whitespace handling rules and
normalization, FWIW). This is regardless of whether you explicitly validate
or not. We will always do basic type assessment.

We do now give you complete control over whether you serialize those default
attributes. There was an issue with the handling of defaulted attributes
when their parent node got copied, but we now make the defaulted nature of
the attributes "sticky" so they will still think they are defaulted in the
new context.

Ellis summarized things well. Schema validation for simple single-document
schemas where each schema has a distinct namespace is pretty simple. Just
avoid putting in schemaLocation attributes in your source documents and rely
on the namespace-based resolution. If you have large multi-document schemas
it is slightly more complex: you can add a namespace=>schema document
binding at the group level so the server can tell what the root document of
your schema is, and away you go.

Where people have real trouble is when they want to have schemas that are
changing or where they want different schemas for the same namespace. Now
you have to work hard to make sure the right schema is being applied. It is
possible, but you have to be very careful with your schema locations. It
gets just about impossible to have multiple (conflicting) schemas in scope
in the same XQuery at the same time.

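A minimal sketch of the "no schemaLocation attributes" advice, assuming the
source documents are cleaned by a client-side script before ingestion.
Python, the helper name strip_schema_hints, and the sample namespace and
schema file name are illustrative assumptions, not MarkLogic APIs:

```python
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
# The two schema-hint attributes defined in the XML Schema instance namespace.
HINT_ATTRS = (
    f"{{{XSI_NS}}}schemaLocation",
    f"{{{XSI_NS}}}noNamespaceSchemaLocation",
)


def strip_schema_hints(xml_text: str) -> str:
    """Return xml_text with xsi schema-location hints removed from every element.

    Hypothetical helper: the server is then left to resolve the schema
    from the document's namespace alone.
    """
    root = ET.fromstring(xml_text)
    for elem in root.iter():  # iter() includes the root element itself
        for attr in HINT_ATTRS:
            elem.attrib.pop(attr, None)
    # Note: ElementTree may rewrite namespace prefixes (e.g. "ns0:") on output;
    # the element names and namespaces themselves are unchanged.
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample document; the namespace URI and .xsd name are made up.
sample = (
    '<work xmlns="http://example.com/pub" '
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:schemaLocation="http://example.com/pub summary.xsd">'
    "<title>Example</title></work>"
)
cleaned = strip_schema_hints(sample)
```

The cleaned document keeps its namespace, which is all the namespace-based
resolution described above needs.
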
Clarifying issue 19722: the schema cache will refresh if you update your
schemas, but documents that had a schema applied that are still in the
expanded tree cache still have their old applied types and still point to
the old applied schema.

Our general recommendation is to treat schemas as static configuration
objects, and try hard to live with the one-namespace/one-schema paradigm if
at all possible. As I said, it is possible to have alternative schemas for
the same namespace, but you have to be very careful about it in all your
application code, data, and data layout. Some people do need that
complexity, but it usually takes some detailed hand-holding to walk them
through it.

//Mary

>
> From: Ellis Pritchard <[email protected]>
> Reply-To: MarkLogic Developer Discussion <[email protected]>
> Date: Thursday, February 6, 2014 4:05 AM
> To: MarkLogic Developer Discussion <[email protected]>
> Subject: Re: [MarkLogic Dev General] Validation against schema issue
>
> Hi Lanz,
>
> Schema validation is probably a neglected feature for most devs using
> MarkLogic, and unlike most of the rest of ML, there are several 'gotchas'
> (and even a defect: 19722!) which can make working with schemas a bit of a
> pain:
>
> 1/ A schema split over several files having the same namespace will need
> Group configuration to point to the root document for the namespace, else ML
> will pick up a random document from the set and you may get an unexpected
> type error.
> 2/ By default, databases share the Schemas database; this is generally a bad
> idea, and you should probably set a separate schema database for each content
> database.
> 3/ If you are using no-namespace schemas, you are very vulnerable to the
> types conflicting with each other, especially if sharing schema databases.
> 4/ Due to Bug #19722, ML doesn't automatically pick up changes to schemas;
> even worse, it can get confused about them when they are re-loaded.
>
> However, if you've got a decently typed schema, it sure saves a lot of
> casting, and makes data integrity easier to maintain, especially with a
> pre-commit validation trigger as suggested by Geert.
>
> Ellis.
>
> On 15 Jan 2014, at 09:56, Jakob Fix <[email protected]> wrote:
>
> hi,
>
> thanks for this. a couple of follow-up questions:
>
> - will there be support for xml schema 1.1 at some stage?
>
> - i have the impression that very few people talk about validation of
> documents on this list. is that because people don't validate? or because
> it's so easy that it's not worth mentioning? i'd be interested in patterns
> related to validation people are using. validation outside of the database?
> what about validation when a document is updated in the database, how do
> you assure the document is still valid? xdmp:validate, schema validation?
> other options?
>
> On Jan 14, 2014 7:28 PM, "Mary Holstege" <[email protected]> wrote:
>
> I think the problem here is you are using XSD 1.1 and relying on one of
> its features. MarkLogic currently doesn't support XSD 1.1.
>
> Technically we ought to not even attempt the validation when you have
> an xs:all extended by an xs:all, but in general MarkLogic doesn't do a
> great job of schema checking in that way; it mostly just assumes the
> schemas are OK.
>
> //Mary
>
> On Tue, 14 Jan 2014 09:43:44 -0800, Lanz <[email protected]> wrote:
>
>> Hi all,
>>
>> Here is the context:
>> We use MarkLogic 7.0-1.
>> We have a schema database containing our schemas; this db is referenced
>> in our doc db as the schema db.
>> These schemas (version 1.1) define a base type and 2 extension types
>> (i.e. a basic publication as a base type and a 'summary' and an
>> 'indicator' as extension types). The extension types have their own
>> elements in addition to the ones from the basic type. Some elements can be
>> optional or mandatory, and they are 'unordered' (using xs:all). All these
>> schemas use the same namespace.
>> Because the root element is the same for the 2 extension types ('work'),
>> we set the 'schemaLocation' attribute on the 'work' root element to be
>> sure ML uses the right schema during validation.
>> The documents have been validated against their schemas in Oxygen without
>> issue.
>>
>> Here is the issue!
>> When we try to validate a document before inserting it in MarkLogic with
>> xdmp:validate, using any of "strict", "lax", or "type" (with its own
>> type), it fails.
>> The error message mentions the right schema but does not take into
>> account the optional elements.
>>
>> Please find the mentioned (simplified) schema, XML sample and error
>> message here: https://gist.github.com/anonymous/8422411
>>
>> Any help is welcome, many thanks
>> Lanz
>
> --
> Using Opera's revolutionary email client: http://www.opera.com/mail/
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general
>

--
Using Opera's revolutionary email client: http://www.opera.com/mail/
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
