Re: [MarkLogic Dev General] Validation against schema issue

Mary Holstege Thu, 06 Feb 2014 06:55:56 -0800

On Thu, 06 Feb 2014 06:19:16 -0800, Fernandes, Nivaldo <[email protected]> 
wrote:


> This is a very useful summary. I must admit that I have been somewhat shy 
> about using schema validation in MarkLogic ever since I came across this:
> http://developer.marklogic.com/pipermail/general/2012-October/011576.html
>
> In summary, attributes were inadvertently being added to our data on 
> ingestion...but perhaps this has changed.

If you have schemas in scope, it is true that we will look
at those schemas when parsing a document in its namespace,
and we will recognize and apply default attributes (we will
also determine whitespace handling rules and normalization,
FWIW). This is regardless of whether you explicitly validate
or not. We will always do basic type assessment.

We do now give you complete control over whether you serialize
those default attributes. There was an issue with the handling
of defaulted attributes when their parent node got copied,
but we now make the defaulted nature of the attributes "sticky"
so they will still think they are defaulted in the new context.
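
A minimal sketch of controlling default-attribute serialization, assuming the `default-attributes` option (yes|no) of the `xdmp:quote` options node is available in your release. The document URI is hypothetical, and the example assumes a schema in scope that declares default attributes for the document's namespace.

```xquery
xquery version "1.0-ml";

(: Hypothetical document URI; assumes the stored document's namespace
   has a schema in scope that declares default attributes. :)
let $doc := fn:doc("/example/work.xml")

(: Assumption: the default-attributes serialization option (yes|no)
   in the xdmp:quote options node; check the docs for your release. :)
let $with-defaults :=
  <options xmlns="xdmp:quote">
    <default-attributes>yes</default-attributes>
  </options>
let $without-defaults :=
  <options xmlns="xdmp:quote">
    <default-attributes>no</default-attributes>
  </options>
return (
  xdmp:quote($doc, $with-defaults),
  xdmp:quote($doc, $without-defaults)
)
```

Comparing the two serialized strings shows whether schema-supplied defaults are being written out.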

Ellis summarized things well.

Schema validation for simple single-document schemas where each
schema has a distinct namespace is pretty simple. Just avoid putting
in schemaLocation attributes in your source documents and rely on
the namespace-based resolution. If you have large multi-document
schemas it is slightly more complex: you can add a namespace=>schema
document binding at the group level so the server can tell what
the root document of your schema is, and away you go.
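
A minimal sketch of the simple case described above: validating a document that carries no schemaLocation attribute, so the schema is found by namespace alone. The namespace and element names are hypothetical, and the example assumes a schema with that target namespace is already loaded in the schemas database used by the content database.

```xquery
xquery version "1.0-ml";

(: Hypothetical namespace; assumes a schema with this targetNamespace
   is already loaded in the schemas database of the content database. :)
declare namespace pub = "http://example.com/publication";

(: Note: no xsi:schemaLocation attribute on the document. :)
let $doc :=
  <pub:work>
    <pub:title>Example</pub:title>
  </pub:work>

(: xdmp:validate returns an xdmp:validation-errors element;
   it has no xdmp:validation-error children when the node is valid. :)
let $report := xdmp:validate($doc, "strict")
return
  if (fn:empty($report/xdmp:validation-error))
  then "valid"
  else $report
```

Returning the report rather than raising an error makes it easy to inspect which schema and type were actually applied.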

Where people have real trouble is when they want to have schemas
that are changing or where they want different schemas for the
same namespace. Now you have to work hard to make sure the right
schema is being applied. It is possible, but you have to be very
careful with your schema locations. It gets just about impossible to have
multiple (conflicting) schemas in scope in the same XQuery at the
same time.

Clarifying issue 19722: the schema cache will refresh if you
update your schemas, but documents that had a schema applied
that are still in the expanded tree cache still have their
old applied types and still point to the old applied schema.

Our general recommendation is to treat schemas as static configuration
objects, and try hard to live with the one-namespace/one-schema
paradigm if at all possible. As I said, it is possible to have
alternative schemas for the same namespace, but you have to be
very careful about it in all your application code and data and
data layout.  Some people do need that complexity, but it usually
takes some detailed hand-holding to walk them through it.

//Mary

>
> From: Ellis Pritchard <[email protected]<mailto:[email protected]>>
> Reply-To: MarkLogic Developer Discussion 
> <[email protected]<mailto:[email protected]>>
> Date: Thursday, February 6, 2014 4:05 AM
> To: MarkLogic Developer Discussion 
> <[email protected]<mailto:[email protected]>>
> Subject: Re: [MarkLogic Dev General] Validation against schema issue
>
> Hi Lanz,
>
> Schema validation is probably a neglected feature for most devs using 
> MarkLogic, and unlike most of the rest of ML, there are several 'gotchas' 
> (and even a defect: 19722!) which can make working with schemas a bit of a 
> pain:
>
> 1/ A schema split over several files having the same namespace will need 
> Group configuration to point to the root document for the namespace, else ML 
> will pick up a random document from the set and you may get an unexpected 
> type error.
> 2/ By default, databases share the Schemas database; this is generally a bad 
> idea, and you should probably set a separate schema database for each content 
> database.
> 3/ If you are using no-namespace schemas, you are very vulnerable to the 
> types conflicting with each other, especially if sharing schema databases.
> 4/ Due to Bug #19722, ML doesn't automatically pick up changes to schemas; 
> even worse, it can get confused about them when they are 
> re-loaded.
>
> However, if you've got a decently typed schema, it sure saves a lot of 
> casting, and makes data integrity easier to maintain, especially with a 
> pre-commit validation trigger as suggested by Geert.
>
> Ellis.
>
> On 15 Jan 2014, at 09:56, Jakob Fix 
> <[email protected]<mailto:[email protected]>> wrote:
>
>
> hi,
>
> thanks for this. a couple of follow-up questions:
>
> - will there be support for xml schema 1.1 at some stage?
>
> - i have the impression that very few people talk about validation of documents 
> on this list. is that because people don't validate? or because it's so easy 
> that it's not worth mentioning? i'd be interested in patterns related to 
> validation people are using. validation outside of the database? what about 
> validation when a document is updated in the database, how do you ensure the 
> document is still valid? xdmp:validate, schema validation? other options?
>
> On Jan 14, 2014 7:28 PM, "Mary Holstege" 
> <[email protected]<mailto:[email protected]>> wrote:
>
> I think the problem here is you are using XSD 1.1 and relying on one of
> its features.  MarkLogic currently doesn't support XSD 1.1.
>
> Technically we ought not even to attempt the validation when you have
> an xs:all extended by an xs:all, but in general MarkLogic doesn't do a
> great job of schema checking in that way; it mostly just assumes the
> schemas are OK.
>
> //Mary
>
>
> On Tue, 14 Jan 2014 09:43:44 -0800, Lanz 
> <[email protected]<mailto:[email protected]>>
> wrote:
>
>> Hi all,
>>
>> Here is the context:
>> We use MarkLogic 7.0-1.
>> We have a schema database containing our schemas; this db is referenced
>> in our doc db as the schema db.
>> These schemas (version 1.1) define a base type and 2 extension types
>> (i.e. a basic publication as the base type, and a 'summary' and an
>> 'indicator' as extension types). The extension types have their own
>> elements in addition to the ones from the base type. Some elements may
>> be optional or mandatory; they are 'unordered' (using xs:all). All these
>> schemas use the same namespace.
>> Because the root element ('work') is the same for the 2 extension types,
>> we set the 'schemaLocation' attribute on the 'work' root element to be
>> sure ML uses the right schema during validation.
>> The documents have been validated against their schemas in Oxygen without
>> issue.
>>
>>
>> Here is the issue!
>> When we try to validate a document before inserting it in MarkLogic with
>> xdmp:validate, using any of "strict", "lax", or "type" (with its own
>> type), it fails.
>> The error message mentions the right schema but does not take into account
>> the optional elements.
>>
>> Please find the mentioned (simplified) schema, XML sample and error
>> message here: https://gist.github.com/anonymous/8422411
>>
>>
>> Any help is welcome, many thanks
>> Lanz
>
>
> --
> Using Opera's revolutionary email client: http://www.opera.com/mail/
> _______________________________________________
> General mailing list
> [email protected]<mailto:[email protected]>
> http://developer.marklogic.com/mailman/listinfo/general
>
>


