Hi Mary,

Thanks for the clear and useful answers.  Just to follow up, we have a scenario 
where a custom schemas database points to the default Schemas database 
(neither of which contains any schemas).  Given that the schemas database is 
accessed every time an uncached schema is required for some purpose, would it 
be fair to say that if a schema cannot be found in the custom schemas database, 
the server will then look for it in the default Schemas database?  We are 
trying to correct some performance issues. 

Thank you!

Tim

-----Original Message-----
From: Mary Holstege [mailto:[email protected]] 
Sent: Tuesday, October 23, 2012 12:22 PM
To: 'MarkLogic Developer Discussion'; Tim Meagher
Subject: Re: [MarkLogic Dev General] Creating a new schemas database

On Tue, 23 Oct 2012 08:47:36 -0700, Tim Meagher <[email protected]> wrote:

> Hi Folks,
>
> To follow up, I'd like to get a clear picture of when the schema 
> database is accessed.  It has always been my understanding that the 
> schemas database is only accessed when an explicit validation is 
> performed, but from experience we're wondering if the schemas database 
> is being accessed during ingestion or output.  For example, is the 
> schemas database changing content during serialization?
>
> If so, and if the schemas database does not point to itself, could 
> such unexpected access to the schemas database get referred to the 
> secondary schemas database even if both schema databases are empty?
>
> Thanks!
>
> :) Tim
>


The schemas database is accessed every time an uncached schema is required for 
some purpose. This purpose may be explicit validation, but it is more likely 
because it was needed to determine the typed value of a node either due to an 
explicit call to fn:data or due to implicit atomization of a value passed to a 
function or used with an operator.  The schema is also used to determine 
whitespace handling rules during parsing and serialization.

Once a schema is accessed, the schema itself is assembled into an internal data 
structure, which is cached. Type information on specific data model instances 
is also cached.

Even if a schemas database is empty, there'll still be a query run against that 
database to locate a schema, whether by location or namespace URI.
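
To make the cost concrete, here is a toy Python model of the behaviour described above. It is not MarkLogic code: SchemaStore, SchemaCache and the example namespace are invented for illustration, and it assumes that a failed lookup is not cached, which the explanation above does not state either way.

```python
from typing import Dict, Optional, Tuple


class SchemaStore:
    """Hypothetical stand-in for a schemas database (namespace URI -> schema text)."""

    def __init__(self, documents: Optional[Dict[str, str]] = None) -> None:
        self.documents = dict(documents or {})
        self.queries_run = 0  # number of lookups that reached the database

    def find(self, namespace_uri: str) -> Optional[str]:
        # A query is run whether or not the database holds any schemas.
        self.queries_run += 1
        return self.documents.get(namespace_uri)


class SchemaCache:
    """Hypothetical cache of assembled schemas in front of a SchemaStore."""

    def __init__(self, store: SchemaStore) -> None:
        self.store = store
        self.assembled: Dict[str, Tuple[str, str]] = {}

    def get(self, namespace_uri: str) -> Optional[Tuple[str, str]]:
        if namespace_uri in self.assembled:
            return self.assembled[namespace_uri]  # cached: no database access
        raw = self.store.find(namespace_uri)
        if raw is None:
            # ASSUMPTION: misses are not cached, so the next request queries again.
            return None
        schema = ("assembled", raw)  # placeholder for the internal data structure
        self.assembled[namespace_uri] = schema
        return schema


NS = "http://example.com/ns"  # made-up namespace

empty_store = SchemaStore()
empty_cache = SchemaCache(empty_store)
for _ in range(3):
    empty_cache.get(NS)

populated_store = SchemaStore({NS: "<xs:schema/>"})
populated_cache = SchemaCache(populated_store)
for _ in range(3):
    populated_cache.get(NS)

print(empty_store.queries_run, populated_store.queries_run)
```

Under that assumption the empty database is queried on every request, while the populated one is queried once and then served from the cache.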

> Just to follow up on that question, is it more advantageous to have a 
> schema defined for content and how much impact does that have on 
> whether or not the content has a namespace?

There is quite a bit of overhead to processing schemas, so I wouldn't bother 
unless you have specific needs regarding whitespace or type information.  
Namespace vs non-namespace is a bit of a wash with the important caveat that 
having multiple schema (documents) for the same namespace (or non-namespace) 
essentially randomizes the automatic type assignment unless you are very 
deliberate and careful in how you set up your application: you have to make 
sure that the correct schema is the one that is chosen for every action you 
perform on your content. In practice, it is much easier to use namespaces where 
only one root schema document is relevant for any given namespace.  Unless you 
do the "poor man's namespaces" and make sure that all your local names are 
distinct.

//Mary

_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
