Hi Mary,

Thanks for the clear and useful answers. Just to follow up, we have a scenario where a custom schemas database points to the default Schemas database (neither of which contains any schemas). Given that the schemas database is accessed every time an uncached schema is required for some purpose, would it be fair to say that if a schema cannot be found in the custom schemas database, the server will then look for it in the default Schemas database? We are trying to correct some performance issues.

Thank you!

Tim

-----Original Message-----
From: Mary Holstege [mailto:[email protected]]
Sent: Tuesday, October 23, 2012 12:22 PM
To: 'MarkLogic Developer Discussion'; Tim Meagher
Subject: Re: [MarkLogic Dev General] Creating a new schemas database

On Tue, 23 Oct 2012 08:47:36 -0700, Tim Meagher <[email protected]> wrote:

> Hi Folks,
>
> To follow up, I'd like to get a clear picture of when the schema
> database is accessed. It has always been my understanding that the
> schemas database is only accessed when an explicit validation is
> performed, but from experience we're wondering if the schemas database
> is being accessed during ingestion or output. For example, is the
> schemas database changing content during serialization?
>
> If so, and if the schemas database does not point to itself, could
> such unexpected access to the schemas database get referred to the
> secondary schemas database even if both schema databases are empty?
>
> Thanks!
>
> :) Tim
>

The schemas database is accessed every time an uncached schema is required for some purpose. This purpose may be explicit validation, but it is more likely that the schema was needed to determine the typed value of a node, either due to an explicit call to fn:data or due to implicit atomization of a value passed to a function or used with an operator. The schema is also used to determine whitespace handling rules during parsing and serialization.

Once a schema is accessed, the schema itself is assembled into an internal data structure, which is cached. Type information on specific data model instances is also cached. Even if a schemas database is empty, there'll still be a query run against that database to locate a schema, whether by location or namespace URI.

> Just to follow up on that question, is it more advantageous to have a
> schema defined for content and how much impact does that have on
> whether or not the content has a namespace?

There is quite a bit of overhead to processing schemas, so I wouldn't bother unless you have specific needs regarding whitespace or type information.

Namespace vs non-namespace is a bit of a wash, with the important caveat that having multiple schema documents for the same namespace (or non-namespace) essentially randomizes the automatic type assignment unless you are very deliberate and careful in how you set up your application: you have to make sure that the correct schema is the one that is chosen for every action you perform on your content. In practice, it is much easier to use namespaces where only one root schema document is relevant for any given namespace, unless you do the "poor man's namespaces" and make sure that all your local names are distinct.

//Mary
