Re: [MarkLogic Dev General] Applying schemas to schema-less documents

Mary Holstege Fri, 24 Jun 2011 08:19:27 -0700

On Fri, 24 Jun 2011 08:05:41 -0700, Lee, David <[email protected]> wrote:

> Thanks for this info.
> Question:  Does the initial load or indexing of documents use the  
> schema at all?
> That is, is there a performance/indexing advantage to pre-declaring the  
> schema at load time as opposed to importing it in a given xquery?
>
> For example I could imagine indexes indexing date types differently than  
> string types - but it would have to know at the time the document was  
> loaded.
>

Just to reemphasize one thing: the server will only use the first schema
it finds if you don't tell it where to look.  You can always force
the issue with a schema import statement in the query or a schema
location on the document.  The only thing you need to be careful
about is how relative schema locations are resolved.  Suggestion:
use an absolute path (e.g. /my/schema/here.xsd rather than just
here.xsd) or even an absolute URI (e.g. http://whatever/my/schema/here.xsd).
Note that this is a URI in the schemas database; we don't dereference
against the web.
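
As a rough sketch of forcing the issue with a schema import (an
illustration under assumptions, not a recipe): the namespace URI
http://example.com/runs and the prefix r are hypothetical, and the
schema is assumed to be stored in the schemas database at the absolute
path /my/schema/here.xsd and to declare @elapsed as a duration type.

```xquery
xquery version "1.0-ml";

(: Assumption: documents use the hypothetical namespace
   http://example.com/runs, and its schema sits in the schemas
   database at the absolute path /my/schema/here.xsd. :)
import schema namespace r = "http://example.com/runs"
  at "/my/schema/here.xsd";

(: Assumption: the schema declares @elapsed as xs:dayTimeDuration,
   so atomizing the attribute should yield durations without a cast. :)
fn:avg(xdmp:directory("/logs/")//r:run/@elapsed)
```

For a vocabulary with no namespace, the XQuery grammar also allows an
import with an empty namespace URI, along the lines of
import schema "" at "/my/schema/here.xsd";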

Absent an import or a schema location, then yes, we will use the
first schema for the namespace (or non-namespace) that we find.
XQuery doesn't really treat non-namespaced vocabularies as equal
citizens, however, so the advice to use namespaces is still a good one.

We will automatically make use of schema information in a limited
way during load: we don't persist any type information, but we
do use it to decide how to handle white space and, if you are
using repair, what and how to do that repairing.  Since the schema
has to be in the schemas database to be in scope for a particular
query, it will in general also be in scope for the load. I would
definitely DISrecommend a practice of trying to dynamically load
and unload schemas for particular queries.  The schema-related
type processing is fairly expensive, so we do a lot of caching;
swapping schemas in and out defeats that caching and will just
slow things down a lot.  And, for completeness' sake, we also use
schema information when serializing to decide how to pretty print
(again, whitespace handling).
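
A minimal sketch of loading a schema once into the schemas database,
rather than loading and unloading it per query. The filesystem path
/tmp/here.xsd is a hypothetical example, and the target URI
/my/schema/here.xsd simply reuses the absolute path suggested above.

```xquery
xquery version "1.0-ml";

(: Assumption: the schema file is readable on the server's filesystem
   at the hypothetical path /tmp/here.xsd. :)
let $schema := xdmp:document-get("/tmp/here.xsd")
return
  (: Run the insert against the schemas database associated with the
     current content database, so the schema is stored one time. :)
  xdmp:eval(
    'xquery version "1.0-ml";
     declare variable $schema external;
     xdmp:document-insert("/my/schema/here.xsd", $schema)',
    (xs:QName("schema"), $schema),
    <options xmlns="xdmp:eval">
      <database>{xdmp:schema-database()}</database>
    </options>
  )
```

Once the schema is in place, later queries and loads can pick it up
from the schemas database without any per-query setup.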

//Mary

>
>
> ----------------------------------------
> David A. Lee
> Senior Principal Software Engineer
> Epocrates, Inc.
> [email protected]
> 812-482-5224
>
>
> -----Original Message-----
> From: [email protected]  
> [mailto:[email protected]] On Behalf Of Michael  
> Blakeley
> Sent: Friday, June 24, 2011 10:37 AM
> To: General MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] Applying schemas to schema-less  
> documents
>
> The server applies schemas dynamically, looking at whatever happens to  
> be available. So yes it's possible. However that can lead to a problem:  
> if I have a node 'run/elapsed', I might have two different schemas that  
> contain an element 'run'. The server will use the first one it finds.
>
> Because of this I strongly recommend using namespaces and schemas  
> together. A schema that targets {mynamespace}run is much less likely to  
> conflict with some other schema. You can also disambiguate which schema  
> you want using an 'import schema...' prolog expression.
>
> Let's see... there used to be a quick schema tutorial on the developer  
> site. Here it is: http://developer.marklogic.com/learn/2007-04-schema
>
> -- Mike
>
> On 24 Jun 2011, at 07:28 , Lee, David wrote:
>
>> Suppose I have a bunch of documents with no namespace and no schemas.
>> I would like to apply a schema to these documents so that
>>
>> 1) I can avoid lots of casting in my xquery such as
>>                 fn:avg($runs[@status eq 'true']/@elapsed)
>> instead of
>>                 fn:avg( $runs[@status eq  
>> 'true']/@elapsed/xs:dayTimeDuration(.) )
>>
>> 2) So that indexes are built knowing about the element and attribute  
>> types so that things like
>>                 fn:max( xdmp:directory("/logs/" )//run/@elapsed )
>> can go through an index instead and be sorted correctly (by  
>> dayTimeDuration instead of by string)
>>
>> Is this possible ?
>> If so how ?
>>
>> Thanks for suggestions and pointers to RTFM's
>>
>>
>>
>>
>> ----------------------------------------
>> David A. Lee
>> Senior Principal Software Engineer
>> Epocrates, Inc.
>> [email protected]
>> 812-482-5224
>>
>> _______________________________________________
>> General mailing list
>> [email protected]
>> http://developer.marklogic.com/mailman/listinfo/general
>
