Re: [MarkLogic Dev General] Creating a new schemas database

Fernandes, Nivaldo Fri, 26 Oct 2012 08:11:44 -0700

Thanks Mary (and Michael).

I have a couple of points for further clarification...

Mary said: "Yes, when we parse XML documents, we will add defaulted
attributes to the data model. You can control whether those show up in
the serialization or not, because there are customers who want it one
way or the other."

Point 1
-------
As for the addition of defaulted attributes, does this also apply to
*optional* ones? 

Here is an example:
Original doc in file system:
<ISBN>some_isbn_value</ISBN>
Doc after ingestion in MarkLogic: 
<ISBN Type="Set">some_isbn_value</ISBN>
So, this is clearly not what we want. Is MarkLogic perhaps assuming that
a Fixed value in an attribute makes it a required attribute?
Here is the schema definition for the attribute in question:
<xs:attribute fixed="Set" name="Type" type="xs:string"/>
So, it is OPTIONAL, otherwise its definition would have been:
<xs:attribute fixed="Set" name="Type" type="xs:string" use="required"/>

So, I believe we need some clarification here. 
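
To make the case easy to reproduce, here is a minimal sketch of a
complete schema around the attribute declaration quoted above. Only the
xs:attribute line comes from our real schema; the complexType wrapper
with simple string content and the lack of a target namespace are
assumptions made for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Assumed wrapper: ISBN holds text content plus the Type attribute -->
  <xs:element name="ISBN">
    <xs:complexType>
      <xs:simpleContent>
        <xs:extension base="xs:string">
          <!-- No use="required" here, so use defaults to "optional" -->
          <xs:attribute fixed="Set" name="Type" type="xs:string"/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

An instance such as <ISBN>some_isbn_value</ISBN> is valid against this
sketch without the Type attribute, which is why its appearance after
ingestion was unexpected.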

Point 2
-------
As for the control during serialization, this has implications for the
ticket I mentioned. My mistake: this ticket (#10661) is no longer open.
Still, according to my co-worker, your statement regarding control is
significant, because it was not clear to him from the ticket responses
that this was possible.
Here is his observation:
"We found that whitespace-only text nodes were being added to our docs,
in a few specific places, at serialization. The docs in question were
not namespaced, and we eventually determined that some unrelated schemas
in our Schemas db were interacting with these docs, apparently adding
the unexpected whitespace upon serialization. At the time, as recorded
in the ticket (10661), we solved the problem by circumventing the
Schemas db (pointing the main db to itself for its 'Schemas database'
setting). We did not understand that there would be another way to
disable the whitespace behavior while still leaving the Schemas setup as
it was, and would be interested to know more about that. There is some
discussion in the ticket of setting 'output indent' and 'boundary-space'
options, but it appears that these did not address the problem."

So, we are eager to hear from you on this as well.

Thanks!

-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Mary
Holstege
Sent: Thursday, October 25, 2012 3:54 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Creating a new schemas database

On Thu, 25 Oct 2012 11:40:33 -0700, Fernandes, Nivaldo
<[email protected]> wrote:

> I would like to extend this thread just a bit more...I have concerns
> about the background checking against schemas. I understand the
> technical reasons for type checking but there seem to be some wrinkles
> in the practical outcome.
>
> But to make sure that I understand the gist of your responses to this
> thread: if the Schemas db has a schema document with the same namespace
> as the xml documents in a db that points to said Schemas db, then
> operations on the xml documents that require type checking (e.g.
> fn:data, and many others) will cause MarkLogic to do an IMPLICIT
> verification/check against the schema document.

It isn't doing an implicit validation, it is doing an implicit type
assignment of the specific data you are trying to perform a typed
operation on. This will entail a certain amount of propagation of
that assessment up the document tree if you have local elements.
It doesn't process the whole tree and it doesn't check complex
type validity: it assumes complex type validity.

> So, if for some reason (see below) I do not want this background
> checking done when operating on the xml documents, is my only choice
> then NOT to have any schema documents (with same namespace as xml doc)
> in the Schemas db? Sounds fair BUT what if I still want to be able to
> do EXPLICIT validation against the schema??? [BTW, this is how I
> understood things to be with MarkLogic, especially since it claims to
> be quite functional in a Schema-agnostic world.]
>
> In summary, my understanding was that schema validation was totally
> under my control via an EXPLICIT call to validate.

Schema validation, yes. Type assessment, less so.

You can't disable automatic type assessment, but you can make it
essentially a no-op by making sure you explicitly refer to a dummy
schema for that namespace.

I wouldn't recommend this, however. It is tricky to get right, and
there really is not a good reason to not want to use the correct types
for typed operations.
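
For illustration only, the kind of dummy schema mentioned above could be
as small as the following sketch. The target namespace is hypothetical,
and how the schema gets referenced explicitly is not shown here.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical namespace; substitute the namespace of your documents -->
<!-- The schema declares no elements, attributes, or types, so it has
     no type information to assign -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.com/books"/>
```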

> So, what is my reason for not wanting the implicit validation? Well,
> during a high stress period in my organization, when we reload all our
> databases, I found myself staring at documents being ingested in
> MarkLogic (4.1-7.1) that were mysteriously having an attribute
> added to them upon ingestion, even though I made sure that nowhere in
> the loading this was explicitly happening. After cracking my head for
> a while, I had the realization to look at the schema in the Schemas
> database being pointed to in the db config, and saw that the attribute
> being added was an *optional* attribute in the schema with a Fixed
> value (i.e. this attribute may not occur but when it does, it always
> has the same value). My next step was to remove the schema document
> from the schema database in order to eliminate the remote possibility
> that MarkLogic was doing some background schema validation (WHICH NOW
> I KNOW IT DOES). To my surprise (and dismay) at the time, the problem
> was solved by removing the schema document...the attribute was no
> longer being incorrectly added to the elements in the xml documents.
> And by golly, no longer was I going to put any schema documents in the
> Schemas database and go through some similar bad experience.
>
> NOTE: similar unwanted interactions between schema and xml documents
> have been experienced by other developers in my organization (ML
> 4.2-9), with tickets currently open and still unresolved.

I am not aware of any open tickets or bugs in this area. That doesn't
mean there aren't any, mind you, but I couldn't find them.

But I think there is a bit of a misperception of what is happening
here. Yes, when we parse XML documents, we will add defaulted
attributes to the data model. You can control whether those show up in
the serialization or not, because there are customers who want it one
way or the other.

We need the defaulted attributes in the internal data model because many
things would otherwise not work correctly, such as the processing of
XML Schema documents themselves, or of XSLT stylesheets.

> So, what can be done here? Should MarkLogic perhaps offer a switch in
> its db config page that lets us turn off background schema
> validation and avoid its bad side effects? Or?
>
> Please advise.

//Mary
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general