Thanks Mary (and Michael). I have a couple of points for further clarification...
Mary said:

"Yes, when we parse XML documents, we will add defaulted attributes to the data model. You can control whether those show up in the serialization or not, because there are customers who want it one way or the other."

Point 1
-------
As for the addition of defaulted attributes, does this also apply to *optional* ones? Here is an example:

Original doc in file system:

    <ISBN>some_isbn_value</ISBN>

Doc after ingestion in MarkLogic:

    <ISBN Type="Set">some_isbn_value</ISBN>

So, this is clearly not what we want. Is MarkLogic perhaps assuming that a Fixed value in an attribute makes it a required attribute? Here is the schema definition for the attribute in question:

    <xs:attribute fixed="Set" name="Type" type="xs:string"/>

So, it is OPTIONAL, otherwise its definition would have been:

    <xs:attribute fixed="Set" name="Type" type="xs:string" use="required"/>

So, I believe we need some clarification here.

Point 2
-------
As for the control during serialization, this has implications for the ticket I mentioned. And, sorry, my bad, this ticket (#10661) is no longer open, but according to my co-worker, your statement regarding control is significant...it was not clear to him that this was possible from the ticket responses. Here is his observation:

"We found that whitespace-only text nodes were being added to our docs, in a few specific places, at serialization. The docs in question were not namespaced, and we eventually determined that some unrelated schemas in our Schemas db were interacting with these docs, apparently adding the unexpected whitespace upon serialization. At the time, as recorded in the ticket (10661), we solved the problem by circumventing the Schemas db (pointing the main db to itself for its 'Schemas database' setting). We did not understand that there would be another way to disable the whitespace behavior while still leaving the Schemas setup as it was, and would be interested to know more about that.
There is some discussion in the ticket of setting 'output indent' and 'boundary-space' options, but it appears that these did not address the problem."

So, we are eager to hear from you on this as well.

Thanks!

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Mary Holstege
Sent: Thursday, October 25, 2012 3:54 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Creating a new schemas database

On Thu, 25 Oct 2012 11:40:33 -0700, Fernandes, Nivaldo <[email protected]> wrote:

> I would like to extend this thread just a bit more...I have concerns
> about the background checking against schemas. I understand the
> technical reasons for type checking but there seem to be some wrinkles
> in the practical outcome.
>
> But to make sure that I understand the gist of your responses to this
> thread: if the Schemas db has a schema document with the same namespace
> as the xml documents in a db that points to said Schemas db, then
> operations on the xml documents that require type checking (e.g.
> fn:data, and many others) will cause MarkLogic to do an IMPLICIT
> verification/check against the schema document.

It isn't doing an implicit validation, it is doing an implicit type assignment of the specific data you are trying to perform a typed operation on. This will entail a certain amount of propagation of that assessment up the document tree if you have local elements. It doesn't process the whole tree and it doesn't check complex type validity: it assumes complex type validity.

> So, if for some reason (see below) I do not want this background
> checking done when operating on the xml documents, is my only choice
> then NOT to have any schema documents (with same namespace as xml doc)
> in the Schemas db? Sounds fair BUT what if I still want to be able to do
> EXPLICIT validation against the schema???
> [BTW, this is how I understood
> things to be with MarkLogic, especially since it claims to be quite
> functional in a Schema-agnostic world.]
>
> In summary, my understanding was that schema validation was totally
> under my control via an EXPLICIT call to validate.

Schema validation, yes. Type assessment, less so. You can't disable automatic type assessment, but you can make it essentially a no-op by making sure you explicitly refer to a dummy schema for that namespace. I wouldn't recommend this, however. It is tricky to get right, and there really is not a good reason to not want to use the correct types for typed operations.

> So, what is my reason for not wanting the implicit validation? Well,
> during a high stress period in my organization, when we reload all our
> databases, I found myself staring at documents being ingested in
> MarkLogic (4.1-7.1) that were mysteriously having an attribute being
> added to them upon ingestion, even though I made sure that nowhere in
> the loading this was explicitly happening. After cracking my head for a
> while, I had the realization to look at the schema in the Schemas
> database being pointed to in the db config, and saw that the attribute
> being added was an *optional* attribute in the schema with a Fixed value
> (i.e. this attribute may not occur but when it does, it always has the
> same value). My next step was to remove the schema document from the
> schema database in order to eliminate the remote possibility that
> MarkLogic was doing some background schema validation (WHICH NOW I KNOW
> IT DOES). To my surprise (and dismay) at the time, the problem was
> solved by removing the schema document...no longer the attribute was
> being incorrectly added to the elements in the xml documents. And by
> golly, no longer was I going to put any schema documents in the Schemas
> database and go through some similar bad experience.
>
> NOTE: similar unwanted interactions between schema and xml documents
> have been experienced by other developers in my organization (ML 4.2-9)
> (with current tickets opened yet still unresolved).

I am not aware of any open tickets or bugs in this area. That doesn't mean there aren't any, mind you, but I couldn't find them. But I think these are a bit of a misperception of what is happening here.

Yes, when we parse XML documents, we will add defaulted attributes to the data model. You can control whether those show up in the serialization or not, because there are customers who want it one way or the other. We need the defaulted attributes in the internal data model because many things would otherwise not work correctly, such as the processing of the XML Schema documents themselves, or of XSLT stylesheets.

> So, what can be done here? Should MarkLogic perhaps offer a switch in
> its db config page that allows us to NOT want background schema
> validation and avoid its bad side effects? Or?
>
> Please advise.

//Mary

_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
