Hi Eliot,

There were some changes made in later 4.2 releases to restore the behavior from earlier releases. The serialization is about how it is output, not how it is stored, so it should be stored correctly.
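One way to check what is stored without serialization getting in the way is to count nodes instead of reading the serialized output. The following is only a minimal sketch: the URI and the element name `parent` are taken from the example in the original message, and it assumes `parent` is the document's root element.

```xquery
xquery version "1.0-ml";

(: URI and element names come from the example in the original message;
   adjust them for a real document. :)
let $doc := fn:doc('/foo/bar/mynewdoc.xml')

(: Text nodes directly under the root element that hold only whitespace. :)
let $ws := $doc/parent/text()[fn:normalize-space(.) eq '']

return (
  (: Number of child nodes of every kind under the root element. :)
  fn:count($doc/parent/node()),
  (: Number of whitespace-only text nodes among them. :)
  fn:count($ws)
)
```

The result is a pair of integers, so indentation settings cannot change it. If the second number is 0, the whitespace seen in query results is being added on output and is not in the stored document.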

I recommend trying it on the latest 4.2 release (4.2-7 now, I think). I think it will then, by default, behave the same as in 4.1. In 4.2, there are some serialization options you can set at the query level to control this. In MarkLogic 5, you can also control these options' default values at the App Server level.

Here is the 4.2 release note item that describes some of these changes:

http://docs.marklogic.com/4.2doc/docapp.xqy#display.xqy?fname=http://pubs/4.2doc/xml/relnotes/chap4.xml%2340996

-Danny

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Eliot Kimber
Sent: Monday, November 28, 2011 3:04 PM
To: [email protected]
Subject: [MarkLogic Dev General] Determining Whether Whitespace is In Data as Stored or A Result of Serialization?

I have determined that content loaded through the XccRunner.load() method has unwanted whitespace not in the original XML when subsequently accessed from MarkLogic.

I've tested on 4.2-1. Earlier versions do not seem to have this behavior (although I need to do more testing to confirm--but we certainly would have noticed it if we had, as from our standpoint it constitutes a data corruption issue as data being returned from ML is different from what was given to ML).

I traced the DOM being loaded right to the call of load() and verified by inspection that there were no whitespace nodes between two particular elements, e.g., the original source was:

<parent><child>text</child><child>text</child></parent>

Accessing the loaded document using, e.g.:

doc('/foo/bar/mynewdoc.xml')

Results in:

<parent>
  <child>text</child>
  <child>text</child>
</parent>

(where there is multiple whitespace before the <child> start tags and before the </parent> close tag).

I tried various access routes, including CQ, access via our own product's calls to the XccRunner API, OxygenXML via WebDAV and direct XQuery (via Xcc) and get the same result. Some accesses show more indention than others, but they all have indention.

From what I could find it appears that this is the result of a change in the default serialization options.

My primary question is: how can I determine how the XML is stored in ML without interference from any serialization options? Assuming ML is not literally storing the bytes of the XML, I assume I can't just look inside the forest, but is there a reliable way to see what the original whitespace was? My first task is to prove that the XML is correct as provided to MarkLogic.

My secondary questions:

1. Is there any way that options on the load() method could affect whitespace as stored? I didn't see any but I could have missed something.

2. If this is in fact a function of serialization options, where would we control that in our Java code that uses Xcc to run XQueries? Is it simply a matter of adding

   declare option xdmp:output "indent=no";

   to our XQuery modules?

3. Is this default serialization behavior changed in ML 5?

Thanks,

Eliot

--
Eliot Kimber
Senior Solutions Architect
"Bringing Strategy, Content, and Technology Together"
Main: 512.554.9368
www.reallysi.com
www.rsuitecms.com

_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
