I have determined that content loaded through the XccRunner.load() method
has unwanted whitespace not in the original XML when subsequently accessed
from MarkLogic.

I've tested on 4.2-1. Earlier versions do not seem to have this behavior,
although I need to do more testing to confirm. We certainly would have
noticed it if they had: from our standpoint it constitutes a data
corruption issue, because the data returned from ML is different from what
was given to ML.

I traced the DOM being loaded right to the call of load() and verified by
inspection that there were no whitespace nodes between two particular
elements, e.g., the original source was:

<parent><child>text</child><child>text</child></parent>

Accessing the loaded document using, e.g.:

doc('/foo/bar/mynewdoc.xml')

Results in:

<parent>
  <child>text</child>
  <child>text</child>
   </parent>

(where there are multiple whitespace characters before the <child> start
tags and before the </parent> close tag).
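
For reference, here is a minimal sketch of the kind of check I mean by
"verified by inspection". It is not the actual code from our product: the
class and method names are hypothetical, it uses only the standard JAXP DOM
parser, and it works on XML strings rather than on whatever object is
actually handed to load(). It simply counts whitespace-only text nodes
anywhere in a document.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration only; not part of XCC or XccRunner.
class WhitespaceNodeCheck {

    // Parse an XML string into a DOM. With no DTD present, the parser
    // keeps whitespace-only text nodes, so nothing is stripped here.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Recursively count text nodes that contain nothing but whitespace.
    static int countWhitespaceOnlyTextNodes(Node node) {
        int count = 0;
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                count++;
            } else {
                count += countWhitespaceOnlyTextNodes(child);
            }
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        // The source as given to load(), and the form that came back.
        String source =
            "<parent><child>text</child><child>text</child></parent>";
        String returned =
            "<parent>\n  <child>text</child>\n  <child>text</child>\n   </parent>";
        System.out.println("source:   "
            + countWhitespaceOnlyTextNodes(parse(source)));
        System.out.println("returned: "
            + countWhitespaceOnlyTextNodes(parse(returned)));
    }
}
```

Run against the two forms shown above, this should report no
whitespace-only text nodes for the source and three for the returned
document (one before each <child> and one before </parent>).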

I tried various access routes, including CQ, access via our own product's
calls to the XccRunner API, OxygenXML via WebDAV, and direct XQuery (via
Xcc), and got the same result. Some access routes show more indentation than
others, but they all show indentation.

From what I could find, it appears that this is the result of a change in
the default serialization options.
  
My primary question is: how can I determine how the XML is stored in ML
without interference from any serialization options? Assuming ML is not
literally storing the bytes of the XML, I assume I can't just look inside
the forest, but is there a reliable way to see what the original whitespace
was? My first task is to prove that the XML is correct as provided to
MarkLogic.

My secondary questions:

1. Is there any way that options on the load() method could affect
whitespace as stored? I didn't see any but I could have missed something.

2. If this is in fact a function of serialization options, where would we
control that in our Java code that uses Xcc to run XQueries? Is it simply a
matter of adding 'declare option xdmp:output "indent=no";' to our XQuery
modules?

3. Is this default serialization behavior changed in ML 5?

Thanks,

Eliot

-- 
Eliot Kimber
Senior Solutions Architect
"Bringing Strategy, Content, and Technology Together"
Main: 512.554.9368
www.reallysi.com
www.rsuitecms.com
