Hi Danny,

I know it is processing the text node because if I force it to output some
text explicitly, then it does so, e.g.:

        case text()
            return
                if (matches(string($node), "^\s+$")) then text {"space"}
                else $node

What I have found is that I have to wrap the resulting node list in an
element before passing it back to the calling function.  That caller
processes the complete incoming node list and is itself called by a function
that processes a larger incoming node list.  Wrapping the node list in an
element preserves the whitespace-only text node.

I think I'm going to write up a more comprehensive example at some point,
but for now I have a solution to the problem.

I have used the common design pattern, but it does not fully satisfy the
requirements for my transform.  The design pattern assumes that the output
of the transform will follow the same element nesting as the input, but that
is not my case.

Tim

-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Danny Sokolsky
Sent: Tuesday, March 31, 2009 12:10 PM
To: General Mark Logic Developer Discussion
Subject: RE: [MarkLogic Dev General] Problem preserving whitespace in an
XML-to-XML xquery transform

Hi Tim,

It looks to me like you are doing this to drive your recursion:

cxtn:format-source-by-node($node/following-sibling::node()[1])

I think that is effectively skipping the text node directly following the
element (because it is first going to the following sibling, which misses
that text node).

The common design pattern for this type of recursion is to create a second
function that passes through all of the nodes, in order, into the typeswitch
function.

Here is the example of this pattern from the Transforming XML Structures
chapter of the Developer's Guide:

xquery version "1.0-ml";
(: This function takes the children of the node and passes them
   back into the typeswitch function.  :)
declare function local:passthru($x as node()) as node()*
{
for $z in $x/node() return local:dispatch($z)
};

(: This is the recursive typeswitch function :)
declare function local:dispatch($x as node()) as node()*
{
typeswitch ($x)
  case text() return $x
  case element (bar) return <barr>{local:passthru($x)}</barr>
  case element (baz) return <bazz>{local:passthru($x)}</bazz>
  case element (buzz) return <buzzz>{local:passthru($x)}</buzzz>
  case element (foo) return <fooo>{local:passthru($x)}</fooo>
  default return <temp>{local:passthru($x)}</temp>
};

let $x := 
<foo>foo
  <bar>bar</bar>
  <baz>baz
    <buzz>buzz</buzz>
  </baz>
  foo
</foo>
return
local:dispatch($x)
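
As a rough sketch only, the same pattern could be adapted to the t and v
elements from the original question. The function names come from the example
above; the <p> wrapper is a hypothetical stand-in for the real input, and
boundary-space is set to preserve only so that the space inside this inline
test element is kept when the query is parsed.

```xquery
xquery version "1.0-ml";

(: Keep whitespace-only text in the inline test element below :)
declare boundary-space preserve;

(: Pass every child node, in document order, to the typeswitch function :)
declare function local:passthru($x as node()) as node()*
{
  for $z in $x/node() return local:dispatch($z)
};

(: Text nodes, including whitespace-only ones, are returned unchanged :)
declare function local:dispatch($x as node()) as node()*
{
  typeswitch ($x)
    case text() return $x
    case element (t) return element title {string($x)}
    case element (v) return element volume {string($x)}
    (: Unexpected element: keep only its string value :)
    case element () return text {string($x)}
    default return ()
};

(: <p> is a hypothetical wrapper, not part of the original data :)
let $input := <p><t>Title</t> <v>Volume</v></p>
return element node-list { local:passthru($input) }
```

Because the loop in local:passthru visits each child exactly once, the
whitespace-only text node between t and v reaches the text() case like any
other node, and no case has to recurse to a following sibling.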

-Danny

From: [email protected]
[mailto:[email protected]] On Behalf Of Tim Meagher
Sent: Tuesday, March 31, 2009 7:57 AM
To: 'General Mark Logic Developer Discussion'
Cc: 'Crewdson, Andrew'; Paul Rooney
Subject: RE: [MarkLogic Dev General] Problem preserving whitespace in an
XML-to-XML xquery transform

I tried it, but it doesn't help in my scenario for some odd reason.

________________________________________
From: [email protected]
[mailto:[email protected]] On Behalf Of Mark
Helmstetter
Sent: Tuesday, March 31, 2009 10:53 AM
To: General Mark Logic Developer Discussion
Cc: Paul Rooney; Crewdson, Andrew
Subject: RE: [MarkLogic Dev General] Problem preserving whitespace in an
XML-to-XML xquery transform

Assuming that you're using XQuery 1.0 add:
declare boundary-space preserve;

if you're using 0.9, add:
declare xmlspace = preserve
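
A small sketch of what the declaration changes, assuming XQuery 1.0 syntax.
The <p> element and its content are illustrative only. Note that boundary-space
applies to element constructors written directly in the query text; it does
not change text nodes in documents that are already stored in a database.

```xquery
xquery version "1.0-ml";

declare boundary-space preserve;

(: Count the whitespace-only text nodes inside the constructed element.
   With "preserve" the space between </t> and <v> is kept, so this is 1;
   with "declare boundary-space strip;" the same query gives 0. :)
let $input := <p><t>Title</t> <v>Volume</v></p>
return count($input/text()[normalize-space(.) eq ""])
```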


Cheers,
Mark

________________________________________
From: [email protected]
[mailto:[email protected]] On Behalf Of Tim Meagher
Sent: Tuesday, March 31, 2009 2:26 PM
To: 'General Mark Logic Developer Discussion'
Cc: 'Crewdson, Andrew'; Paul Rooney
Subject: [MarkLogic Dev General] Problem preserving whitespace in an
XML-to-XML xquery transform
Importance: High

Hi Folks,

I have written an xquery transform whose purpose is to convert mixed content
that is marked up in Schema A into the same mixed content marked up using
Schema B.  It's not just a simple matter of renaming elements, so I have
implemented a recursive node-by-node approach using an algorithm similar to
the following:

declare function cxtn:format-source-by-node($node as node()) as node()*
{
    (
     typeswitch ($node)
        case text()
            return (
                $node,
                if (exists($node/following-sibling::node()[1])) then
                    cxtn:format-source-by-node($node/following-sibling::node()[1])
                else text {""}
            )
        case element (t) (: Title :)
            return (
                element title {string($node)},
                if (exists($node/following-sibling::node()[1])) then
                    cxtn:format-source-by-node($node/following-sibling::node()[1])
                else text {""}
            )
        case element (v) (: Volume :)
            return (
                element volume {string($node)},
                if (exists($node/following-sibling::node()[1])) then
                    cxtn:format-source-by-node($node/following-sibling::node()[1])
                else text {""}
            )
        case element ()
            (: Unexpected element - tag it as text and process the next node
               in the list :)
            return (
                text {string($node)},
                if (exists($node/following-sibling::node()[1])) then
                    cxtn:format-source-by-node($node/following-sibling::node()[1])
                else text {""}
            )
        (: No other node types need to be processed here, so if encountered
           just return a text node :)
        default return text {""}
    )
};

What I'm left with is a node list that is attached to another node list and
eventually wrapped in a parent element, e.g.,

               element node-list { cxtn:format-source-by-node($some-starting-node) }
               
The results work as expected, except that when a text node is encountered
that contains only a space character, it is stripped away.

This is not boundary space, it is space between elements, so the input:

               <t>Title</t> <v>Volume</v>

is converted to:

               <title>Title</title><volume>Volume</volume>

where the space between the 2 elements has been removed.

Can someone tell me how to preserve this space?

Thank you!

Tim

_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general

