RE: [MarkLogic Dev General] Surprising behavior with text node construction

Williams, Paul Fri, 14 Mar 2008 06:45:51 -0700

Ok, one more exercise then...



This may model George's situation a little more closely.  This test case
produces the same results we've seen before.  So, given the spec excerpt
from Mike, the function f() appears to be returning a string rather than
a constructed text node.  Why is that?



define function f() as node() {"dummy"}

<test>

   <strings>{ for $i in 1 to 2 return f() }</strings>

   <texts>{ for $i in 1 to 2 return text { "dummy" } }</texts>

</test>
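
One way to check directly what f() hands back is to ask for its type.
This is only a minimal sketch and I have not run it: it reuses the f()
declared above and the "define function" syntax from this thread, and
the <check>, <is-text> and <is-string> element names are made up for
illustration.

```xquery
define function f() as node() {"dummy"}

<check>
   <is-text>{ f() instance of text() }</is-text>
   <is-string>{ f() instance of xs:string }</is-string>
</check>
```

As far as I can tell from the XQuery 1.0 function conversion rules, a
string is never promoted to a node to satisfy a declared return type, so
a strictly conforming processor would be expected to raise a type error
here instead of building a text node.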







-- Paul

[land] 402.592.8218

[cell]  402.203.2232



-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Michael Blakeley
Sent: Friday, March 14, 2008 3:18 AM
To: General Mark Logic Developer Discussion
Subject: Re: [MarkLogic Dev General] Surprising behavior with text node
construction



We can make the test-case even shorter:



<test>

   <strings>{ for $i in 1 to 2 return "dummy" }</strings>

   <texts>{ for $i in 1 to 2 return text { "dummy" } }</texts>

</test>



=>

<test><strings>dummy dummy</strings><texts>dummydummy</texts></test>



I believe that this is the specified behavior, from

http://www.w3.org/TR/xquery/#id-content (elided for simplicity):



> 1.e.i: For each adjacent sequence of one or more atomic values returned
> by an enclosed expression, a new text node is constructed, containing
> the result of casting each atomic value to a string, with a single space
> character inserted between adjacent values.



That matches "strings", above.



> 3. Adjacent text nodes in the content sequence are merged into a single
> text node by concatenating their contents, with no intervening blanks.



And that matches "texts", above.
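
To see both rules at work in one place, here is a small sketch (not run,
and the element names are made up for illustration). If I am reading the
two rules right, <mixed> should come out as abc: each atomic value is a
run of length one, so no space is inserted, and the three resulting text
nodes are then merged. The other two elements use string-join to pick
the separator explicitly instead of relying on which kind of item the
loop returns.

```xquery
<test>
   <mixed>{ "a", text { "b" }, "c" }</mixed>
   <no-space>{ string-join((for $i in 1 to 2 return "dummy"), "") }</no-space>
   <spaced>{ string-join((for $i in 1 to 2 return text { "dummy" }), " ") }</spaced>
</test>
```

In both string-join cases the enclosed expression yields a single atomic
value, so rule 1.e.i has nothing to insert a space between.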



-- Mike



Williams, Paul wrote:

> Sorry... not an answer, just more on the question...

>

> I reduced the sample code down to what I've included below in order to
> wrap my head around this a little better.  This code shows both results
> as George described.  But it focuses on the piece of the code that seems
> pertinent.  Running this test produces this output...

>

> <test>
>   <strings>dummy dummy</strings>
>   <texts>dummydummy</texts>
> </test>

>

> So why doesn't the explicit text constructor version in the "texts"

> element produce the same space-joined single text node as the

> auto-constructed version in the "strings" element?

>

> The "strings" version, I would assume, produces a set of strings first,
> then decides it needs a text node and must construct it.  The "texts"
> version, I assume, produces a set of text nodes first, then decides they
> need to be concatenated.  But for the "strings" version to end up with
> the space, it must be converting the set of strings into a set of text
> nodes and then concatenating into one.  So why doesn't that result in
> the same output as the set of text nodes in the "texts" version?  Hmmm.
> Curious.

>

> Sample code, try this in CQ ...

> ----------------------------------------------------------------

> <test>
>   <strings>{ for $node in (<elem/>,<elem/>) return  "dummy" }</strings>
>   <texts>{ for $node in (<elem/>,<elem/>) return  text{"dummy"} }</texts>
> </test>

> ----------------------------------------------------------------

>

> -- Paul

>

> -----Original Message-----

> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Florentine, George
> Sent: Thursday, March 13, 2008 7:04 PM
> To: [email protected]
> Subject: [MarkLogic Dev General] Surprising behavior with text node
> construction

>

> I've run into an interesting behavior (optimization? bug?) in MarkLogic
> and wanted to see what others thought of this.
>
> Here's the background - we have some code that dynamically generates
> content by processing DITA topics. Depending upon the structure of the
> content it's possible that our XQuery code may process two sequential
> elements that would each return a text node from a function. What we see
> is that in this case, only one text node is returned and its value is
> the concatenation of the two string values separated by a single space
> character. This is somewhat in line with the 2003 spec

>
> (http://www.w3.org/TR/2003/WD-xquery-20030502/#doc-ComputedTextConstructor,
> section 3.7.2.4), which states:

>

> ----

> The content expression of a text node constructor is processed as

> follows:

> 1. Atomization is applied to the value of the content expression,

> converting it to a sequence of atomic values.

> 2. If the result of atomization is an empty sequence, no text node is

> constructed. Otherwise, each atomic value in the atomized sequence is

> cast into a string.

> 3. The individual strings resulting from the previous step are merged
> into a single string by concatenating them with a single space character
> between each pair. The resulting string becomes the content of the
> constructed text node.

> -----

>

> So it appears that there's some optimization in the output generation of
> nodes such that two sequential text nodes are collapsed into one.
>
> Below is a concrete code example. If you run the 1st code snippet in CQ,
> the code generates the output <p>dummy dummy</p>, showing an example of
> two calls to a function that should return two text nodes but only
> returns one text node, with the return value of each call ("dummy")
> concatenated into a single text node with a space character separating
> the two.
>
> If you run the same code (2nd snippet) with the one change that the
> return value from the function transform_dummy returns an explicitly
> created text constructor the output is <p>dummydummy</p> (no space
> character). This is the behavior I was expecting and seems like the
> right behavior. Note that the return value in function signature for the
> transform_dummy() function is text() so I would assume that the
> xs:string "dummy" would be coerced into a text node and that a text node
> would be returned from this function in all cases.

>

> It seems bad that this behavior is different. I'd like to get other

> perspectives on this.

>

> Thx,

>

> G

> -------------------------------

>

> Code snippet 1 - no explicit text constructor in the function

> transform_dummy, returns <p>dummy dummy</p>

> -------------------------------

>

> define function transform_default_element($element as element()) as node()
> {
>     (: create a new element with the same name and attributes and
>        recurse to travel the subtree. :)
>     element
>      {fn:node-name($element)}
>      {$element/@*,transform_template($element/node())}
> }
> define function transform_dummy($element as element()) as text()
> {
>    "dummy"
> }
> define function transform_element ( $element as element())  as node()*
> {
>     (: branch to more specialized functions based on the type of element :)
>     typeswitch ($element)
>         case element(dummy)
>             return transform_dummy($element)
>         default
>             return transform_default_element ($element)
> }
> define function transform_template ( $nodes as node()* )  as node()*
> {
>    for $node in $nodes
>    return
>        typeswitch($node)
>            case element()
>                return transform_element($node)
>            default
>                (: PIs, text and comment nodes are outputted here :)
>                return $node
> }
>
> (: module start :)
> let $para := xdmp:unquote("<p><dummy/><dummy/></p>")
> return transform_template($para/node())

>

> -----------------------------------------

> Code snippet 2: explicit creation of text node in transform_dummy,

> returns <p>dummydummy</p>

> ------------------------------------------

>

> define function transform_default_element($element as element()) as node()
> {
>     (: create a new element with the same name and attributes and
>        recurse to travel the subtree. :)
>     element
>      {fn:node-name($element)}
>      {$element/@*,transform_template($element/node())}
> }
> define function transform_dummy($element as element()) as text()
> {
>    (: explicitly create a text node before returning :)
>    text { "dummy" }
> }
> define function transform_element ( $element as element())  as node()*
> {
>     (: branch to more specialized functions based on the type of element :)
>     typeswitch ($element)
>         case element(dummy)
>             return transform_dummy($element)
>         default
>             return transform_default_element ($element)
> }
> define function transform_template ( $nodes as node()* )  as node()*
> {
>    for $node in $nodes
>    return
>        typeswitch($node)
>            case element()
>                return transform_element($node)
>            default
>                (: PIs, text and comment nodes are outputted here :)
>                return $node
> }
>
> (: module start :)
> let $para := xdmp:unquote("<p><dummy/><dummy/></p>")
> return transform_template($para/node())

>

>
> ---------------------------------------------------------------------------

> George Florentine

>

> [EMAIL PROTECTED]

>   O:  303.542.2173

>   C:  303.669.8628

>   F:  303.544.0522

>   www.FlatironsSolutions.com

>  An Inc. 500 Company

>

>

> _______________________________________________

> General mailing list

> [email protected]

> http://xqzone.com/mailman/listinfo/general

