Here's what I suspect you are missing: the set operators are node-identity 
operators, not node-value operators.

http://www.w3.org/TR/xpath-functions/#func-intersect

> Summary: Constructs a sequence containing every node that occurs in the 
> values of both $parameter1 and $parameter2, eliminating duplicate nodes. 
> Nodes are returned in document order.
> 
> If either operand is the empty sequence, the empty sequence is returned.
> 
> Two nodes are duplicates if they are op:is-same-node().


That last sentence is crucial. Follow it to 
http://www.w3.org/TR/xpath-functions/#func-is-same-node

> Summary: If the node identified by the value of $parameter1 is the same node 
> as the node identified by the value of $parameter2 (that is, the two nodes 
> have the same identity), then the function returns true; otherwise, the 
> function returns false. This function backs up the "is" operator on nodes.

So we see that op:intersect is node-identity intersection. The nodes in 
"catalog" may look similar to the nodes in "catalognew", but they have separate 
node identities. In C you might think of these identities as memory locations. 
In Java you might think of them as object ids. Either way, node-identity 
operations will not help you with a de-duplication problem.

<test/> is <test/>,
<test/> = <test/>
=>
false
true
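
To tie that back to your query, here is a small sketch using inline, 
abbreviated copies of your books (not your real documents). Each constructor 
builds brand-new nodes, so nothing in $a is the same node as anything in $b:

```xquery
let $a := <catalog><book id="bk101"/><book id="bk102"/></catalog>/book
let $b := <catalog><book id="bk101"/><book id="bk103"/></catalog>/book
return (
  (: identity intersection across two separate trees: no shared nodes :)
  count($a intersect $b),
  (: a sequence intersected with itself keeps every node :)
  count($a intersect $a)
)
```

I would expect 0 and then 2, which matches the empty sequence you saw.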

You appear to want intersection based on fn:deep-equal 
(http://www.w3.org/TR/xpath-functions/#func-deep-equal), or possibly based on 
book/@id, or possibly on some composite key. XQuery doesn't include that kind 
of operator, because the "value" of a complex XML fragment is a deep and 
ambiguous topic. To give you some idea of the problems you might run into, 
suppose that "catalog" has bk101 with author, then title, but "catalognew" has 
title, then author. How would you write an XQuery function that identifies 
duplicates under these circumstances?
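
As a rough sketch of the first two options, a predicate can stand in for the 
missing operator. The data is inlined from your message rather than read with 
doc(), and the last expression illustrates the ordering problem, since 
fn:deep-equal compares children in order:

```xquery
let $a :=
  <catalog>
    <book id="bk101"><author>Gambardella, Matthew</author><title>XML Developer's Guide</title></book>
    <book id="bk102"><author>Ralls, Kim</author><title>Midnight Rain</title></book>
  </catalog>/book
let $b :=
  <catalog>
    <book id="bk101"><author>Gambardella, Matthew</author><title>XML Developer's Guide</title></book>
    <book id="bk103"><author>Corets, Eva</author><title>Oberon's Legacy</title></book>
  </catalog>/book
return (
  (: books in $a whose @id also appears on some book in $b :)
  $a[@id = $b/@id],
  (: books in $a that have a deep-equal counterpart in $b :)
  $a[some $x in $b satisfies deep-equal(., $x)],
  (: same children, different order: not deep-equal :)
  deep-equal(
    <book><author>A</author><title>T</title></book>,
    <book><title>T</title><author>A</author></book>)
)
```

Both predicates should select only bk101 from $a, and the final deep-equal 
call should return false.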

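You might find http://www.w3.org/TR/xpath-functions/#value-intersect useful. 
Note that it intersects sequences of atomic values, not nodes, so you would 
apply it to keys such as book/@id. If book/@id isn't enough, you may need to 
build your own composite key from the values you care about, in a defined 
order. If so, you might also find 
http://developer.marklogic.com/pubs/4.2/apidocs/Ext-1.html#xdmp:hash64 useful 
for reducing a long key string to a compact number.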
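
Here is one way a composite key might look. The function name local:key, the 
choice of author plus title, and the "|" separator are all my assumptions for 
illustration; pick whatever values and order define "the same book" for you.

```xquery
(: Hypothetical key: author value(s) first, then title, whitespace-normalized.
   The comma operator fixes the order regardless of document order.
   Assumes "|" never occurs in the values themselves. :)
declare function local:key($book as element(book)) as xs:string {
  string-join(
    for $n in ($book/author, $book/title)
    return normalize-space($n),
    "|")
};

let $a :=
  <catalog>
    <book id="bk101"><author>Gambardella, Matthew</author><title>XML Developer's Guide</title></book>
    <book id="bk102"><author>Ralls, Kim</author><title>Midnight Rain</title></book>
  </catalog>/book
(: same book as bk101, but with title before author and a different id :)
let $b :=
  <catalog>
    <book id="x1"><title>XML Developer's Guide</title><author>Gambardella, Matthew</author></book>
  </catalog>/book
let $b-keys := for $x in $b return local:key($x)
(: keep the books in $a whose key matches some key from $b :)
return $a[local:key(.) = $b-keys]
```

This should return bk101 from $a even though the child order and id differ in 
$b. On MarkLogic you could wrap the key in xdmp:hash64() if you want a number 
instead of a long string.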

One common approach is to build a key based on the values you care about 
(perhaps just book/@id) and make that key the document uri. Suddenly there is 
no chance of duplicates, because the document uri is a unique key within each 
MarkLogic Server database.
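
A minimal sketch of that approach, assuming MarkLogic and a made-up "/books/" 
uri prefix (the prefix and the ".xml" suffix are just my illustration):

```xquery
xquery version "1.0-ml";

(: Hypothetical layout: one document per book, with the uri built from @id. :)
let $catalog :=
  <catalog>
    <book id="bk101"><author>Gambardella, Matthew</author><title>XML Developer's Guide</title></book>
    <book id="bk102"><author>Ralls, Kim</author><title>Midnight Rain</title></book>
  </catalog>
for $book in $catalog/book
let $uri := concat("/books/", $book/@id, ".xml")
(: inserting at an existing uri replaces that document :)
return xdmp:document-insert($uri, $book)
```

Run it once per catalog, as separate requests. Loading catalognew afterwards 
would replace /books/bk101.xml rather than add a second copy. I believe 
MarkLogic treats two inserts to the same uri within a single request as 
conflicting updates, hence one catalog per request.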

-- Mike

On 6 Dec 2010, at 21:36 , Mahitha T U wrote:

> Hi,
>  
>  I have following two xmls,
>  
> \test\catalog.xml
>  
> <catalog>
> <book id="bk101">
> <author>Gambardella, Matthew</author>
> <title>XML Developer's Guide</title>
> </book>
>  
> <book id="bk102">
> <author>Ralls, Kim</author>
> <title>Midnight Rain</title>
> </book>
>  
> </catalog>
>  
>  
> \test\catalognew.xml
>  
>  
> <catalog>
> <book id="bk101">
> <author>Gambardella, Matthew</author>
> <title>XML Developer's Guide</title>
> </book>
>  
> <book id="bk103">
> <author>Corets, Eva</author>
> <title>Oberon's Legacy</title>
> </book>
>  
> </catalog>
>  
>  
> While I use the set operations intersect, difference, union on book tags of  
> these two xmls I am not getting the desire output. I was expecting a result 
> set which contain the book tag with id “bk101” while doing an intersection 
> operation. I tried in the following way which gave me the empty sequence as 
> result
>  
> let $a := doc("\test\catalog.xml")/catalog/book
> let $b := doc("\test\catalognew.xml")/catalog/book
> return
> $a intersect $b
>  
> Please let me know if I can achieve the desired output sequences using set 
> operations
>  
> Thanks & Regards
> Mahitha
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general
