Hi Tony,

> I retrieve my data from Marklogic using XQuery, but I am
> trying to manipulate the results.
> I'm not sure xquery is the best solution and xslt might be better..

It took me some time to catch up with this thread.

Okay, you are more or less tied to XQuery. I don't think it is any more 
difficult to solve your problem in XQuery than in XSLT. Ken has written a nice 
function that proves it isn't that big a problem. The fun thing is that the 
main algorithm is formulated in XPath 2.0, so you could reuse the function 
bodies literally in XSLT 2.0.

Concerning performance: I am afraid that the solutions Ken came up with are 
not very optimized for large sequences. As far as I can tell they have 
quadratic complexity. Let me provide a solution that needs only one sort 
(n log n) followed by a single linear pass:

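declare function local:distinct-items_geert ($items as node()*) as node()* {
        (: Sort so that items on the same address end up next to each other,
           with the highest testVal first within each address. :)
        let $items := (
                for $item in $items
                order by $item/state ascending, $item/city ascending,
                         $item/addr ascending, $item/testVal descending
                return $item
        )

        (: Keep an item only if its address differs from the previous one.
           For the first item $prev is empty, so it is always kept. :)
        for $item at $pos in $items
        let $prev := $items[$pos - 1]
        where not(($item/state = $prev/state) and ($item/city = $prev/city)
                  and ($item/addr = $prev/addr))
        return
                $item
};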

Note that it makes many more assumptions about the input, and the items are 
sorted along the way. That might not be what you want.
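
One of those assumptions deserves a small illustration. If testVal is 
untyped, "order by" compares it as a string, so "9" sorts above "10". The 
sketch below is a hypothetical variant of the function above (the name 
local:distinct-items-numeric and the sample data are made up) that casts 
testVal to xs:decimal, assuming it always holds a number:

```xquery
(: Hypothetical variant: identical to the function above, except that
   testVal is cast to xs:decimal before sorting. This assumes every
   testVal holds a number; the cast raises an error otherwise. :)
declare function local:distinct-items-numeric ($items as node()*) as node()* {
  let $sorted := (
    for $item in $items
    order by $item/state ascending, $item/city ascending,
             $item/addr ascending, xs:decimal($item/testVal) descending
    return $item
  )
  for $item at $pos in $sorted
  let $prev := $sorted[$pos - 1]
  where not(($item/state = $prev/state) and ($item/city = $prev/city)
            and ($item/addr = $prev/addr))
  return $item
};

(: Made-up sample data, only to show the effect. :)
let $items := (
  <item><state>CA</state><city>Fresno</city><addr>1 Main St</addr><testVal>9</testVal></item>,
  <item><state>CA</state><city>Fresno</city><addr>1 Main St</addr><testVal>10</testVal></item>,
  <item><state>AZ</state><city>Tempe</city><addr>7 Oak Ave</addr><testVal>2</testVal></item>
)
return local:distinct-items-numeric($items)
```

This should return the Tempe item first, followed by the Fresno item with 
testVal 10. With the string comparison of the original function, the Fresno 
item with testVal 9 would be kept instead.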

I also still think it should be possible to optimize even further by utilizing 
indexes. In plain old XSLT I would have created an xsl:key index with a concat 
of state, city and addr as the key; an altered Muenchian method would then 
give you the unique items. That amounts, more or less, to an optimized version 
of Ken's strategy.

In XQuery, or to be more precise in MarkLogic Server, I would attempt to 
utilize the lexicon functions. The easiest approach would be to add a 
'sortkey' attribute to each item (containing a concat of state, city and 
addr) and to add a range index on it. Once you have this index, you can use 
the lexicon function cts:element-attribute-values to get the unique 
state/city/addr combinations. For each distinct value you then search on the 
sortkey attribute to retrieve the items on the same address from the index, 
after which you apply the algorithm you like to get the most favorable one.
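
A rough, untested sketch of that idea follows. It assumes the items are 
elements named "item" (an assumption, adjust it to your data), that each one 
already carries the sortkey attribute, and that a string range index on 
item/@sortkey exists with a collation matching the query's default:

```xquery
(: Sketch only, under the assumptions stated above. :)
for $key in cts:element-attribute-values(
              xs:QName("item"), xs:QName("sortkey"))
(: All items sharing this state/city/addr combination. :)
let $same-address :=
  cts:search(//item,
    cts:element-attribute-range-query(
      xs:QName("item"), xs:QName("sortkey"), "=", $key))
return
  (: Pick the most favorable item; here the highest testVal, compared
     the same way as in the function above. :)
  (for $item in $same-address
   order by $item/testVal descending
   return $item)[1]
```

The list of distinct keys comes straight from the range index, so only the 
per-key searches touch the items themselves. Whether that beats the 
sort-based function will depend on how many distinct addresses you have.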

HTH!

Kind regards,
Geert



drs. G.P.H. (Geert) Josten
Consultant


Daidalos BV
Hoekeindsehof 1-4
2665 JZ Bleiswijk

T +31 (0)10 850 1200
F +31 (0)10 850 1199

mailto:[email protected]
http://www.daidalos.nl/

KvK 27164984
