Hi Tony,
> I retrieve my data from Marklogic using XQuery, but I am
> trying to manipulate the results.
> I'm not sure xquery is the best solution and xslt might be better..
It took me some time to catch up with this thread.
Okay, so you are more or less tied to XQuery. I don't think your problem is any
harder to solve in XQuery than in XSLT. Ken has written a nice function that
proves it isn't that big a problem. The fun thing is that the core algorithm is
written in XPath 2.0, so you could reuse the function bodies literally in
XSLT 2.0.
Concerning performance: I am afraid that the solutions Ken came up with are not
very well optimized for large sequences; their running time grows faster than
linearly with the number of items. Here is a solution that needs only one sort
followed by a single linear pass over the sorted items:
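declare function local:distinct-items_geert($items as node()*) as node()* {
  (: sort so that items with the same address end up next to each other,
     and within one address the item with the highest testVal comes first.
     Untyped values sort as strings; wrap testVal in number() if it holds
     numbers. :)
  let $items :=
    for $item in $items
    order by $item/state ascending, $item/city ascending,
             $item/addr ascending, $item/testVal descending
    return $item
  (: keep an item only if its address differs from its predecessor's;
     for the first item $prev is empty, so the comparison is false and
     the item is kept :)
  for $item at $pos in $items
  let $prev := $items[$pos - 1]
  where not(($item/state = $prev/state) and ($item/city = $prev/city) and
            ($item/addr = $prev/addr))
  return $item
};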
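To make the sort-then-scan idea easy to try outside MarkLogic, here is a minimal Python sketch of the same algorithm. It assumes each item is a plain dict with hypothetical keys state, city, addr and testVal, and that testVal values are mutually comparable (for example all numbers). It is an illustration of the technique, not a port of any existing library.

```python
from typing import Any, Dict, List

Item = Dict[str, Any]  # hypothetical item shape: state, city, addr, testVal


def distinct_items(items: List[Item]) -> List[Item]:
    """Keep one item per (state, city, addr), preferring the highest testVal.

    Same idea as local:distinct-items_geert: sort, then drop every item
    whose address equals that of its predecessor in the sorted sequence.
    """
    # Python's sort is stable, so sort by the secondary key first
    # (testVal descending), then by the address fields ascending.
    ordered = sorted(items, key=lambda it: it["testVal"], reverse=True)
    ordered.sort(key=lambda it: (it["state"], it["city"], it["addr"]))

    result: List[Item] = []
    prev = None
    for item in ordered:  # single linear pass over the sorted items
        key = (item["state"], item["city"], item["addr"])
        if key != prev:  # first item of a new address group
            result.append(item)
        prev = key
    return result


items = [
    {"state": "NY", "city": "Albany", "addr": "1 Main St", "testVal": 3},
    {"state": "NY", "city": "Albany", "addr": "1 Main St", "testVal": 7},
    {"state": "CA", "city": "Fresno", "addr": "9 Elm St", "testVal": 1},
]
print([(i["state"], i["testVal"]) for i in distinct_items(items)])
# → [('CA', 1), ('NY', 7)]
```

As with the XQuery version, the cost is dominated by the sort (O(n log n)); the duplicate check itself only ever looks at one neighbour, and the output comes back in sorted order rather than input order.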
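Note that it makes many more assumptions about the input, and that the items
come back sorted. That might not be what you want. Because of the testVal
descending sort key, the item kept for each address is the one with the
highest testVal.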
I also still think it should be possible to optimize even further by utilizing
indexes. In plain old XSLT I would have created an xsl:key index with a concat
of state, city and addr as key; an adapted Muenchian method would then have
given you the unique items. More or less, that would be an optimized version of
Ken's strategy.
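The xsl:key/Muenchian idea boils down to "index every item by a composite key, then keep the first item per key". A rough Python analogue, using a dict in place of xsl:key, might look like this. The key layout and the "|" separator are assumptions for illustration (the separator is assumed never to occur in the data, so that "ab"+"c" and "a"+"bc" do not collide).

```python
from typing import Any, Dict, List

Item = Dict[str, Any]  # hypothetical item shape: state, city, addr, testVal


def address_key(item: Item) -> str:
    # Composite key, analogous to concat(state, city, addr) in an xsl:key;
    # the separator keeps ("ab", "c") and ("a", "bc") apart.
    return "|".join((item["state"], item["city"], item["addr"]))


def distinct_by_key(items: List[Item]) -> List[Item]:
    """Return the first item (in input order) for each address key.

    One pass builds key -> first item; dicts preserve insertion order,
    so the result keeps the input order of first occurrences, much like
    the Muenchian test "is this the first node returned by key()?".
    """
    first: Dict[str, Item] = {}
    for item in items:
        # setdefault only stores the item if the key is not present yet
        first.setdefault(address_key(item), item)
    return list(first.values())


items = [
    {"state": "NY", "city": "Albany", "addr": "1 Main St", "testVal": 3},
    {"state": "NY", "city": "Albany", "addr": "1 Main St", "testVal": 7},
    {"state": "CA", "city": "Fresno", "addr": "9 Elm St", "testVal": 1},
]
print([(i["state"], i["testVal"]) for i in distinct_by_key(items)])
# → [('NY', 3), ('CA', 1)]
```

Unlike the sort-based version, this needs no sort (expected linear time with a hash map) and keeps the input order, but it returns the first item per address, not the one with the highest testVal; picking the "best" item would need an extra comparison inside the loop.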
In XQuery, or to be more precise in MarkLogic Server, I would attempt to
utilize the lexicon functions, which are backed by indexes. The easiest approach
would be to add a 'sortkey' attribute on each item (containing a concatenation
of state, city and addr) and to add a range index on it. Once you have this
index, you can use the lexicon function cts:element-attribute-values to get the
unique state/city/addr combinations. Then search on each distinct value (a
search on the sortkey attribute) to retrieve from the index the list of items at
the same address, after which you apply whatever algorithm you like to pick the
most favorable one.
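The shape of that lexicon-based workflow can be simulated in memory. The Python sketch below does not call MarkLogic at all: a dict stands in for the range index on the sortkey attribute, iterating over its sorted keys stands in for enumerating distinct values with cts:element-attribute-values, and the dict lookup stands in for the per-value search. The item fields, the "|" separator and the pick function are all hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

Item = Dict[str, Any]  # hypothetical item shape: state, city, addr, testVal


def add_sortkey(item: Item) -> Item:
    # Step 1: store the concatenated address on the item itself,
    # like the proposed 'sortkey' attribute (separator is an assumption).
    return {**item, "sortkey": "|".join((item["state"], item["city"], item["addr"]))}


def build_index(items: List[Item]) -> Dict[str, List[Item]]:
    # Step 2: in-memory stand-in for a range index: sortkey value -> items.
    index: Dict[str, List[Item]] = defaultdict(list)
    for item in items:
        index[item["sortkey"]].append(item)
    return index


def best_per_address(items: List[Item],
                     pick: Callable[[List[Item]], Item]) -> List[Item]:
    index = build_index([add_sortkey(i) for i in items])
    result: List[Item] = []
    # Step 3: enumerate the distinct sortkey values (the lexicon analogue).
    for value in sorted(index):
        # Step 4: "search" on one value, then let pick() choose the favourite.
        result.append(pick(index[value]))
    return result


items = [
    {"state": "NY", "city": "Albany", "addr": "1 Main St", "testVal": 3},
    {"state": "NY", "city": "Albany", "addr": "1 Main St", "testVal": 7},
    {"state": "CA", "city": "Fresno", "addr": "9 Elm St", "testVal": 1},
]
best = best_per_address(items, lambda group: max(group, key=lambda i: i["testVal"]))
print([(i["state"], i["testVal"]) for i in best])
# → [('CA', 1), ('NY', 7)]
```

Keeping the selection rule in a separate pick function mirrors the point that, once the items for one address are retrieved, you can apply whichever "most favorable" rule suits you.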
HTH!
Kind regards,
Geert
drs. G.P.H. (Geert) Josten
Consultant
Daidalos BV
Hoekeindsehof 1-4
2665 JZ Bleiswijk
T +31 (0)10 850 1200
F +31 (0)10 850 1199
mailto:[email protected]
http://www.daidalos.nl/
KvK 27164984
_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general