Hi Damon,
Thanks. My script is generating group totals a lot faster now.
However, local:categorize() takes 3 seconds to load 25,000 XML docs into
the map.
let $groupMap := map:map()
let $_ := for $res in $logs
return local:categorize($groupMap, $res/book/bookId, $res)
This is just 3 days' worth of user logs.
We're expecting our customer to query a month's worth of logs at a time.
Is there an even faster solution ?
Regards,
Danny
On Fri, Jun 22, 2012 at 11:45 PM, Damon Feldman <[email protected]
> wrote:
>
>
> Danny,
>
>
>
> It looks like this one is also suffering from use of predicates. They get
> evaluated for every item in the sequence, so you get O(n^2) behavior when
> they are in a for loop. This is one of the few places I would suggest
> using procedural programming by adding values to a map:map structure.
> Here's some sample code that runs in linear time:
>
>
>
>
> declare function local:categorize($map, $key, $value) {
> let $new-seq := (map:get($map, $key), $value)
> return map:put($map, $key, $new-seq)
> };
>
>
>
> let $categorized := map:map()
> let $items := for $i in 1 to 1000 return <item idx="{$i}">{if
> (xdmp:random(1) eq 1) then "one" else "zero"}</item>
> let $add-all :=
> for $item in $items
> return local:categorize($categorized, $item/text(), $item)
>
> return
> <groups>
> <ones>{map:get($categorized, "one")}</ones>
> <zeros>{map:get($categorized, "zero")}</zeros>
> </groups>
>
>
>
> Use map:keys($map) to get all "groups."
>
>
>
> Yours,
> Damon
>
>
> ------------------------------
> *From:* [email protected] [
> [email protected]] On Behalf Of Danny Sinang [
> [email protected]]
> *Sent:* Friday, June 22, 2012 9:08 PM
> *To:* general
> *Subject:* [MarkLogic Dev General] Faster way to simulate SQL Group By in
> ML xquery
>
> Hi,
>
> I'm trying to simulate a SQL SELECT ... Group By functionality via an ML
> xquery script.
>
> So I wrote a searchLogs function that uses search:search to return a
> list of logs that match some given filters. So far, it runs very fast.
>
> For the Group By part, I was able to write something that works but it's
> turning out to be very slow. When searchLog returns 2,500 rows, it takes 1
> minute or so to generate group totals.
>
> The code is a bit complex as it handles multi-column groupBy's
> recursively. But the basic logic goes like this :
>
> let $results := local:seachLogs($searchQuery)
> let $userIds := fn:distinct-values($results/userId)
> for $userId in $userIds
> let $userLogs := $results[userId=$userId]
> let $userTotal := fn:count($userLogs)
> return
> <userId>{$userId}</userId>
> <userTotal>{$userTotal}</userTotal>
> <book> {
> let $bookIds := fn:distinct-values($userLogs)
> for $bookId in $bookIds
> let $bookLogs := $userLogs[bookId=$bookId]
> let $bookTotal := fn:count($bookLogs)
> return
>
> <bookId>{$bookId}</bookId>
> <bookTotal>{$bookTotal}</bookTotal>
>
>
> } </book>
>
> I did some crude timings via xdmp:log() and saw that the red lines above
> eat up like 30 milliseconds for each user / book row. With thousands of
> rows to be processed, the delays all add up and become noticeable.
>
> Can anybody here suggest a way to speed this thing up dramatically ?
>
> If not, I'm thinking of inserting the raw results into an SQL table and
> letting SQL do the group totals.
>
> Regards,
> Danny
>
>
> _______________________________________________
> General mailing list
> [email protected]
> http://community.marklogic.com/mailman/listinfo/general
>
>
_______________________________________________
General mailing list
[email protected]
http://community.marklogic.com/mailman/listinfo/general