Hi,
I'm trying to simulate a SQL SELECT ... Group By functionality via an ML
xquery script.
So I wrote a searchLogs function that uses search:search to return a list
of logs that match some given filters. So far, it runs very fast.
For the Group By part, I was able to write something that works but it's
turning out to be very slow. When searchLog returns 2,500 rows, it takes 1
minute or so to generate group totals.
The code is a bit complex as it handles multi-column groupBy's recursively.
But the basic logic goes like this :
let $results := local:seachLogs($searchQuery)
let $userIds := fn:distinct-values($results/userId)
for $userId in $userIds
let $userLogs := $results[userId=$userId]
let $userTotal := fn:count($userLogs)
return
<userId>{$userId}</userId>
<userTotal>{$userTotal}</userTotal>
<book> {
let $bookIds := fn:distinct-values($userLogs)
for $bookId in $bookIds
let $bookLogs := $userLogs[bookId=$bookId]
let $bookTotal := fn:count($bookLogs)
return
<bookId>{$bookId}</bookId>
<bookTotal>{$bookTotal}</bookTotal>
} </book>
I did some crude timings via xdmp:log() and saw that the red lines above
eat up like 30 milliseconds for each user / book row. With thousands of
rows to be processed, the delays all add up and become noticeable.
Can anybody here suggest a way to speed this thing up dramatically ?
If not, I'm thinking of inserting the raw results into an SQL table and
letting SQL do the group totals.
Regards,
Danny
_______________________________________________
General mailing list
[email protected]
http://community.marklogic.com/mailman/listinfo/general