Danny, I think search:search() is actually helping you out by forcing the issue of thinking about when (if ever) to limit the results. If your hardware can reasonably churn through, say, 10,000 items in an acceptable query response time, then set it to 10,000, and if the count is actually 10,000 you may return an error to indicate the user tried to analyze too many items based on their query criteria.
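As a rough illustration of that cap-and-reject idea, here is a minimal Python sketch. It is not MarkLogic code: `run_query`, `fetch_capped`, `TooManyResultsError` and `MAX_ITEMS` are hypothetical names, and `run_query(limit)` stands in for whatever call returns at most `limit` results.

```python
MAX_ITEMS = 10_000  # assumed cap: what the hardware can process in acceptable time


class TooManyResultsError(Exception):
    """Raised when a query returns as many items as the cap allows."""


def fetch_capped(run_query, max_items=MAX_ITEMS):
    """Fetch at most max_items results and reject the query if the cap was hit.

    run_query is a hypothetical callable taking a limit and returning
    at most that many items.
    """
    items = list(run_query(max_items))
    # Reaching the cap means the result set was probably truncated,
    # so any aggregate computed from it would be unreliable.
    if len(items) >= max_items:
        raise TooManyResultsError(
            f"query matched {max_items} or more items; narrow the criteria"
        )
    return items
```

A caller would catch `TooManyResultsError` and ask the user to narrow the date range or other criteria, rather than computing aggregates on a truncated set.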
I've seen two alternatives to brute-force processing when computing aggregates over a large data set.

The first is to put a range index on the thing that you want to sum over, and use cts:sum() and/or cts:avg(). This retrieves the values only from the in-memory range index structure and does not touch the full documents on disk.

The second is to use cts:search() or search:search() with the random scoring option ("score-random"). This causes the database to take a random sample of your data. For things like time per book and other aggregates, it is usually best to take a random sample and compute the average on that. xdmp:estimate() will give you a fast count if your indexes fully support your query, so you can also get approximate totals by averaging a sample and extrapolating.

Yours,
Damon

From: [email protected] On Behalf Of Danny Sinang
Sent: Tuesday, May 08, 2012 10:25 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] How to get max value in ML

Hi Geert,

We're logging user activities related to accessing a book (flipping through pages, annotating, etc.). search:search() is used to fetch the logs that meet some reporting criteria (e.g. date range, which books, which chapters). Once we get the filtered results, we feed them to a function to get the time spent by a user per book, chapter or subject. Time spent is the aggregate value.

Regards,
Danny

On Tue, May 8, 2012 at 10:15 AM, Geert Josten <[email protected]> wrote:

Hi Danny,

Can you elaborate on the aggregate values? That is probably quite inefficient too. You might be better off doing such work directly with cts functions, if possible.

Kind regards,
Geert

From: [email protected] On Behalf Of Danny Sinang
Sent: Tuesday, May 8, 2012 16:10
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] How to get max value in ML

Hi Geert,

Thanks. Yes, you have a good point there. Returning all those results will be inefficient. But I'll be forcing the users to limit the results by date instead, and we'll be imposing a maximum date range. I just need to make sure search:search() returns all the results, because I'm feeding the entire result set to a function that computes some aggregate values.

Regards,
Danny

On Tue, May 8, 2012 at 10:04 AM, Geert Josten <[email protected]> wrote:

Hi Danny,

There is no pre-declared constant as far as I know, if that is what you mean. But I'm sure it follows the specs of the XML Schema standard.

Are you sure you want search:search() to return so many results in one call? It is memory-inefficient, and showing so many results in, for instance, a browser is likely to choke the browser. If you used a crawler that supports parallel threads, you'd see that a page size of something like 100 to 500 works much better.

Kind regards,
Geert

From: [email protected] On Behalf Of Danny Sinang
Sent: Tuesday, May 8, 2012 15:58
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] How to get max value in ML

Hi Geert,

I'm trying to get the maximum unsigned long value to tell search:search() to return all results. I was hoping there would be an ML function out there that would tell me the max unsigned long value.

Regards,
Danny

On Tue, May 8, 2012 at 9:54 AM, Geert Josten <[email protected]> wrote:

Hi Danny,

Can you elaborate on what exactly you mean? Given a sequence, you can just use fn:max() to get the highest value. If you want to determine the highest value that occurs anywhere in the database, you can use cts:values with a descending order and a limit of 1. The first and only result is the highest.

Kind regards,
Geert

From: [email protected] On Behalf Of Danny Sinang
Sent: Tuesday, May 8, 2012 15:49
To: general
Subject: [MarkLogic Dev General] How to get max value in ML

Hi,

Is there a function in ML that returns the maximum values for integer and unsigned long?

Regards,
Danny
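On the original question: in the XML Schema datatypes, xs:unsignedLong, xs:long and xs:int have fixed upper bounds, while xs:integer is unbounded in the specification (an implementation may still impose its own limit). A small Python sketch, not MarkLogic code, that derives the bounded maxima from their bit widths; the helper and constant names are made up for illustration:

```python
def max_unsigned(bits):
    """Largest value of an unsigned integer type with the given bit width."""
    return 2**bits - 1


def max_signed(bits):
    """Largest value of a two's-complement signed type with the given bit width."""
    return 2 ** (bits - 1) - 1


XS_UNSIGNED_LONG_MAX = max_unsigned(64)  # 18446744073709551615
XS_LONG_MAX = max_signed(64)             # 9223372036854775807
XS_INT_MAX = max_signed(32)              # 2147483647
```

As the replies in the thread point out, passing such a value as a page length is rarely what you want; a deliberate cap is usually the safer choice.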
_______________________________________________ General mailing list [email protected] http://developer.marklogic.com/mailman/listinfo/general
