Danny,

I think search:search() is actually helping you out by forcing you to think 
about when (if ever) to limit the results. If your hardware can reasonably churn 
through, say, 10,000 items in an acceptable query response time, then set the 
limit to 10,000. If the count comes back as exactly 10,000, you can return an 
error telling the user that their query criteria matched too many items to 
analyze.
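
As a rough illustration of that cap-and-reject idea, here is a minimal sketch in Python rather than XQuery. The names MAX_ITEMS, TooManyResults and check_result_cap are made up for this example and are not part of any MarkLogic API; the results list stands in for whatever the search returned.

```python
MAX_ITEMS = 10000  # hypothetical cap; tune it to what your hardware handles


class TooManyResults(Exception):
    """Raised when a query reaches the configured result cap."""


def check_result_cap(results, max_items=MAX_ITEMS):
    """Return results unchanged, or raise if the cap was reached.

    Reaching the cap means more matches may exist than were fetched,
    so an aggregate computed over them could be incomplete.
    """
    if len(results) >= max_items:
        raise TooManyResults(
            "query matched at least %d items; narrow the criteria" % max_items
        )
    return results
```

The caller would catch the exception and show the user a message asking for a narrower date range or fewer books.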

I've seen two alternatives to brute-force processing when computing aggregates 
over a large data set.

First is to put a range index on the thing that you want to sum over, and use 
cts:sum() and/or cts:avg(). This will retrieve the values only from the 
in-memory range index structure, and not touch the full documents on disk.

The second option is to use cts:search() or search:search() with the 
"score-random" option. This gives every match a random score, so the first N 
results are effectively a random sample of your data. For things like time per 
book and other aggregates it is usually best to take a random sample and compute 
the average on that. xdmp:estimate() will give you a fast count if your indexes 
fully support your query, so you can also get approximate totals by averaging a 
sample and extrapolating.
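
To make the extrapolation concrete, here is a rough sketch of the arithmetic in Python rather than XQuery. The function name, the sample values and the estimated count are all invented for illustration; in practice the sample would come from a randomly scored search and the count from xdmp:estimate().

```python
from statistics import mean


def extrapolate_total(sample_values, estimated_count):
    """Approximate a total as (sample mean) x (estimated number of matches)."""
    if not sample_values:
        raise ValueError("sample must not be empty")
    return mean(sample_values) * estimated_count


# Hypothetical data: seconds spent per log entry in a random sample,
# and a hypothetical count of matching log entries.
sample_seconds = [30, 45, 60, 15, 50]
estimated_matches = 200000

approx_total = extrapolate_total(sample_seconds, estimated_matches)
```

The result is only as good as the sample: a larger sample narrows the error, and the estimate is exact only if the count itself is accurate.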

Yours,
Damon


From: [email protected] 
[mailto:[email protected]] On Behalf Of Danny Sinang
Sent: Tuesday, May 08, 2012 10:25 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] How to get max value in ML

Hi Geert,

We're logging user activities with regards to accessing a book (flipping 
through pages, annotating, etc).

search:search() is used to fetch the logs that meet some reporting criteria 
(i.e. date range, which books, which chapters, etc).

Once we get the filtered results, we feed them to a function to get the time 
spent by a user per book, chapter or subject.

Time spent is the aggregate value.

Regards,
Danny
On Tue, May 8, 2012 at 10:15 AM, Geert Josten 
<[email protected]<mailto:[email protected]>> wrote:
Hi Danny,

Can you elaborate on the aggregate values? Computing them that way is probably 
quite inefficient too. You might be better off doing such work directly with cts 
functions, if possible.

Kind regards,
Geert

From: [email protected] 
[mailto:[email protected]] On Behalf Of Danny Sinang
Sent: Tuesday, May 08, 2012 4:10 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] How to get max value in ML

Hi Geert,

Thanks.

Yes, you have a good point there. Returning all those results will be 
inefficient.

But I'll be forcing the users to limit the results by date instead and we'll be 
imposing a maximum date range.

I just need to make sure search:search() returns all the results because I'm 
feeding the entire result set to a function that computes some aggregate 
values.

Regards,
Danny
On Tue, May 8, 2012 at 10:04 AM, Geert Josten 
<[email protected]<mailto:[email protected]>> wrote:
Hi Danny,

There is no pre-declared constant as far as I know if that is what you mean. 
But I'm sure it follows the specs of the XML Schema standard.
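
For reference, the bounds that XML Schema defines for the fixed-width integer types can be written out directly. The sketch below uses Python only to do the arithmetic, and the constant names are made up for this example. Note that xs:integer has no upper bound in the spec itself; only the derived types such as xs:int, xs:long and xs:unsignedLong do.

```python
# Upper bounds from the XML Schema datatypes spec (maxInclusive facets).
XS_INT_MAX = 2**31 - 1            # xs:int
XS_LONG_MAX = 2**63 - 1           # xs:long
XS_UNSIGNED_LONG_MAX = 2**64 - 1  # xs:unsignedLong

print(XS_UNSIGNED_LONG_MAX)  # 18446744073709551615
```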

Are you sure you want search:search to return so many results in one call? It 
is memory-inefficient, and showing that many results in, for instance, a 
browser is likely to choke the browser. If you used a crawler that supports 
parallel threads, you'd see that a page size of something like 100 to 500 would 
work much better.

Kind regards,
Geert

From: [email protected] 
[mailto:[email protected]] On Behalf Of Danny Sinang
Sent: Tuesday, May 08, 2012 3:58 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] How to get max value in ML

Hi Geert,

I'm trying to get the maximum unsigned long value to tell search:search() to 
return all results.

I was hoping there would be an ML function out there that would tell me the max 
unsigned long value.

Regards,
Danny
On Tue, May 8, 2012 at 9:54 AM, Geert Josten 
<[email protected]<mailto:[email protected]>> wrote:
Hi Danny,

Can you elaborate on what exactly you mean? Given a sequence, you can just use 
fn:max() to get the highest value. If you want to determine the highest value 
that occurs anywhere in the database, you can use cts:values with descending 
order and a limit of 1. The first and only result is the highest.

Kind regards,
Geert

From: [email protected] 
[mailto:[email protected]] On Behalf Of Danny Sinang
Sent: Tuesday, May 08, 2012 3:49 PM
To: general
Subject: [MarkLogic Dev General] How to get max value in ML

Hi,

Is there a function in ML that returns the maximum values for integer and 
unsigned long?

Regards,
Danny

_______________________________________________
General mailing list
[email protected]<mailto:[email protected]>
http://developer.marklogic.com/mailman/listinfo/general

