When the cost of doing an estimation is high, the function
ibis::query::estimate gives back the default answer of 0 as min and N
as max (where N is the number of rows in the data partition).
Technically, this is a correct answer, though it might not be what
you expect.
In your case, since N = 100, which is very small, the overhead of
reading the index into memory or reading the header of the index into
memory is as high as reading all values (which takes up only 400
bytes) into memory. Whether the cost of an estimation counts as high
is decided by comparing it with the cost of reading the raw data
values. I imagine that, with so few rows, the cost decision is likely
to favor reading all N values.
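
To make the comparison concrete, here is a minimal sketch of such a
cost rule. It is only an illustration of the idea described above, not
FastBit's actual logic; the function name indexIsCheaper and its
parameters are made up for this example.

```cpp
#include <cstddef>

// Hypothetical cost rule, for illustration only (not FastBit code):
// reading the index pays off only if it needs fewer bytes than
// reading every raw value of the column.
constexpr bool indexIsCheaper(std::size_t indexBytes,
                              std::size_t nRows,
                              std::size_t bytesPerValue) {
    return indexBytes < nRows * bytesPerValue;
}
```

With N = 100 values of 4 bytes each, the raw data is 400 bytes, so
under this rule any index that needs 400 bytes or more to read is not
cheaper, and falling back to the default answer (or to the raw data)
is the expected outcome.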
Depending on what you want to do with the results of
ibis::query::estimate, you might have to call ibis::query::evaluate
instead.
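
For reference, the kind of bounds a binned index can provide for a
condition such as "data < 15" can be sketched from the bin counts
alone. The helper below is hypothetical and self-contained; it does not
use or reproduce FastBit's code, and it assumes every value of bin i
lies in the half-open range ending at bounds[i].

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper (not part of FastBit): bound the number of rows
// satisfying "value < threshold" using only per-bin counts.
// Bin i covers [bounds[i-1], bounds[i]); bin 0 starts at `lowest`.
std::pair<std::uint64_t, std::uint64_t>
estimateLessThan(double lowest,
                 const std::vector<double>& bounds,
                 const std::vector<std::uint64_t>& counts,
                 double threshold) {
    assert(bounds.size() == counts.size());
    std::uint64_t minHits = 0;
    std::uint64_t maxHits = 0;
    double lo = lowest;
    for (std::size_t i = 0; i < bounds.size(); ++i) {
        const double hi = bounds[i];
        if (hi <= threshold) {
            // Whole bin is below the threshold: all rows are sure hits.
            minHits += counts[i];
            maxHits += counts[i];
        } else if (lo < threshold) {
            // Bin straddles the threshold: rows are only possible hits.
            maxHits += counts[i];
        }
        lo = hi;
    }
    return {minHits, maxHits};
}
```

Applied to the distribution quoted below, four bins with boundaries
10, 40, 70, 100 (counts 8, 30, 34, 28) give min 8 and max 38, while ten
bins of width 10 give min 8 and max 15. Pinning down the exact count
requires checking the raw values in the straddling bin, which is why
evaluate is the function to call when an exact answer is needed.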
John
On 6/12/13 9:17 AM, nan zhou wrote:
> Hi, John,
>
> Thanks for the reply. I did retrieve the min and the max hits. But the
> return values is *0* for the /getMinNumHits/ and *100* for
> the /getMaxNumHits/. 100 is the total number of records.
>
> I am expecting the returned min hits at least is *8* and max hits at
> least is *15* for the query which has where clause ( data < 15 ) and
> data which has following distribution of each bin.
>
> Records distribution for each bin:
>> > value range | # of element locates in this range
>> > [0 - 10) | 8
>> > [10 - 20) | 7 // our query touches these two bins
>> > [20 - 30) | 12
>> > [30 - 40) | 11
>> > [40 - 50) | 10
>> > [50 - 60) | 9
>> > [60 - 70) | 15
>> > [70 - 80) | 10
>> > [80 - 90) | 7
>> > [90 - 100) | 11
>
>
> Please see below for the codes I am using:
>
> estimate_query.setWhereClause ("data < 15");
> estimate_query.getHitRows (RIDs);
>
> uint64_t min_hits = estimate_query.getMinNumHits ();
> uint64_t max_hits = estimate_query.getMaxNumHits ();
> uint32_t estimate_size = RIDs.size ();
>
> Output:
> >>> where data < 15: estimate() *returned 0 records between minimum
> 0 and maximum 100 hits.*
>
> Thanks,
>
> Nan
>> Date: Tue, 11 Jun 2013 23:05:09 -0700
>> From: [email protected]
>> To: [email protected]
>> CC: [email protected]
>> Subject: Re: [FastBit-users] How to enable fastbit to answer the
> query without touching raw data
>>
>> The documentation of ibis::query::estimate states that
>>
>> Returns 0 for success, a negative value for error.
>>
>> Since the function call was completed correctly, it should have
>> returned 0. To find out the minimum and maximum number of hits
>> determined by ibis::query::estimate, you need to call
>> ibis::query::getMinNumHits and ibis::query::getMaxNumHits. You can
>> see an example of how they are used in examples/ibis.cpp line 3549 and
>> 3550.
>>
>> John
>>
>>
>> On 6/11/13 2:50 PM, nan zhou wrote:
>> > Hello,
>> >
>> > Sorry to send this email again, I realized that the email is not
>> > sent to fastbit user mailing list. Following is my problem.
>> >
>> > I tried the estimate function as you instructed before, however I
>> > got a wrong answer from estimate function (FastBit version is 1.3.6).
>> > Could you help me ?
>> >
>> > I have data which has following distribution:
>> > value range | # of element locates in this range
>> > [0 - 10) | 8
>> > [10 - 20) | 7
>> > [20 - 30) | 12
>> > [30 - 40) | 11
>> > [40 - 50) | 10
>> > [50 - 60) | 9
>> > [60 - 70) | 15
>> > [70 - 80) | 10
>> > [80 - 90) | 7
>> > [90 - 100) | 11
>> > Above data was binned into 4 bins, whose boundaries are "10, 40,
>> > 70, 100".
>> >
>> > I applied estimate function when the query is " xxx where data
>> > value < 15 ", the estimate function return 0, which is not right.
>> > If i use evaluate function given by same query, the results number
>> > is 15 which is correct.
>> >
>> > Here is my code :
>> >
>> > vector <uint32_t> RIDs;
>> >
>> > ibis::part table ("test", static_cast<const char*>(0));
>> >
>> > // create a query object with the current user name.
>> > ibis::query estimate_query (ibis::util::userName(), &table);
>> > ibis::query evaluate_query (ibis::util::userName(), &table);
>> >
>> > evaluate_query.setWhereClause ("data < 15");
>> > assert (evaluate_query.evaluate () >= 0);
>> > evaluate_query.getHitRows (RIDs);
>> >
>> > uint32_t evaluate_size = RIDs.size ();
>> >
>> > cout << "number of records where data < 15: evaluate() = " <<
>> > evaluate_size << " records." << endl; *// here it returns 15*
>> >
>> > RIDs.clear ();
>> >
>> > estimate_query.setWhereClause ("data < 15");
>> > estimate_query.getHitRows (RIDs);
>> >
>> > uint64_t min_hits = estimate_query.getMinNumHits ();
>> > uint64_t max_hits = estimate_query.getMaxNumHits ();
>> > uint32_t estimate_size = RIDs.size ();
>> >
>> > cout << "number of records where data < 15: estimate() = " <<
>> > estimate_size << " records between " << min_hits << " and " <<
>> > max_hits << " hits." << endl; *// value of variable estimate_size
>> > is 0 , and min_hits = 0, and max_hits = 100*
>> >
>> > Any clue why it is not returning the right value? Thanks
>> >
>> > Nan
>> >
>> >
>> > ----------------------------------------------------------------------
>> > From: [email protected]
>> > To: [email protected]
>> > Subject: RE: [FastBit-users] How to enable fastbit to answer the query
>> > without touching raw data
>> > Date: Thu, 9 May 2013 22:35:58 +0800
>> >
>> > Thank you very much.
>> >
>> > nan
>> >
>> >> Date: Wed, 8 May 2013 14:52:31 -0700
>> >> From: [email protected]
>> >> To: [email protected]
>> >> CC: [email protected]
>> >> Subject: Re: [FastBit-users] How to enable fastbit to answer the
>> > query without touching raw data
>> >>
>> >> Yes, your understanding is correct.
>> >>
>> >> John
>> >>
>> >>
>> >> On 5/8/13 1:38 PM, nan zhou wrote:
>> >> > Hi, John,
>> >> >
>> >> > Further question would be how the `estimate` function works. For
>> >> > example, if I have bin boundaries, such as: 0, 10 , 20, 30, 40, and
>> >> > 50 , six bin boundaries for column A( bin 1: [0, 10), bin 2:
> [10, 20),
>> >> > bin 3: [20, 30), bin 4 [30, 40), bin 5 [40, 50) ) . The where
> clause
>> >> > has 21<= A <= 35. In such as, all bit positions/RIDs in bin 3
> and bin
>> >> > 4 are retrieved, no matter whether the actual value is in the query
>> >> > range or not. Do I understand it correctly?
>> >> >
>> >> > Thanks.
>> >> >
>> >> > nan
>> >> >
_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users