Hi, Nan,

The unexpected result was indeed caused by the evaluate function's
attempt to avoid an expensive estimation operation.  I have added a
conditional compilation macro, FASTBIT_ESTIMATION_IGNORE_COST, in SVN
Revision 643.  If you add '-DFASTBIT_ESTIMATION_IGNORE_COST' to the
compilation flags in CXXFLAGS, you will disable the computation of the
estimation cost and cause the estimate procedure to run regardless
of the expected cost.  This should be the behavior you are expecting.
Let me know how it goes once you get a chance to try it out.
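
As a sanity check on what to expect once the cost test is bypassed,
here is a minimal sketch (plain C++, not FastBit code) of the bounds a
binned index can give for a one-sided condition such as "data < 15"
using only the per-bin counts.  The helper name boundsLessThan and the
explicit lower edge of 0 are assumptions made for illustration; the
bin edges and counts come from the numbers quoted further down in this
thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper, not part of the FastBit API.
// Bin i covers [edges[i], edges[i+1]) and holds counts[i] rows.
// Returns {minHits, maxHits} for the condition "value < limit".
std::pair<std::uint64_t, std::uint64_t>
boundsLessThan(const std::vector<double>& edges,
               const std::vector<std::uint64_t>& counts,
               double limit) {
    // One more edge than bins; assumes no rows fall outside the edges.
    assert(edges.size() == counts.size() + 1);
    std::uint64_t minHits = 0;
    std::uint64_t maxHits = 0;
    for (std::size_t i = 0; i < counts.size(); ++i) {
        if (edges[i + 1] <= limit) {
            // Whole bin lies below the limit: every row is a sure hit.
            minHits += counts[i];
            maxHits += counts[i];
        } else if (edges[i] < limit) {
            // Bin straddles the limit: its rows are only possible hits.
            maxHits += counts[i];
        }
        // Bins starting at or above the limit contribute nothing.
    }
    return {minHits, maxHits};
}
```

With edges {0, 10, 40, 70, 100} and the 100-row counts {8, 30, 34, 28}
(the ten ranges from the original message summed per bin),
boundsLessThan(edges, counts, 15) gives a minimum of 8 and a maximum
of 38; the evaluate() answer of 15 falls inside that range.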

John


On 6/17/13 6:50 AM, nan zhou wrote:
> Hi, John, 
> 
>    Thanks for offering help. 
> 
>   Attached is my test program.  estimate_query.cpp is the source
> code that calls the estimate function.  -part.txt is included, and
> bin.txt has a list of bin boundaries. 
>   In case the mail system filters .sh files, I renamed run.sh to
> run.sh.rename. Please change it back if you want to run it (and
> modify the path to fastbit as well). 
>   Thanks again. 
> 
> Nan
> 
>> Date: Wed, 12 Jun 2013 23:48:48 -0700
>> From: [email protected]
>> To: [email protected]
>> CC: [email protected]
>> Subject: Re: [FastBit-users] How to enable fastbit to answer the query without touching raw data
>>
>> Hi, Nan,
>>
>> Oh, well, I am not as sure about what is going on as I was earlier
>> today. If you are willing to pack up the smaller test problem you
>> were using, I could find time to look into it a little bit more.
>>
>> John
>>
>>
>> On 6/12/13 7:36 PM, nan zhou wrote:
>> > Hi, John
>> >
>> > I really appreciate your time and help; please forgive me if I ask
>> > too much.
>> >
>> > I expanded the total number of values to 20,000,000, and the
>> > estimate function still gives min hits as 0 and max hits as
>> > 20,000,000 for the same query (details are at the end). Could you
>> > explain a little about how FastBit does the estimate, e.g. how it
>> > calculates the cost and how it decides whether to load the index or
>> > the raw data? Maybe I missed something important during the index
>> > build.
>> >
>> > Thank you very much
>> >
>> > ./estimate_query
>> > number of records where data < 15: evaluate() = *3998670* records.
>> > number of records where data < 15: estimate() = *0* records between 0
>> > and *20000000* hits.
>> >
>> > --------------------- Histogram ------------------------------------
>> > 0 to 10 has 1999176 elements
>> > 10 to 40 has 5999627 elements
>> > 40 to 70 has 5999200 elements
>> > 70 to 100 has 6001997 elements
>> > 100 to 1.79769e+308 has 0 elements
>> > --------------------------------------------------------------------
>> >
>> > Nan
>> >
>> >> Date: Wed, 12 Jun 2013 13:22:34 -0700
>> >> From: [email protected]
>> >> To: [email protected]
>> >> CC: [email protected]
>> >> Subject: Re: [FastBit-users] How to enable fastbit to answer the query without touching raw data
>> >>
>> >> When the cost of doing an estimation is high, the function
>> >> ibis::query::estimate gives back the default answer of 0 as min and N
>> >> as max (where N is the number of rows in the data partition).
>> >> Technically, this is a correct answer, though it might not be what
>> >> you expect.
>> >>
>> >> In your case, since N = 100, which is very small, the overhead of
>> >> reading the index into memory or reading the header of the index into
>> >> memory is as high as reading all values (which take up only 400
>> >> bytes) into memory. Whether the cost is high or not is decided by
>> >> comparing it to the cost of reading the raw data values. I imagine
>> >> that the cost decision is likely to favor reading all N values.
>> >>
>> >> Depending on what you want to do with the results of
>> >> ibis::query::estimate, you might have to call ibis::query::evaluate
>> >> instead.
>> >>
>> >> John
>> >>
>> >>
>> >> On 6/12/13 9:17 AM, nan zhou wrote:
>> >> > Hi, John,
>> >> >
>> >> > Thanks for the reply. I did retrieve the min and the max hits, but
>> >> > the returned value is *0* for getMinNumHits and *100* for
>> >> > getMaxNumHits. 100 is the total number of records.
>> >> >
>> >> > I am expecting the returned min hits to be at least *8* and the max
>> >> > hits to be at least *15* for the query with the where clause
>> >> > (data < 15) on data with the following distribution in each bin.
>> >> >
>> >> > Records distribution for each bin:
>> >> >> > value range | # of element locates in this range
>> >> >> > [0 - 10) | 8
>> >> >> > [10 - 20) | 7 // our query touches these two bins
>> >> >> > [20 - 30) | 12
>> >> >> > [30 - 40) | 11
>> >> >> > [40 - 50) | 10
>> >> >> > [50 - 60) | 9
>> >> >> > [60 - 70) | 15
>> >> >> > [70 - 80) | 10
>> >> >> > [80 - 90) | 7
>> >> >> > [90 - 100) | 11
>> >> >
>> >> >
>> >> > Please see below for the code I am using:
>> >> >
>> >> > estimate_query.setWhereClause ("data < 15");
>> >> > estimate_query.getHitRows (RIDs);
>> >> >
>> >> > uint64_t min_hits = estimate_query.getMinNumHits ();
>> >> > uint64_t max_hits = estimate_query.getMaxNumHits ();
>> >> > uint32_t estimate_size = RIDs.size ();
>> >> >
>> >> > Output:
>> >> > >>> where data < 15: estimate() *returned 0 records between minimum
>> >> > 0 and maximum 100 hits.*
>> >> >
>> >> > Thanks,
>> >> >
>> >> > Nan
>> >> >> Date: Tue, 11 Jun 2013 23:05:09 -0700
>> >> >> From: [email protected]
>> >> >> To: [email protected]
>> >> >> CC: [email protected]
>> >> >> Subject: Re: [FastBit-users] How to enable fastbit to answer the query without touching raw data
>> >> >>
>> >> >> The documentation of ibis::query::estimate states that
>> >> >>
>> >> >> Returns 0 for success, a negative value for error.
>> >> >>
>> >> >> Since the function call was completed correctly, it should have
>> >> >> returned 0. To find out the minimum and maximum number of hits
>> >> >> determined by ibis::query::estimate, you need to call
>> >> >> ibis::query::getMinNumHits and ibis::query::getMaxNumHits. You can
>> >> >> see an example of how they are used in examples/ibis.cpp, lines
>> >> >> 3549 and 3550.
>> >> >>
>> >> >> John
>> >> >>
>> >> >>
>> >> >> On 6/11/13 2:50 PM, nan zhou wrote:
>> >> >> > Hello,
>> >> >> >
>> >> >> > Sorry to send this email again; I realized that the email was not
>> >> >> > sent to the fastbit user mailing list. Following is my problem.
>> >> >> >
>> >> >> > I tried the estimate function as you instructed before; however, I
>> >> >> > got a wrong answer from the estimate function (FastBit version is
>> >> >> > 1.3.6). Could you help me?
>> >> >> >
>> >> >> > I have data which has following distribution:
>> >> >> > value range | # of element locates in this range
>> >> >> > [0 - 10) | 8
>> >> >> > [10 - 20) | 7
>> >> >> > [20 - 30) | 12
>> >> >> > [30 - 40) | 11
>> >> >> > [40 - 50) | 10
>> >> >> > [50 - 60) | 9
>> >> >> > [60 - 70) | 15
>> >> >> > [70 - 80) | 10
>> >> >> > [80 - 90) | 7
>> >> >> > [90 - 100) | 11
>> >> >> > The above data was binned into 4 bins, whose boundaries are "10, 40,
>> >> >> > 70, 100".
>> >> >> >
>> >> >> > I applied the estimate function with the query "xxx where data
>> >> >> > value < 15", and the estimate function returned 0, which is not
>> >> >> > right. If I use the evaluate function with the same query, the
>> >> >> > number of results is 15, which is correct.
>> >> >> >
>> >> >> > Here is my code :
>> >> >> >
>> >> >> > vector <uint32_t> RIDs;
>> >> >> >
>> >> >> > ibis::part table ("test", static_cast<const char*>(0));
>> >> >> >
>> >> >> > // create a query object with the current user name.
>> >> >> > ibis::query estimate_query (ibis::util::userName(), &table);
>> >> >> > ibis::query evaluate_query (ibis::util::userName(), &table);
>> >> >> >
>> >> >> > evaluate_query.setWhereClause ("data < 15");
>> >> >> > assert (evaluate_query.evaluate () >= 0);
>> >> >> > evaluate_query.getHitRows (RIDs);
>> >> >> >
>> >> >> > uint32_t evaluate_size = RIDs.size ();
>> >> >> >
>> >> >> > cout << "number of records where data < 15: evaluate() = " <<
>> >> >> > evaluate_size << " records." << endl; // here it returns 15
>> >> >> >
>> >> >> > RIDs.clear ();
>> >> >> >
>> >> >> > estimate_query.setWhereClause ("data < 15");
>> >> >> > estimate_query.getHitRows (RIDs);
>> >> >> >
>> >> >> > uint64_t min_hits = estimate_query.getMinNumHits ();
>> >> >> > uint64_t max_hits = estimate_query.getMaxNumHits ();
>> >> >> > uint32_t estimate_size = RIDs.size ();
>> >> >> >
>> >> >> > cout << "number of records where data < 15: estimate() = " <<
>> >> >> > estimate_size << " records between " << min_hits << " and " <<
>> >> >> > max_hits << " hits." << endl;
>> >> >> > // estimate_size is 0, min_hits = 0, and max_hits = 100
>> >> >> >
>> >> >> > Any clue why it is not returning the right value? Thanks
>> >> >> >
>> >> >> > Nan
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > ----------------------------------------------------------------------
>> >> >> > From: [email protected]
>> >> >> > To: [email protected]
>> >> >> > Subject: RE: [FastBit-users] How to enable fastbit to answer the query without touching raw data
>> >> >> > Date: Thu, 9 May 2013 22:35:58 +0800
>> >> >> >
>> >> >> > Thank you very much.
>> >> >> >
>> >> >> > nan
>> >> >> >
>> >> >> >> Date: Wed, 8 May 2013 14:52:31 -0700
>> >> >> >> From: [email protected]
>> >> >> >> To: [email protected]
>> >> >> >> CC: [email protected]
>> >> >> >> Subject: Re: [FastBit-users] How to enable fastbit to answer the query without touching raw data
>> >> >> >>
>> >> >> >> Yes, your understanding is correct.
>> >> >> >>
>> >> >> >> John
>> >> >> >>
>> >> >> >>
>> >> >> >> On 5/8/13 1:38 PM, nan zhou wrote:
>> >> >> >> > Hi, John,
>> >> >> >> >
>> >> >> >> > A further question is how the `estimate` function works. For
>> >> >> >> > example, suppose I have six bin boundaries for column A: 0, 10, 20,
>> >> >> >> > 30, 40, and 50 (bin 1: [0, 10), bin 2: [10, 20), bin 3: [20, 30),
>> >> >> >> > bin 4: [30, 40), bin 5: [40, 50)), and the where clause is
>> >> >> >> > 21 <= A <= 35. In such a case, all bit positions/RIDs in bin 3 and
>> >> >> >> > bin 4 are retrieved, no matter whether the actual value is in the
>> >> >> >> > query range or not. Do I understand it correctly?
>> >> >> >> >
>> >> >> >> > Thanks.
>> >> >> >> >
>> >> >> >> > nan
>> >> >> >> >
_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
