Hi John,

Thank you very much for your time and help, and please forgive me if I am asking too much.

I expanded the total number of values to 20,000,000. The estimate function still gives min hits as 0 and max hits as 20,000,000 for the same query (details are at the end). Could you explain a little about how FastBit does the estimate: how it calculates the cost, and how it decides whether to load the index or the raw data? Maybe I missed something important during the index build.

Thank you very much.

./estimate_query
number of records where data < 15: evaluate() = 3998670 records.
number of records where data < 15: estimate() = 0 records between 0 and 20000000 hits.
--------------------- Histogram ------------------------------------
0 to 10 has 1999176 elements
10 to 40 has 5999627 elements
40 to 70 has 5999200 elements
70 to 100 has 6001997 elements
100 to 1.79769e+308 has 0 elements
--------------------------------------------------------------------

Nan

> Date: Wed, 12 Jun 2013 13:22:34 -0700
> From: [email protected]
> To: [email protected]
> CC: [email protected]
> Subject: Re: [FastBit-users] How to enable fastbit to answer the query without touching raw data
>
> When the cost of doing an estimation is high, the function
> ibis::query::estimate gives back the default answer of 0 as min and N
> as max (where N is the number of rows in the data partition).
> Technically, this is a correct answer, though it might not be what
> you expect.
>
> In your case, since N = 100, which is very small, the overhead of
> reading the index into memory or reading the header of the index into
> memory is as high as reading all values (which take up only 400
> bytes) into memory. Whether the cost is high or not is decided by
> comparing it to the cost of reading the raw data values. I imagine that
> the cost decision is likely to favor reading all N values.
>
> Depending on what you want to do with the results of
> ibis::query::estimate, you might have to call ibis::query::evaluate
> instead.
>
> John
>
>
> On 6/12/13 9:17 AM, nan zhou wrote:
> > Hi, John,
> >
> > Thanks for the reply. I did retrieve the min and the max hits, but the
> > returned values are 0 for getMinNumHits and 100 for getMaxNumHits.
> > 100 is the total number of records.
> >
> > I expected the returned min hits to be at least 8 and the max hits to
> > be at least 15 for the query with the where clause (data < 15), given
> > data with the following distribution in each bin.
> >
> > Records distribution for each bin:
> >> > value range | # of elements in this range
> >> > [0 - 10)    | 8
> >> > [10 - 20)   | 7    // our query touches these two bins
> >> > [20 - 30)   | 12
> >> > [30 - 40)   | 11
> >> > [40 - 50)   | 10
> >> > [50 - 60)   | 9
> >> > [60 - 70)   | 15
> >> > [70 - 80)   | 10
> >> > [80 - 90)   | 7
> >> > [90 - 100)  | 11
> >
> >
> > Please see below for the code I am using:
> >
> > estimate_query.setWhereClause ("data < 15");
> > estimate_query.getHitRows (RIDs);
> >
> > uint64_t min_hits = estimate_query.getMinNumHits ();
> > uint64_t max_hits = estimate_query.getMaxNumHits ();
> > uint32_t estimate_size = RIDs.size ();
> >
> > Output:
> > >>> where data < 15: estimate() returned 0 records between minimum
> > 0 and maximum 100 hits.
> >
> > Thanks,
> >
> > Nan
> >
> >> Date: Tue, 11 Jun 2013 23:05:09 -0700
> >> From: [email protected]
> >> To: [email protected]
> >> CC: [email protected]
> >> Subject: Re: [FastBit-users] How to enable fastbit to answer the query without touching raw data
> >>
> >> The documentation of ibis::query::estimate states that
> >>
> >> Returns 0 for success, a negative value for error.
> >>
> >> Since the function call was completed correctly, it should have
> >> returned 0. To find out the minimum and maximum number of hits
> >> determined by ibis::query::estimate, you need to call
> >> ibis::query::getMinNumHits and ibis::query::getMaxNumHits.
You can
> >> see an example of how they are used in examples/ibis.cpp, lines 3549 and
> >> 3550.
> >>
> >> John
> >>
> >>
> >> On 6/11/13 2:50 PM, nan zhou wrote:
> >> > Hello,
> >> >
> >> > Sorry to send this email again; I realized that it was not
> >> > sent to the FastBit users mailing list. The following is my problem.
> >> >
> >> > I tried the estimate function as you instructed before, but I
> >> > got a wrong answer from it (FastBit version is 1.3.6).
> >> > Could you help me?
> >> >
> >> > I have data with the following distribution:
> >> > value range | # of elements in this range
> >> > [0 - 10)    | 8
> >> > [10 - 20)   | 7
> >> > [20 - 30)   | 12
> >> > [30 - 40)   | 11
> >> > [40 - 50)   | 10
> >> > [50 - 60)   | 9
> >> > [60 - 70)   | 15
> >> > [70 - 80)   | 10
> >> > [80 - 90)   | 7
> >> > [90 - 100)  | 11
> >> > The above data was binned into 4 bins, whose boundaries are "10, 40,
> >> > 70, 100".
> >> >
> >> > I applied the estimate function to the query " xxx where data
> >> > value < 15 ", and the estimate function returned 0, which is not right.
> >> > If I use the evaluate function with the same query, the number of
> >> > results is 15, which is correct.
> >> >
> >> > Here is my code:
> >> >
> >> > vector <uint32_t> RIDs;
> >> >
> >> > ibis::part table ("test", static_cast<const char*>(0));
> >> >
> >> > // create a query object with the current user name.
> >> > ibis::query estimate_query (ibis::util::userName(), &table);
> >> > ibis::query evaluate_query (ibis::util::userName(), &table);
> >> >
> >> > evaluate_query.setWhereClause ("data < 15");
> >> > assert (evaluate_query.evaluate () >= 0);
> >> > evaluate_query.getHitRows (RIDs);
> >> >
> >> > uint32_t evaluate_size = RIDs.size ();
> >> >
> >> > cout << "number of records where data < 15: evaluate() = " <<
> >> > evaluate_size << " records."
<< endl; // here it returns 15
> >> >
> >> > RIDs.clear ();
> >> >
> >> > estimate_query.setWhereClause ("data < 15");
> >> > estimate_query.getHitRows (RIDs);
> >> >
> >> > uint64_t min_hits = estimate_query.getMinNumHits ();
> >> > uint64_t max_hits = estimate_query.getMaxNumHits ();
> >> > uint32_t estimate_size = RIDs.size ();
> >> >
> >> > cout << "number of records where data < 15: estimate() = " <<
> >> > estimate_size << " records between " << min_hits << " and " <<
> >> > max_hits << " hits." << endl; // value of variable estimate_size
> >> > // is 0, min_hits = 0, and max_hits = 100
> >> >
> >> > Any clue why it is not returning the right value? Thanks
> >> >
> >> > Nan
> >> >
> >> >
> >> > ----------------------------------------------------------------------
> >> > From: [email protected]
> >> > To: [email protected]
> >> > Subject: RE: [FastBit-users] How to enable fastbit to answer the query
> >> > without touching raw data
> >> > Date: Thu, 9 May 2013 22:35:58 +0800
> >> >
> >> > Thank you very much.
> >> >
> >> > nan
> >> >
> >> >> Date: Wed, 8 May 2013 14:52:31 -0700
> >> >> From: [email protected]
> >> >> To: [email protected]
> >> >> CC: [email protected]
> >> >> Subject: Re: [FastBit-users] How to enable fastbit to answer the query without touching raw data
> >> >>
> >> >> Yes, your understanding is correct.
> >> >>
> >> >> John
> >> >>
> >> >>
> >> >> On 5/8/13 1:38 PM, nan zhou wrote:
> >> >> > Hi, John,
> >> >> >
> >> >> > A further question is how the `estimate` function works. For
> >> >> > example, suppose I have six bin boundaries for column A: 0, 10, 20,
> >> >> > 30, 40, and 50 (bin 1: [0, 10), bin 2: [10, 20), bin 3: [20, 30),
> >> >> > bin 4: [30, 40), bin 5: [40, 50)). The where clause has
> >> >> > 21 <= A <= 35. In such a case, all bit positions/RIDs in bin 3 and
> >> >> > bin 4 are retrieved, no matter whether the actual value is in the
> >> >> > query range or not.
Do I understand it correctly?
> >> >> >
> >> >> > Thanks.
> >> >> >
> >> >> > nan
> >> >> >
_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
