Hi, Nan,

Oh, well, I am not as sure about what is going on as I was earlier today. If you are willing to pack up the smaller test problem you were using, I could find time to look into it a little bit more.

John

On 6/12/13 7:36 PM, nan zhou wrote:
> Hi, John
>
> Much appreciate for your time and help, please forgive me if I ask
> too much.
>
> I expanded the total number of values to 20,000,000. And the
> estimate function still gives min hits as 0 and max hits as
> 20,000,000 using the same query (details are at the end). Could you
> explain a little bit about how FastBit does the estimate, like how it
> calculates cost and how it decides to load the index or raw data?
> Maybe I missed some other important thing during index build.
>
> Thank you very much
>
> ./estimate_query
> number of records where data < 15: evaluate() = *3998670* records.
> number of records where data < 15: estimate() = *0* records between 0
> and *20000000* hits.
>
> --------------------- Histogram ------------------------------------
> 0 to 10 has 1999176 elements
> 10 to 40 has 5999627 elements
> 40 to 70 has 5999200 elements
> 70 to 100 has 6001997 elements
> 100 to 1.79769e+308 has 0 elements
> --------------------------------------------------------------------
>
> Nan
>
>> Date: Wed, 12 Jun 2013 13:22:34 -0700
>> From: [email protected]
>> To: [email protected]
>> CC: [email protected]
>> Subject: Re: [FastBit-users] How to enable fastbit to answer the
>> query without touching raw data
>>
>> When the cost of doing an estimation is high, the function
>> ibis::query::estimate gives back the default answer of 0 as min and N
>> as max (where N is the number of rows in the data partition).
>> Technically, this is a correct answer, though it might not be what
>> you expect.
>>
>> In your case, since N = 100, which is very small, the overhead of
>> reading the index into memory or reading the header of the index into
>> memory is as high as reading all values (which takes up only 400
>> bytes) into memory. Whether the cost is high or not is decided by
>> comparing it to the cost of reading the raw data values. I imagine
>> that the cost decision is likely to favor reading all N values.
>>
>> Depending on what you want to do with the results of
>> ibis::query::estimate, you might have to call ibis::query::evaluate
>> instead.
>>
>> John
>>
>>
>> On 6/12/13 9:17 AM, nan zhou wrote:
>> > Hi, John,
>> >
>> > Thanks for the reply. I did retrieve the min and the max hits. But
>> > the return value is *0* for getMinNumHits and *100* for
>> > getMaxNumHits. 100 is the total number of records.
>> >
>> > I am expecting the returned min hits to be at least *8* and the max
>> > hits at least *15* for the query which has the where clause
>> > (data < 15) and data which has the following distribution for each
>> > bin.
>> >
>> > Records distribution for each bin:
>> > value range | # of elements located in this range
>> > [0 - 10)    | 8
>> > [10 - 20)   | 7   // our query touches these two bins
>> > [20 - 30)   | 12
>> > [30 - 40)   | 11
>> > [40 - 50)   | 10
>> > [50 - 60)   | 9
>> > [60 - 70)   | 15
>> > [70 - 80)   | 10
>> > [80 - 90)   | 7
>> > [90 - 100)  | 11
>> >
>> >
>> > Please see below for the code I am using:
>> >
>> > estimate_query.setWhereClause ("data < 15");
>> > estimate_query.getHitRows (RIDs);
>> >
>> > uint64_t min_hits = estimate_query.getMinNumHits ();
>> > uint64_t max_hits = estimate_query.getMaxNumHits ();
>> > uint32_t estimate_size = RIDs.size ();
>> >
>> > Output:
>> > >>> where data < 15: estimate() *returned 0 records between minimum
>> > 0 and maximum 100 hits.*
>> >
>> > Thanks,
>> >
>> > Nan
>> >
>> >> Date: Tue, 11 Jun 2013 23:05:09 -0700
>> >> From: [email protected]
>> >> To: [email protected]
>> >> CC: [email protected]
>> >> Subject: Re: [FastBit-users] How to enable fastbit to answer the
>> >> query without touching raw data
>> >>
>> >> The documentation of ibis::query::estimate states that
>> >>
>> >>     Returns 0 for success, a negative value for error.
>> >>
>> >> Since the function call was completed correctly, it should have
>> >> returned 0. To find out the minimum and maximum number of hits
>> >> determined by ibis::query::estimate, you need to call
>> >> ibis::query::getMinNumHits and ibis::query::getMaxNumHits. You can
>> >> see an example of how they are used in examples/ibis.cpp lines 3549
>> >> and 3550.
>> >>
>> >> John
>> >>
>> >>
>> >> On 6/11/13 2:50 PM, nan zhou wrote:
>> >> > Hello,
>> >> >
>> >> > Sorry to send this email again, I realized that the email was not
>> >> > sent to the fastbit user mailing list. Following is my problem.
>> >> >
>> >> > I tried the estimate function as you instructed before, however I
>> >> > got a wrong answer from the estimate function (FastBit version is
>> >> > 1.3.6). Could you help me?
>> >> >
>> >> > I have data which has the following distribution:
>> >> > value range | # of elements located in this range
>> >> > [0 - 10)    | 8
>> >> > [10 - 20)   | 7
>> >> > [20 - 30)   | 12
>> >> > [30 - 40)   | 11
>> >> > [40 - 50)   | 10
>> >> > [50 - 60)   | 9
>> >> > [60 - 70)   | 15
>> >> > [70 - 80)   | 10
>> >> > [80 - 90)   | 7
>> >> > [90 - 100)  | 11
>> >> > Above data was binned into 4 bins, whose boundaries are "10, 40,
>> >> > 70, 100".
>> >> >
>> >> > I applied the estimate function when the query is " xxx where data
>> >> > value < 15 ", and the estimate function returns 0, which is not
>> >> > right. If I use the evaluate function with the same query, the
>> >> > result number is 15, which is correct.
>> >> >
>> >> > Here is my code:
>> >> >
>> >> > vector <uint32_t> RIDs;
>> >> >
>> >> > ibis::part table ("test", static_cast<const char*>(0));
>> >> >
>> >> > // create a query object with the current user name.
>> >> > ibis::query estimate_query (ibis::util::userName(), &table);
>> >> > ibis::query evaluate_query (ibis::util::userName(), &table);
>> >> >
>> >> > evaluate_query.setWhereClause ("data < 15");
>> >> > assert (evaluate_query.evaluate () >= 0);
>> >> > evaluate_query.getHitRows (RIDs);
>> >> >
>> >> > uint32_t evaluate_size = RIDs.size ();
>> >> >
>> >> > cout << "number of records where data < 15: evaluate() = " <<
>> >> > evaluate_size << " records." << endl; // here it returns 15
>> >> >
>> >> > RIDs.clear ();
>> >> >
>> >> > estimate_query.setWhereClause ("data < 15");
>> >> > estimate_query.getHitRows (RIDs);
>> >> >
>> >> > uint64_t min_hits = estimate_query.getMinNumHits ();
>> >> > uint64_t max_hits = estimate_query.getMaxNumHits ();
>> >> > uint32_t estimate_size = RIDs.size ();
>> >> >
>> >> > cout << "number of records where data < 15: estimate() = " <<
>> >> > estimate_size << " records between " << min_hits << " and " <<
>> >> > max_hits << " hits." << endl; // value of variable estimate_size
>> >> > is 0, and min_hits = 0, and max_hits = 100
>> >> >
>> >> > Any clue why it is not returning the right value? Thanks
>> >> >
>> >> > Nan
>> >> >
>> >> >
>> >> > ----------------------------------------------------------------------
>> >> > From: [email protected]
>> >> > To: [email protected]
>> >> > Subject: RE: [FastBit-users] How to enable fastbit to answer the
>> >> > query without touching raw data
>> >> > Date: Thu, 9 May 2013 22:35:58 +0800
>> >> >
>> >> > Thank you very much.
>> >> >
>> >> > nan
>> >> >
>> >> >> Date: Wed, 8 May 2013 14:52:31 -0700
>> >> >> From: [email protected]
>> >> >> To: [email protected]
>> >> >> CC: [email protected]
>> >> >> Subject: Re: [FastBit-users] How to enable fastbit to answer the
>> >> >> query without touching raw data
>> >> >>
>> >> >> Yes, your understanding is correct.
>> >> >>
>> >> >> John
>> >> >>
>> >> >>
>> >> >> On 5/8/13 1:38 PM, nan zhou wrote:
>> >> >> > Hi, John,
>> >> >> >
>> >> >> > Further question would be how the `estimate` function works.
>> >> >> > For example, if I have bin boundaries, such as: 0, 10, 20, 30,
>> >> >> > 40, and 50, six bin boundaries for column A (bin 1: [0, 10),
>> >> >> > bin 2: [10, 20), bin 3: [20, 30), bin 4: [30, 40), bin 5:
>> >> >> > [40, 50)). The where clause has 21 <= A <= 35. In such a case,
>> >> >> > all bit positions/RIDs in bin 3 and bin 4 are retrieved, no
>> >> >> > matter whether the actual value is in the query range or not.
>> >> >> > Do I understand it correctly?
>> >> >> >
>> >> >> > Thanks.
>> >> >> >
>> >> >> > nan
>> >> >> >
_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
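For readers following the thread, the bounds that a binned index can in principle supply for a query like "data < 15" can be worked out by hand from the bin counts quoted above. The sketch below is only that arithmetic. It is not FastBit code and does not reproduce the cost decision John describes; the names `HitBounds` and `boundsForLessThan` are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper type for illustration only; not part of the FastBit API.
struct HitBounds {
    std::uint64_t min_hits;  // rows in bins that lie entirely below the threshold
    std::uint64_t max_hits;  // min_hits plus rows in the bin the threshold cuts through
};

// Bounds for the condition "value < threshold" over consecutive bins
// [lower, upper[0]), [upper[0], upper[1]), ...
// upper[i] is the exclusive upper boundary of bin i, count[i] its row count.
inline HitBounds boundsForLessThan(double lower,
                                   const std::vector<double>& upper,
                                   const std::vector<std::uint64_t>& count,
                                   double threshold) {
    assert(upper.size() == count.size());
    HitBounds b{0, 0};
    double lo = lower;  // inclusive lower boundary of the current bin
    for (std::size_t i = 0; i < upper.size(); ++i) {
        if (upper[i] <= threshold) {
            // Every value in this bin is below the threshold: sure hits.
            b.min_hits += count[i];
            b.max_hits += count[i];
        } else if (lo < threshold) {
            // The threshold falls inside this bin: its rows are only candidates.
            b.max_hits += count[i];
        }
        // Bins starting at or above the threshold contribute nothing.
        lo = upper[i];
    }
    return b;
}
```

With the 100-row distribution from the thread (bins bounded at 10, 40, 70, 100, holding 8, 30, 34 and 28 rows), "data < 15" gives a minimum of 8 and a maximum of 38, and the exact answer of 15 lies between them. For the 20,000,000-row histogram the same arithmetic gives 1999176 and 7998803, which bracket the evaluated count of 3998670. Whether ibis::query::estimate actually reports such bounds depends on its cost decision, as discussed above.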
