Hi, John 
  Much appreciate for your time and help, please forgive me if I ask too much. 
  I expanded the total number of values to 20,000,000. And the estimate 
function still give min hits as 0, and max hits as 20, 000,000 by using same 
query ( details is at the end) .  Could you explain a little bit on how the 
FastBit to do estimate, like how it calculate cost, how to decide to load the 
index or raw data? Maybe, I missed other important thing during index build. 
  Thank you very much 
./estimate_query number of records where data < 15: evaluate() = 3998670 
records.number of records where data < 15: estimate() = 0 records between 0 and 
20000000 hits.
--------------------- Histogram ------------------------------------0 to 10 has 
1999176 elements10 to 40 has 5999627 elements40 to 70 has 5999200 elements70 to 
100 has 6001997 elements100 to 1.79769e+308 has 0 
elements--------------------------------------------------------------------
Nan
> Date: Wed, 12 Jun 2013 13:22:34 -0700
> From: [email protected]
> To: [email protected]
> CC: [email protected]
> Subject: Re: [FastBit-users] How to enable fastbit to answer the query 
> without touching raw data
> 
> When the cost of doing an estimation is high, the function
> ibis::query::estimate gives back the default answer of 0 as min and N
> as max (where N is the number of rows in the data partition).
> Technically, this is a correct answer, thought it might not be what
> you expect.
> 
> In your case, since N = 100, which is very small, the overhead of
> reading the index into memory or reading the header of the index into
> memory is as high as reading all values (which takes up only 400
> bytes) into memory.  The decision of whether the cost of high not is
> compared to reading the raw data values.  I imagine that the cost
> decision is likely to favor reading all N values.
> 
> Depending on what you want to do with the results of
> ibis::query::estimate, you might have to call ibis::query::evaluate
> instead.
> 
> John
> 
> 
> On 6/12/13 9:17 AM, nan zhou wrote:
> > Hi, John, 
> > 
> > Thanks for the reply. I did retrieve the min and the max hits. But the
> > return values is *0* for the /getMinNumHits/ and *100* for
> > the /getMaxNumHits/. 100 is the total number of records. 
> > 
> > I am expecting the returned min hits at least  is  *8* and max hits at
> > least is *15* for the query which has where clause ( data < 15 ) and
> > data which has following  distribution of each bin. 
> > 
> >       Records distribution for each bin: 
> >> > value range | # of element locates in this range 
> >> > [0 - 10) | 8
> >> > [10 - 20) | 7                                // our query touches these 
> >> > two bins
> >> > [20 - 30) | 12
> >> > [30 - 40) | 11
> >> > [40 - 50) | 10
> >> > [50 - 60) | 9
> >> > [60 - 70) | 15
> >> > [70 - 80) | 10
> >> > [80 - 90) | 7
> >> > [90 - 100) | 11
> > 
> > 
> > Please see below for the codes I am using:
> > 
> > /estimate_query.setWhereClause ("data < 15");//
> > //estimate_query.getHitRows (RIDs);//
> > //
> > //uint64_t min_hits = estimate_query.getMinNumHits ();//
> > //uint64_t max_hits = estimate_query.getMaxNumHits ();//
> > //uint32_t estimate_size = RIDs.size ();
> > 
> > /Output:/
> > //>>> where data < 15: estimate() *returned 0 records between minimum
> > 0 and maximum 100 hits.*/
> > /*
> > */
> > /*Thanks, */
> > /*
> > */
> > /*Nan
> > */
> >> Date: Tue, 11 Jun 2013 23:05:09 -0700
> >> From: [email protected]
> >> To: [email protected]
> >> CC: [email protected]
> >> Subject: Re: [FastBit-users] How to enable fastbit to answer the
> > query without touching raw data
> >>
> >> The documentation of ibis::query::estimate states that
> >>
> >> Returns 0 for success, a negative value for error.
> >>
> >> Since the function call was completed correctly, it should have
> >> returned 0. To find out the minimum and maximum number of hits
> >> determined by ibis::query::estimate, you need to call
> >> ibis::query::getMinNumHits and ibis::query::getMaxNumHits. You can
> >> see an example of how they are used in examples/ibis.cpp line 3549 and
> >> 3550.
> >>
> >> John
> >>
> >>
> >> On 6/11/13 2:50 PM, nan zhou wrote:
> >> > Hello,
> >> >
> >> > Sorry to send this email again, I realized that the email is not
> >> > sent to fastbit user mailing list. Following is my problem.
> >> >
> >> > I tried the estimate function as you instructed before, however I
> >> > got a wrong answer from estimate function (FastBit version is 1.3.6).
> >> > Could you help me ?
> >> >
> >> > I have data which has following distribution:
> >> > value range | # of element locates in this range
> >> > [0 - 10) | 8
> >> > [10 - 20) | 7
> >> > [20 - 30) | 12
> >> > [30 - 40) | 11
> >> > [40 - 50) | 10
> >> > [50 - 60) | 9
> >> > [60 - 70) | 15
> >> > [70 - 80) | 10
> >> > [80 - 90) | 7
> >> > [90 - 100) | 11
> >> > Above data was binned into 4 bins, whose boundaries are "10, 40,
> >> > 70, 100".
> >> >
> >> > I applied estimate function when the query is " xxx where data
> >> > value < 15 ", the estimate function return 0, which is not right.
> >> > If i use evaluate function given by same query, the results number
> >> > is 15 which is correct.
> >> >
> >> > Here is my code :
> >> >
> >> > vector <uint32_t> RIDs;
> >> >
> >> > ibis::part table ("test", static_cast<const char*>(0));
> >> >
> >> > // create a query object with the current user name.
> >> > ibis::query estimate_query (ibis::util::userName(), &table);
> >> > ibis::query evaluate_query (ibis::util::userName(), &table);
> >> >
> >> > evaluate_query.setWhereClause ("data < 15");
> >> > assert (evaluate_query.evaluate () >= 0);
> >> > evaluate_query.getHitRows (RIDs);
> >> >
> >> > uint32_t evaluate_size = RIDs.size ();
> >> >
> >> > cout << "number of records where data < 15: evaluate() = " <<
> >> > evaluate_size << " records." << endl; *// here it returns 15*
> >> >
> >> > RIDs.clear ();
> >> >
> >> > estimate_query.setWhereClause ("data < 15");
> >> > estimate_query.getHitRows (RIDs);
> >> >
> >> > uint64_t min_hits = estimate_query.getMinNumHits ();
> >> > uint64_t max_hits = estimate_query.getMaxNumHits ();
> >> > uint32_t estimate_size = RIDs.size ();
> >> >
> >> > cout << "number of records where data < 15: estimate() = " <<
> >> > estimate_size << " records between " << min_hits << " and " <<
> >> > max_hits << " hits." << endl; *// value of variable estimate_size
> >> > is 0 , and min_hits = 0, and max_hits = 100*
> >> >
> >> > Any clue why it is not returning the right value? Thanks
> >> >
> >> > Nan
> >> >
> >> >
> >> > ----------------------------------------------------------------------
> >> > From: [email protected]
> >> > To: [email protected]
> >> > Subject: RE: [FastBit-users] How to enable fastbit to answer the query
> >> > without touching raw data
> >> > Date: Thu, 9 May 2013 22:35:58 +0800
> >> >
> >> > Thank you very much.
> >> >
> >> > nan
> >> >
> >> >> Date: Wed, 8 May 2013 14:52:31 -0700
> >> >> From: [email protected]
> >> >> To: [email protected]
> >> >> CC: [email protected]
> >> >> Subject: Re: [FastBit-users] How to enable fastbit to answer the
> >> > query without touching raw data
> >> >>
> >> >> Yes, your understanding is correct.
> >> >>
> >> >> John
> >> >>
> >> >>
> >> >> On 5/8/13 1:38 PM, nan zhou wrote:
> >> >> > Hi, John,
> >> >> >
> >> >> > Further question would be how the `estimate` function works. For
> >> >> > example, if I have bin boundaries, such as: 0, 10 , 20, 30, 40, and
> >> >> > 50 , six bin boundaries for column A( bin 1: [0, 10), bin 2:
> > [10, 20),
> >> >> > bin 3: [20, 30), bin 4 [30, 40), bin 5 [40, 50) ) . The where
> > clause
> >> >> > has 21<= A <= 35. In such as, all bit positions/RIDs in bin 3
> > and bin
> >> >> > 4 are retrieved, no matter whether the actual value is in the query
> >> >> > range or not. Do I understand it correctly?
> >> >> >
> >> >> > Thanks.
> >> >> >
> >> >> > nan
> >> >> >
                                          
_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users

Reply via email to