Hello, 

   Sorry to send this email again; I realized that the earlier copy was not
sent to the FastBit users mailing list. My problem is as follows.
  I tried the estimate function as you instructed before, but I got a wrong
answer from it (FastBit version 1.3.6). Could you help me?
  I have data with the following distribution:

    value range   |  # of elements in this range
    [0 - 10)      |   8
    [10 - 20)     |   7
    [20 - 30)     |  12
    [30 - 40)     |  11
    [40 - 50)     |  10
    [50 - 60)     |   9
    [60 - 70)     |  15
    [70 - 80)     |  10
    [80 - 90)     |   7
    [90 - 100)    |  11

  The data above was binned into 4 bins, whose boundaries are "10, 40, 70,
100".
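  To make my expectation concrete, here is a small standalone sketch of the bounds I would expect from per-bin counts alone. It does not use FastBit; the names Bin, Bounds, and estimateLessThan are hypothetical helpers of mine, and I am assuming the four bins are [0, 10), [10, 40), [40, 70), and [70, 100). A bin that lies entirely below the threshold counts toward both bounds, while a bin that straddles the threshold only counts toward the upper bound.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper types for illustration; not part of the FastBit API.
struct Bin {
    double lo;            // inclusive lower edge
    double hi;            // exclusive upper edge
    std::uint64_t count;  // number of records with a value in [lo, hi)
};

struct Bounds {
    std::uint64_t minHits;  // records that certainly satisfy the condition
    std::uint64_t maxHits;  // records that could satisfy the condition
};

// Bounds on the number of records satisfying "value < threshold",
// computed from the per-bin counts only (no access to the raw values).
Bounds estimateLessThan(const std::vector<Bin>& bins, double threshold) {
    Bounds b{0, 0};
    for (const Bin& bin : bins) {
        assert(bin.lo < bin.hi);  // each bin must be a non-empty interval
        if (bin.hi <= threshold) {
            // Every value in this bin is below the threshold: sure hits.
            b.minHits += bin.count;
            b.maxHits += bin.count;
        } else if (bin.lo < threshold) {
            // The bin straddles the threshold: candidates only.
            b.maxHits += bin.count;
        }
        // Otherwise the whole bin is at or above the threshold: no hits.
    }
    return b;
}
```

  With the coarse bin counts 8, 30, 34, and 28 derived from the table above, and a threshold of 15, this sketch gives a minimum of 8 hits and a maximum of 38 hits, which is the kind of range I expected from the estimate function.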
  I applied the estimate function with the query " xxx  where data value < 15 ";
it returned 0, which is not right. If I use the evaluate function with the
same query, the number of results is 15, which is correct.
  Here is my code:

    vector <uint32_t> RIDs;
    ibis::part table ("test", static_cast<const char*>(0));

    // create the query objects with the current user name.
    ibis::query estimate_query (ibis::util::userName(), &table);
    ibis::query evaluate_query (ibis::util::userName(), &table);

    evaluate_query.setWhereClause ("data < 15");
    assert (evaluate_query.evaluate () >= 0);
    evaluate_query.getHitRows (RIDs);
    uint32_t evaluate_size = RIDs.size ();

    cout << "number of records where data < 15: evaluate() = " << evaluate_size
         << " records." << endl; // here it returns 15

    RIDs.clear ();

    estimate_query.setWhereClause ("data < 15");
    estimate_query.getHitRows (RIDs);
    uint64_t min_hits = estimate_query.getMinNumHits ();
    uint64_t max_hits = estimate_query.getMaxNumHits ();
    uint32_t estimate_size = RIDs.size ();

    cout << "number of records where data < 15: estimate() = " << estimate_size
         << " records between " << min_hits << " and " << max_hits << " hits." << endl;
    // estimate_size is 0, min_hits = 0, and max_hits = 100

  Any clue why it is not returning the right value?  Thanks
Nan

From: [email protected]
To: [email protected]
Subject: RE: [FastBit-users] How to enable fastbit to answer the query without 
touching raw data
Date: Thu, 9 May 2013 22:35:58 +0800




Thank you very much.
nan

> Date: Wed, 8 May 2013 14:52:31 -0700
> From: [email protected]
> To: [email protected]
> CC: [email protected]
> Subject: Re: [FastBit-users] How to enable fastbit to answer the query 
> without touching raw data
> 
> Yes, your understanding is correct.
> 
> John
> 
> 
> On 5/8/13 1:38 PM, nan zhou wrote:
> > Hi, John, 
> > 
> >   A further question is how the `estimate` function works. For
> > example, suppose I have six bin boundaries for column A: 0, 10, 20,
> > 30, 40, and 50 (bin 1: [0, 10), bin 2: [10, 20), bin 3: [20, 30),
> > bin 4: [30, 40), bin 5: [40, 50)), and the where clause is
> > 21 <= A <= 35. In that case, all bit positions/RIDs in bin 3 and bin
> > 4 are retrieved, regardless of whether the actual value is in the
> > query range. Do I understand it correctly?
> > 
> >   Thanks. 
> > 
> > nan
> > 
                                                                                
                                          
_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
