Hi, Steven,

Thanks for the detailed report.  There are a couple of similar places
with the same problem.  A slightly different fix has been used in all
of them in SVN revision 633.  When you get the chance, please give it
a try.

Presumably, you would want to have a tar ball release to use.  If this
is the case, let me know, I will make a new tar ball.

Thanks again.

John


On 5/26/13 6:57 PM, Enns, Steven wrote:
> Hi everyone,
> 
> I believe I've found a bug in countQuery::doEvaluate which is
> producing incorrect query results.  We first noticed that the
> following queries incorrectly returned the same results using
> ibis::table::select, but distinct and seemingly correct values using
> ibis::query::getHitRows:
> 
> (behaviorSegments contains '63' OR behaviorSegments contains '662')
> AND (nielsenDma contains '503')
> (behaviorSegments contains '63' OR behaviorSegments contains '662')
> AND (nielsenDma contains '501')
> (behaviorSegments contains '63' OR behaviorSegments contains '662')
> AND (nielsenDma contains 'not a keyword!!')
> 
> I enabled verbosity level 4 using the ibis executable to find that
> value of ierr after evaluating the left hand side of the AND clause
> was 0 (indicating 0 hit rows), causing evaluation to stop without
> evaluating the right hand side.
> 
> Can I suggest the following fix for countQuery.cpp?  We are still
> using 1.3.5.  Is there a formal channel for filing bugs such as jira?
> 
> 898   
> 
>     case ibis::qExpr::LOGICAL_OR: {
> 
> 899   
> 
>       ierr = doEvaluate(term->getLeft(), mask, ht);
> 
> 900   
> 
>       if (ierr >= 0 && ht.cnt() < mask.cnt()) {
> 
> 901   
> 
>             int leftIerr = ierr;
> 
> 902   
> 
>           ibis::bitvector b1;
> 
> 903   
> 
>           if (ht.cnt() > mask.bytes() + ht.bytes()) {
> 
> 904   
> 
>               ibis::bitvector* newmask = mask - ht;
> 
> 905   
> 
>               ierr = doEvaluate(term->getRight(), *newmask, b1);
> 
> 906   
> 
>               delete newmask;
> 
> 907   
> 
>           }
> 
> 908   
> 
>           else {
> 
> 909   
> 
>               ierr = doEvaluate(term->getRight(), mask, b1);
> 
> 910   
> 
>           }
> 
> 911   
> 
>           if (ierr > 0) {
> 
> 912   
> 
>               ht |= b1;
> 
> 913   
> 
>               ierr = ht.sloppyCount();
> 
> 914   
> 
>           } else {
> 
> 915   
> 
>                 ierr = leftIerr;
> 
> 916   
> 
>             }
> 
> 917   
> 
>       }
> 
> 918   
> 
>       break;}
> 
> 
> ibis > where (behaviorSegments contains '63' OR behaviorSegments
> contains '662') AND (nielsenDma contains '503')
> doQuery -- processing " FROM 8r7nhJy0RL WHERE (behaviorSegments
> contains '63' OR behaviorSegments contains '662') AND (nielsenDma
> contains '503')"
> Constructing selectClause @ 0x7fff15ffa9c8
> newToken -- generated new token "s315Pp5XgCB----2" for user saenns
> query[s315Pp5XgCB----2]::setWhereClause -- add a new where clause
> "(behaviorSegments contains '63' OR behaviorSegments contains '662')
> AND (nielsenDma contains '503')".
> query[s315Pp5XgCB----2]::setWhereClause -- where "(behaviorSegments
> contains '63' OR behaviorSegments contains '662') AND (nielsenDma
> contains '503')"
> Translated the WHERE clause into: ((behaviorSegments CONTAINS '63' OR
> behaviorSegments CONTAINS '662') AND nielsenDma CONTAINS '503')
> query[s315Pp5XgCB----2]::evaluate -- starting to evaluate the query
> for user "saenns"
> query[s315Pp5XgCB----2]::doEvaluate(0x43d0f00: behaviorSegments
> CONTAINS '63', mask.cnt()=7795176) --> 209101, ierr = 209101
> query[s315Pp5XgCB----2]::doEvaluate(0x43d7320: behaviorSegments
> CONTAINS '662', mask.cnt()=7795176) --> 0, ierr = 0
> query[s315Pp5XgCB----2]::doEvaluate(0x43c8b20: (0x43d0f00 OR
> 0x43d7320), mask.cnt()=7795176) --> 209101, ierr = 209101
> query[s315Pp5XgCB----2]::doEvaluate(0x43d11a0: nielsenDma CONTAINS
> '503', mask.cnt()=209101) --> 126, ierr = 2
> query[s315Pp5XgCB----2]::doEvaluate(0x43cc620: (0x43c8b20 AND
> 0x43d11a0), mask.cnt()=7795176) --> 126, ierr = 2
> query[s315Pp5XgCB----2]::evaluate -- the hit contains xxx bits with
> 126 bits set(=1) taking up xxx bytes; the estimated clustering factor
> is 1; had the bits been randomly spread out, the expected size would
> be xxx bytes; estimated number of bytes to be read in order to access
> 4-byte values is xxx
> query[s315Pp5XgCB----2]::evaluate -- time to compute the 126 hits: 0
> sec(CPU), 0.19129 sec(elapsed).
> query[s315Pp5XgCB----2]::evaluate -- user saenns FROM 8r7nhJy0RL WHERE
> (behaviorSegments contains '63' OR behaviorSegments contains '662')
> AND (nielsenDma contains '503') ==> 126 hits.
> doQuery:: evaluate( FROM 8r7nhJy0RL WHERE (behaviorSegments contains
> '63' OR behaviorSegments contains '662') AND (nielsenDma contains
> '503')) produced 126 hits, took 0 CPU seconds, 0.340331 elapsed seconds
> countQuery::setWhereClause -- add a new where clause "(0x43d71e0 AND
> 0x43d7180)"
> countQuery::evaluate -- start timer ...
> countQuery::doEvaluate(0x43d7210: behaviorSegments CONTAINS '63',
> mask.cnt()=7795176) --> 209101, ierr = 209101
> countQuery::doEvaluate(0x43d7290: behaviorSegments CONTAINS '662',
> mask.cnt()=7795176) --> 0, ierr = 0
> countQuery::doEvaluate(0x43d71e0: (0x43d7210 OR 0x43d7290),
> mask.cnt()=7795176) --> 209101, ierr = 0
> countQuery::doEvaluate(0x43d91f0: (0x43d71e0 AND 0x43d7180),
> mask.cnt()=7795176) --> 209101, ierr = 0
> countQuery::evaluate -- Select count(*) From 8r7nhJy0RL Where
> (0x43d71e0 AND 0x43d7180) --> 209101
> countQuery::evaluate -- duration: 0 sec(CPU), 0.002358 sec(elapsed)
> Warning -- countQuery.getNumHits returned 209101, while
> query.getNumHits returned 126
> Freeing selectClause @ 0x7fff15ffa9c8
> 
> 
> 
> 
> 
> _______________________________________________
> FastBit-users mailing list
> [email protected]
> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
> 
_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users

Reply via email to