Hi, Preeti,

The bitmap indexes are stored in files alongside the data files.
In your specific example, select A,B,C where D=10, only column D will
be indexed (if it has not been indexed before). Typically, we recommend
that experienced users build the indexes explicitly (through ibis -b ...)
so that there is more control over what indexes are created.

John

On 11/29/12 6:22 AM, preeti gupta wrote:
> hmm ya .. does it store it in a bitmap after loading?
>
> and if my query is say
>
> select A,B,C where D =10 does it create indexes on all of them at the
> time of execution if I have not created already?
>
> On Wed, Nov 28, 2012 at 4:29 PM, K. John Wu <[email protected]> wrote:
> > The load step basically needs to digest the CSV file and write the
> > values out to disk. This should be reasonably fast. Say, the CSV
> > file is 100 MB, reading and digesting the CSV file might take a few
> > seconds and writing might proceed at 100 MB/s on a single disk.
> > Overall, you might expect to process 100 MB in a few seconds.
> >
> > Database systems will take a lot longer to load 100 MB because they do
> > a lot of things to reorganize the data, arrange the pages and register
> > the metadata.
> >
> > John
> >
> > On 11/28/12 3:46 PM, preeti gupta wrote:
> > > hmm great but how the loads are so fast. I mean the initially when you
> > > load the data from the file
> > >
> > > On Wed, Nov 28, 2012 at 3:45 PM, K. John Wu <[email protected]> wrote:
> > > > When you run './ibis -d s0 -v -q "select avg(charge) where zdc>90"',
> > > > you are going to use an index on zdc to resolve the condition
> > > > "zdc > 90". This is the case where an index is needed or desired and
> > > > FastBit will automatically create the index. You might notice that
> > > > the query is a little slower on the first run - because it has to
> > > > spend the time to create the index. The later runs of the same query
> > > > would be faster.
> > > >
> > > > On 11/28/12 3:36 PM, preeti gupta wrote:
> > > > > oh.. I am surprised
> > > > >
> > > > > so since I ran queries like this
> > > > >
> > > > > time ./ibis -d s0 -v -q "select avg(charge) where zdc>90"
> > > > >
> > > > > so I did not use indexes at all.. then how come it is so fast. I got
> > > > > really huge loaded and searched in less than a second.
> > > > >
> > > > > On Wed, Nov 28, 2012 at 3:06 PM, K. John Wu <[email protected]> wrote:
> > > > > > Hi, Preeti,
> > > > > >
> > > > > > The indexes are not created until needed or explicitly constructed.
> > > > > > When you load the data using ardea, no index is built unless you
> > > > > > actually run some queries on the new data. To build indexes, use
> > > > > > ibis with option -b. You can find a little more about the command
> > > > > > line options for ibis at
> > > > > > <http://lbl.gov/~kwu/fastbit/doc/ibisCommandLine.html>.
> > > > > >
> > > > > > Each FastBit index type has its own binary format for storing its
> > > > > > data. In all cases, the index file produced contains the key values,
> > > > > > the bitmaps and some necessary pointers. You might want to read the
> > > > > > documentation inside the source code to get more details. For
> > > > > > example, if you want to see the generic version of the binned index,
> > > > > > read the function ibis::bin::write in file src/ibin.cpp.
> > > > > >
> > > > > > John
> > > > > >
> > > > > > On 11/28/12 2:14 PM, preeti gupta wrote:
> > > > > > > Hey John,
> > > > > > >
> > > > > > > Thanks. I will try that. Meanwhile I had another question about
> > > > > > > indexes fastbit creates (when I load the data does that mean
> > > > > > > indexes are created automatically) and what is the storage system
> > > > > > > it uses to store bitmap indexes?
> > > > > > >
> > > > > > > On Wed, Nov 28, 2012 at 1:48 PM, K. John Wu <[email protected]> wrote:
> > > > > > > > Hi, Preeti,
> > > > > > > >
> > > > > > > > Located the problem to be a use of /dev/random for initializing a
> > > > > > > > random number generator at the start of the program. Because the
> > > > > > > > initialization was for a static variable, the delay was outside of
> > > > > > > > main, which causes the apparent discrepancy between the timer values
> > > > > > > > inside and outside. If you are interested in trying out the latest
> > > > > > > > source code, you may download it (revision 605) from the SVN
> > > > > > > > repository
> > > > > > > >
> > > > > > > >     svn checkout https://codeforge.lbl.gov/anonscm/fastbit
> > > > > > > >
> > > > > > > > Feel free to let us know if you encounter any other problems.
> > > > > > > >
> > > > > > > > John
> > > > > > > >
> > > > > > > > On 11/19/12 2:33 PM, preeti gupta wrote:
> > > > > > > > > HI John,
> > > > > > > > >
> > > > > > > > > I have program and data both in my local directory.
> > > > > > > > >
> > > > > > > > > Also I tried building static library. but it is still taking more
> > > > > > > > > time alternatively
> > > > > > > > >
> > > > > > > > > [preetigupta25@dhalsim examples]$ time ./ibis -d s0 -v -q "select avg(charge) where zdc>90"
> > > > > > > > >
> > > > > > > > > Constructed a part named s0
> > > > > > > > > filter::sift2(SELECT avg(charge) FROM 1 data partition WHERE 90 < z ...) -- processing data partition s0
> > > > > > > > > countQuery::evaluate -- Select count(*) From s0 Where 90 < zdc --> 11072
> > > > > > > > > countQuery::evaluate -- duration: 0.008998 sec(CPU), 0.0116644 sec(elapsed)
> > > > > > > > > tableSelect -- select(avg(charge), zdc>90) on table T-s0 produced a table with 1 row and 1 column
> > > > > > > > > tableSelect -- the result table (1 x 1) for "SELECT avg(charge) FROM T-s0 WHERE zdc>90"
> > > > > > > > > 17.1549855491329
> > > > > > > > >
> > > > > > > > > tableSelect:: complete evaluation of SELECT avg(charge) FROM T-s0 WHERE zdc>90 took 0.013998 CPU seconds, 0.0502191 elapsed seconds
> > > > > > > > >
> > > > > > > > > /home/preetigupta25/FastBit/fastbit-ibis1.3.3/examples/.libs/lt-ibis -- total CPU time 0.014998 s, total elapsed time 0.0509388 s
> > > > > > > > >
> > > > > > > > > real 0m0.071s
> > > > > > > > > user 0m0.013s
> > > > > > > > > sys 0m0.019s
> > > > > > > > > [preetigupta25@dhalsim examples]$ time ./ibis -d s0 -v -q "select avg(charge) where zdc>90"
> > > > > > > > >
> > > > > > > > > Constructed a part named s0
> > > > > > > > > filter::sift2(SELECT avg(charge) FROM 1 data partition WHERE 90 < z ...) -- processing data partition s0
> > > > > > > > > countQuery::evaluate -- Select count(*) From s0 Where 90 < zdc --> 11072
> > > > > > > > > countQuery::evaluate -- duration: 0.012999 sec(CPU), 0.0130408 sec(elapsed)
> > > > > > > > > tableSelect -- select(avg(charge), zdc>90) on table T-s0 produced a table with 1 row and 1 column
> > > > > > > > > tableSelect -- the result table (1 x 1) for "SELECT avg(charge) FROM T-s0 WHERE zdc>90"
> > > > > > > > > 17.1549855491329
> > > > > > > > >
> > > > > > > > > tableSelect:: complete evaluation of SELECT avg(charge) FROM T-s0 WHERE zdc>90 took 0.017998 CPU seconds, 0.0177805 elapsed seconds
> > > > > > > > >
> > > > > > > > > /home/preetigupta25/FastBit/fastbit-ibis1.3.3/examples/.libs/lt-ibis -- total CPU time 0.017998 s, total elapsed time 0.0185473 s
> > > > > > > > >
> > > > > > > > > real 2m46.445s
> > > > > > > > > user 0m0.018s
> > > > > > > > > sys 0m0.014s
> > > > > > > > >
> > > > > > > > > On Mon, Nov 19, 2012 at 8:13 AM, K. John Wu <[email protected]> wrote:
> > > > > > > > > > Hi, Preeti,
> > > > > > > > > >
> > > > > > > > > > When you see a huge difference between the sum of user time and
> > > > > > > > > > sys time, and the real time, it usually means that your job is
> > > > > > > > > > blocked for something. For example, your file system might be
> > > > > > > > > > busy.
> > > > > > > > > >
> > > > > > > > > > One thing you can try might be to move the program and the data
> > > > > > > > > > to a directory local to your test machine. Another thing to do
> > > > > > > > > > is to compile FastBit code with static option - it will produce
> > > > > > > > > > larger executables, but will remove the need to load libraries
> > > > > > > > > > dynamically at runtime. Please give these options a try.
> > > > > > > > > >
> > > > > > > > > > If anyone else has experience dealing with this sort of
> > > > > > > > > > performance fluctuations, please let us know how you have solved
> > > > > > > > > > the problem.
> > > > > > > > > >
> > > > > > > > > > John
> > > > > > > > > >
> > > > > > > > > > On 11/19/12 8:05 AM, preeti gupta wrote:
> > > > > > > > > > > Hey John,
> > > > > > > > > > >
> > > > > > > > > > > I ran the command and the output is here
> > > > > > > > > > >
> > > > > > > > > > > [preetigupta25@dhalsim examples]$ time ./ibis -d s0 -v -q "select avg(charge) where zdc>90"
> > > > > > > > > > >
> > > > > > > > > > > Constructed a part named s0
> > > > > > > > > > > filter::sift2(SELECT avg(charge) FROM 1 data partition WHERE 90 < z ...) -- processing data partition s0
> > > > > > > > > > > countQuery::evaluate -- Select count(*) From s0 Where 90 < zdc --> 5536
> > > > > > > > > > > countQuery::evaluate -- duration: 0.002 sec(CPU), 0.00106263 sec(elapsed)
> > > > > > > > > > > tableSelect -- select(avg(charge), zdc>90) on table T-s0 produced a table with 1 row and 1 column
> > > > > > > > > > > tableSelect -- the result table (1 x 1) for "SELECT avg(charge) FROM T-s0 WHERE zdc>90"
> > > > > > > > > > > 17.1549855491329
> > > > > > > > > > >
> > > > > > > > > > > tableSelect:: complete evaluation of SELECT avg(charge) FROM T-s0 WHERE zdc>90 took 0.004 CPU seconds, 0.00380039 elapsed seconds
> > > > > > > > > > >
> > > > > > > > > > > /home/preetigupta25/FastBit/fastbit-ibis1.3.3/examples/.libs/lt-ibis -- total CPU time 0.005 s, total elapsed time 0.00450349 s
> > > > > > > > > > >
> > > > > > > > > > > real 0m0.023s
> > > > > > > > > > > user 0m0.012s
> > > > > > > > > > > sys 0m0.007s
> > > > > > > > > > > [preetigupta25@dhalsim examples]$ time ./ibis -d s0 -v -q "select avg(charge) where zdc>90"
> > > > > > > > > > >
> > > > > > > > > > > Constructed a part named s0
> > > > > > > > > > > filter::sift2(SELECT avg(charge) FROM 1 data partition WHERE 90 < z ...) -- processing data partition s0
> > > > > > > > > > > countQuery::evaluate -- Select count(*) From s0 Where 90 < zdc --> 5536
> > > > > > > > > > > countQuery::evaluate -- duration: 0.001 sec(CPU), 0.0011282 sec(elapsed)
> > > > > > > > > > > tableSelect -- select(avg(charge), zdc>90) on table T-s0 produced a table with 1 row and 1 column
> > > > > > > > > > > tableSelect -- the result table (1 x 1) for "SELECT avg(charge) FROM T-s0 WHERE zdc>90"
> > > > > > > > > > > 17.1549855491329
> > > > > > > > > > >
> > > > > > > > > > > tableSelect:: complete evaluation of SELECT avg(charge) FROM T-s0 WHERE zdc>90 took 0.003999 CPU seconds, 0.00399041 elapsed seconds
> > > > > > > > > > >
> > > > > > > > > > > /home/preetigupta25/FastBit/fastbit-ibis1.3.3/examples/.libs/lt-ibis -- total CPU time 0.004999 s, total elapsed time 0.00474072 s
> > > > > > > > > > >
> > > > > > > > > > > real 1m2.509s
> > > > > > > > > > > user 0m0.006s
> > > > > > > > > > > sys 0m0.013s
> > > > > > > > > > >
> > > > > > > > > > > On Sun, Nov 18, 2012 at 11:19 PM, K. John Wu <[email protected]> wrote:
> > > > > > > > > > > > Hi, Preeti,
> > > > > > > > > > > >
> > > > > > > > > > > > Would you mind running
> > > > > > > > > > > >
> > > > > > > > > > > >     time ./ibis -d s0 -v -q "select avg(charge) where zdc>90"
> > > > > > > > > > > >
> > > > > > > > > > > > and seeing what it reports?
> > > > > > > > > > > >
> > > > > > > > > > > > Which one of the STAR sample data are you using?
> > > > > > > > > > > >
> > > > > > > > > > > > John
> > > > > > > > > > > >
> > > > > > > > > > > > On 11/18/12 3:00 PM, preeti gupta wrote:
> > > > > > > > > > > > > Hi,
> > > > > > > > > > > > >
> > > > > > > > > > > > > I am trying to setup Fastbit on linux. I have been able to
> > > > > > > > > > > > > load star sample data file successfully.
> > > > > > > > > > > > >
> > > > > > > > > > > > > My query runs slower first time, and then faster next time
> > > > > > > > > > > > > and then faster (sometimes) one more time but it really
> > > > > > > > > > > > > slows down next time. Though the query output shows CPU
> > > > > > > > > > > > > time taken is .004 secs but the query actually returned in
> > > > > > > > > > > > > 125 secs.
> > > > > > > > > > > > > It happens alternatively.
> > > > > > > > > > > > >
> > > > > > > > > > > > > [preetigupta25@dhalsim examples]$ ./fastbit.sh
> > > > > > > > > > > > >
> > > > > > > > > > > > > Constructed a part named s0
> > > > > > > > > > > > > filter::sift2(SELECT avg(charge) FROM 1 data partition WHERE 90 < z ...) -- processing data partition s0
> > > > > > > > > > > > > countQuery::evaluate -- Select count(*) From s0 Where 90 < zdc --> 5536
> > > > > > > > > > > > > countQuery::evaluate -- duration: 0.001 sec(CPU), 0.00105953 sec(elapsed)
> > > > > > > > > > > > > tableSelect -- select(avg(charge), zdc>90) on table T-s0 produced a table with 1 row and 1 column
> > > > > > > > > > > > > tableSelect -- the result table (1 x 1) for "SELECT avg(charge) FROM T-s0 WHERE zdc>90"
> > > > > > > > > > > > > 17.1549855491329
> > > > > > > > > > > > >
> > > > > > > > > > > > > tableSelect:: complete evaluation of SELECT avg(charge) FROM T-s0 WHERE zdc>90 took 0.003 CPU seconds, 0.00379491 elapsed seconds
> > > > > > > > > > > > >
> > > > > > > > > > > > > /home/preetigupta25/FastBit/fastbit-ibis1.3.3/examples/.libs/lt-ibis -- total CPU time 0.004 s, total elapsed time 0.00450182 s
> > > > > > > > > > > > > It took 0 seconds
> > > > > > > > > > > > > [preetigupta25@dhalsim examples]$ ./fastbit.sh
> > > > > > > > > > > > >
> > > > > > > > > > > > > Constructed a part named s0
> > > > > > > > > > > > > filter::sift2(SELECT avg(charge) FROM 1 data partition WHERE 90 < z ...) -- processing data partition s0
> > > > > > > > > > > > > countQuery::evaluate -- Select count(*) From s0 Where 90 < zdc --> 5536
> > > > > > > > > > > > > countQuery::evaluate -- duration: 0.001 sec(CPU), 0.00106502 sec(elapsed)
> > > > > > > > > > > > > tableSelect -- select(avg(charge), zdc>90) on table T-s0 produced a table with 1 row and 1 column
> > > > > > > > > > > > > tableSelect -- the result table (1 x 1) for "SELECT avg(charge) FROM T-s0 WHERE zdc>90"
> > > > > > > > > > > > > 17.1549855491329
> > > > > > > > > > > > >
> > > > > > > > > > > > > tableSelect:: complete evaluation of SELECT avg(charge) FROM T-s0 WHERE zdc>90 took 0.004 CPU seconds, 0.00384116 elapsed seconds
> > > > > > > > > > > > >
> > > > > > > > > > > > > /home/preetigupta25/FastBit/fastbit-ibis1.3.3/examples/.libs/lt-ibis -- total CPU time 0.005 s, total elapsed time 0.0045433 s
> > > > > > > > > > > > > It took 125 seconds
> > > > > > > > > > > > >
> > > > > > > > > > > > > The script contains only one command
> > > > > > > > > > > > >
> > > > > > > > > > > > > #!/bin/bash
> > > > > > > > > > > > > START=$(date +%s)
> > > > > > > > > > > > > # do something
> > > > > > > > > > > > > # start your script work here
> > > > > > > > > > > > > # your logic ends here
> > > > > > > > > > > > > test='./ibis -d s0 -v -q "select avg(charge) where zdc>90"'
> > > > > > > > > > > > > eval $test
> > > > > > > > > > > > > END=$(date +%s)
> > > > > > > > > > > > > DIFF=$(( $END - $START ))
> > > > > > > > > > > > > echo "It took $DIFF seconds"

_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
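
A note on the rule of thumb from this thread: when real (wall-clock) time is far larger than user + sys time, the process was most likely waiting on something rather than computing. The following is only a minimal Python sketch of that check, not part of FastBit. The function names `measure` and `looks_blocked` and the one-second threshold are assumptions made for illustration, and the child CPU times from `os.times()` are only meaningful on Unix-like systems.

```python
import os
import subprocess
import sys
import time


def measure(cmd):
    """Run cmd (a list of arguments) and return (real, cpu) in seconds.

    real is the wall-clock time; cpu is the child's user + sys time,
    taken from the change in os.times() across the run.
    """
    before = os.times()
    start = time.perf_counter()
    subprocess.run(cmd, stdout=subprocess.DEVNULL,
                   stderr=subprocess.DEVNULL, check=False)
    real = time.perf_counter() - start
    after = os.times()
    cpu = ((after.children_user - before.children_user)
           + (after.children_system - before.children_system))
    return real, cpu


def looks_blocked(real, cpu, min_wait=1.0):
    """True when at least min_wait seconds were spent not using the CPU.

    The 1.0 s default is an arbitrary illustrative threshold.
    """
    return (real - cpu) >= min_wait


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python check_blocked.py ./ibis -d s0 -v -q "select avg(charge) where zdc>90"
    real, cpu = measure(sys.argv[1:])
    print(f"real={real:.3f}s cpu={cpu:.3f}s waiting={real - cpu:.3f}s")
    print("probably blocked" if looks_blocked(real, cpu) else "looks CPU-bound")
```

Applied to the timings quoted in the thread, the slow run (real 2m46.445s, user 0.018s, sys 0.014s) gives `looks_blocked(166.445, 0.032)` as true, while the fast run (real 0.071s, same CPU time) does not, which matches the diagnosis that the delay came from blocking rather than from query evaluation.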
