I'm curious to see roughly how much of the query time is spent reading the selected column data versus evaluating the condition. Maybe try selecting just a single column with the same WHERE clause.
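
A rough sketch of that comparison, in case it helps. It assumes ibis is on PATH and that the data directory is tmp3.1 as in the quoted commands. The helper names ibis_cmd and time_cmd are made up for illustration, and only the -d, -q and -output flags that already appear in the original message are used. It takes the fastest of several runs so that the first, index-building run does not skew the numbers.

```python
import shutil
import subprocess
import time

# WHERE clause and column list copied from the original message.
WHERE = ("FIRST>1441054677 AND LAST<1441065957 AND "
         "SRCADDR=1481989497 or DSTADDR=1481989497")
ALL_COLUMNS = ("FIRST,LAST,PROTO,SRCADDR,SRCPORT,DSTADDR,DSTPORT,"
               "NETADDR,NATPORT,DSTAS,DOCTETS,DPKTS")


def ibis_cmd(columns, data_dir="tmp3.1", output="res.txt"):
    """Build the ibis argument list (hypothetical helper, same flags as the thread)."""
    query = f"SELECT {columns} FROM {data_dir} WHERE {WHERE}"
    return ["ibis", "-d", data_dir, "-q", query, "-output", output]


def time_cmd(cmd, runs=5):
    """Return the fastest wall-clock time in seconds over `runs` executions."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    if shutil.which("ibis") is None:
        print("ibis not found on PATH; nothing timed")
    else:
        one = time_cmd(ibis_cmd("FIRST"))
        full = time_cmd(ibis_cmd(ALL_COLUMNS))
        print(f"1 column: {one:.3f}s, 12 columns: {full:.3f}s")
        print(f"extra time for the other 11 columns: {full - one:.3f}s")
```

If the single-column time is close to the 12-column time, most of the cost is in evaluating the condition; if it is much smaller, the time is going into reading the selected columns.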

On 10/21/15, 10:04 AM, "[email protected] on behalf of Oleg Gawriloff" <[email protected] on behalf of [email protected]> wrote:

>We are looking for some DB for our ITAS system (Internet Traffic Archive
>System). Requirements are pretty straightforward: all data are numeric,
>very large quantity of strings (8G per day), only thing we need is fast
>search on that by limited number of fields. So, after some research I
>found out that fastbit used in similar systems at ntop/solarwinds NTA
>projects and performed some tests to clarify whether is it good or not.
>Results seems very strange to me:
>
>We have our test data, 131M strings. All data are numeric. I converted
>them to csv from binary format used by our netflow-collector
>(flow-tools) and put them in fastbit using:
>
>ardea -d /var/tmp/backup/fastbit/tmp2.1 -m "DPKTS:unsigned
>int,DOCTETS:unsigned int,FIRST:unsigned int,LAST:unsigned
>int,SRCADDR:unsigned int,DSTADDR:unsigned int,NETADDR:unsigned
>int,SRCPORT:unsigned short,DSTPORT:unsigned short,PROTO:unsigned
>byte,NATPORT:unsigned short,DSTAS:unsigned short" -t one2.1.csv
>ardea read 131472840 rows from one3.1.csv
>
>ardea -- duration: 171.51 sec(CPU), 210.203 sec(elapsed)
>
>after that I performed simple search 10 times like that:
>
>time ibis -d tmp3.1 -q "SELECT
>FIRST,LAST,PROTO,SRCADDR,SRCPORT,DSTADDR,DSTPORT,NETADDR,NATPORT,DSTAS,DOCTETS,DPKTS
>FROM tmp3.1 WHERE FIRST>1441054677 AND LAST<1441065957 AND
>SRCADDR=1481989497 or DSTADDR=1481989497" -output res.txt
>
>doQuaere -- "SELECT
>FIRST,LAST,PROTO,SRCADDR,SRCPORT,DSTADDR,DSTPORT,NETADDR,NATPORT,DSTAS,DOCTETS,DPKTS
>FROM tmp3.1 WHERE FIRST>1441054677 AND LAST<1441065957 AND
>SRCADDR=1481989497 or DSTADDR=1481989497" produced a table with 6309 rows
>and 12 columns
>
>real 0m2.258s
>
>First time was a long one, because of index creation, but others show
>similar executions time (2.5sec).
>
>The problem that linear search by flow-tools on the same data shows that
>fastbit only 6 times faster.
>
>time flow-cat ft_uncompressed3* | flow-nfilter -f nfilter.cfg -F
>F_TIME_IP | flow-export -f 2
>-mDPKTS,DOCTETS,FIRST,LAST,SRCADDR,DSTADDR,NEXTHOP,SRCPORT,DSTPORT,PROT,SRC_AS,DST_AS > res.txt
>flow-export: Exported 6309 records
>
>real 0m12.083s
>
>
>Which is worse - when I drop file cache flow-tools search time does not
>change, but fastbit jumps to 9 sec from 2.
>
>albatros2 fastbit # echo 3 > /proc/sys/vm/drop_caches && time ibis -d
>tmp3.1 -q "SELECT
>FIRST,LAST,PROTO,SRCADDR,SRCPORT,DSTADDR,DSTPORT,NETADDR,NATPORT,DSTAS,DOCTETS,DPKTS
>FROM tmp3.1 WHERE FIRST>1441054677 AND LAST<1441065957 AND
>SRCADDR=1481989497 or DSTADDR=1481989497" -output res.txt
>
>doQuaere -- "SELECT
>FIRST,LAST,PROTO,SRCADDR,SRCPORT,DSTADDR,DSTPORT,NETADDR,NATPORT,DSTAS,DOCTETS,DPKTS
>FROM tmp3.1 WHERE FIRST>1441054677 AND LAST<1441065957 AND
>SRCADDR=1481989497 or DSTADDR=1481989497" produced a table with 6309 rows
>and 12 columns
>
>real 0m9.744s
>
>
>I thought in either case results will be much better. May be I miss smth?
>
>
>--
>Signed, Oleg Gawriloff.
>
>_______________________________________________
>FastBit-users mailing list
>[email protected]
>https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
