We are looking for a database for our ITAS (Internet Traffic Archive System). The requirements are pretty straightforward: all data are numeric, there is a very large number of rows (8G per day), and the only thing we need is fast search over a limited number of fields. After some research I found that FastBit is used in similar systems (the ntop and SolarWinds NTA projects), so I ran some tests to find out whether it is any good. The results seem very strange to me:
Our test data is 131M rows, all numeric. I converted it to CSV from the binary format used by our NetFlow collector (flow-tools) and loaded it into FastBit using:

    ardea -d /var/tmp/backup/fastbit/tmp2.1 -m "DPKTS:unsigned int,DOCTETS:unsigned int,FIRST:unsigned int,LAST:unsigned int,SRCADDR:unsigned int,DSTADDR:unsigned int,NETADDR:unsigned int,SRCPORT:unsigned short,DSTPORT:unsigned short,PROTO:unsigned byte,NATPORT:unsigned short,DSTAS:unsigned short" -t one2.1.csv
    ardea read 131472840 rows from one3.1.csv
    ardea -- duration: 171.51 sec(CPU), 210.203 sec(elapsed)

After that I ran a simple search 10 times, like this:

    time ibis -d tmp3.1 -q "SELECT FIRST,LAST,PROTO,SRCADDR,SRCPORT,DSTADDR,DSTPORT,NETADDR,NATPORT,DSTAS,DOCTETS,DPKTS FROM tmp3.1 WHERE FIRST>1441054677 AND LAST<1441065957 AND SRCADDR=1481989497 or DSTADDR=1481989497" -output res.txt
    doQuaere -- "SELECT FIRST,LAST,PROTO,SRCADDR,SRCPORT,DSTADDR,DSTPORT,NETADDR,NATPORT,DSTAS,DOCTETS,DPKTS FROM tmp3.1 WHERE FIRST>1441054677 AND LAST<1441065957 AND SRCADDR=1481989497 or DSTADDR=1481989497" produced a table with 6309 rows and 12 columns
    real 0m2.258s

The first run was slow because of index creation, but the others show similar execution times (2.5 sec). The problem is that a linear search with flow-tools on the same data shows that FastBit is only about 6 times faster:

    time flow-cat ft_uncompressed3* | flow-nfilter -f nfilter.cfg -F F_TIME_IP | flow-export -f 2 -mDPKTS,DOCTETS,FIRST,LAST,SRCADDR,DSTADDR,NEXTHOP,SRCPORT,DSTPORT,PROT,SRC_AS,DST_AS > res.txt
    flow-export: Exported 6309 records
    real 0m12.083s

What is worse, when I drop the file cache the flow-tools search time does not change, but the FastBit time jumps from 2 sec to 9 sec.
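A minimal sketch of the arithmetic behind the "6 times" figure, using only the row count and the warm-cache `real` timings quoted above. The constant and function names are illustrative and are not part of FastBit or flow-tools. The FastBit rate is an effective figure only, since an indexed query does not actually scan every row.

```python
# Figures copied from the ardea output and the `real` lines above.
ROWS = 131_472_840        # rows reported by ardea
FASTBIT_WARM_S = 2.258    # ibis query with a warm file cache, in seconds
FLOWTOOLS_S = 12.083      # flow-cat | flow-nfilter | flow-export, in seconds


def speedup(baseline_s: float, candidate_s: float) -> float:
    """Return how many times faster the candidate is than the baseline."""
    return baseline_s / candidate_s


def rows_per_second(rows: int, elapsed_s: float) -> float:
    """Return the effective rate: total rows in the data set per second."""
    return rows / elapsed_s


if __name__ == "__main__":
    ratio = speedup(FLOWTOOLS_S, FASTBIT_WARM_S)
    print(f"FastBit vs flow-tools (warm cache): {ratio:.2f}x")
    print(f"flow-tools: {rows_per_second(ROWS, FLOWTOOLS_S) / 1e6:.1f}M rows/s")
    print(f"FastBit:    {rows_per_second(ROWS, FASTBIT_WARM_S) / 1e6:.1f}M rows/s")
```

The same two helpers can be reused for any other pair of timings measured on this data set.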
The run after dropping the cache:

    albatros2 fastbit # echo 3 > /proc/sys/vm/drop_caches && time ibis -d tmp3.1 -q "SELECT FIRST,LAST,PROTO,SRCADDR,SRCPORT,DSTADDR,DSTPORT,NETADDR,NATPORT,DSTAS,DOCTETS,DPKTS FROM tmp3.1 WHERE FIRST>1441054677 AND LAST<1441065957 AND SRCADDR=1481989497 or DSTADDR=1481989497" -output res.txt
    doQuaere -- "SELECT FIRST,LAST,PROTO,SRCADDR,SRCPORT,DSTADDR,DSTPORT,NETADDR,NATPORT,DSTAS,DOCTETS,DPKTS FROM tmp3.1 WHERE FIRST>1441054677 AND LAST<1441065957 AND SRCADDR=1481989497 or DSTADDR=1481989497" produced a table with 6309 rows and 12 columns
    real 0m9.744s

I expected much better results in either case. Maybe I am missing something?

--
Signed, Oleg Gawriloff.
_______________________________________________
FastBit-users mailing list
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
