We are looking for a database for our ITAS (Internet Traffic Archive
System). The requirements are pretty straightforward: all data are numeric,
there is a very large number of records (8G per day), and the only thing we
need is fast search on a limited number of fields. After some research I
found that FastBit is used in similar systems (the ntop and SolarWinds NTA
projects), so I ran some tests to find out whether it suits us. The results
seem very strange to me:

Our test data set has 131M records, all numeric. I converted them to CSV
from the binary format used by our NetFlow collector (flow-tools) and
loaded them into FastBit with:

ardea -d /var/tmp/backup/fastbit/tmp2.1 \
  -m "DPKTS:unsigned int,DOCTETS:unsigned int,FIRST:unsigned int,LAST:unsigned int,SRCADDR:unsigned int,DSTADDR:unsigned int,NETADDR:unsigned int,SRCPORT:unsigned short,DSTPORT:unsigned short,PROTO:unsigned byte,NATPORT:unsigned short,DSTAS:unsigned short" \
  -t one2.1.csv
ardea read 131472840 rows from one3.1.csv

ardea -- duration: 171.51 sec(CPU), 210.203 sec(elapsed)
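Before loading, a quick sanity check of the CSV against the -m column spec can catch values that do not fit the declared types. This is only a minimal Python sketch, assuming that ardea maps CSV fields to the -m columns in order and that "unsigned byte/short/int" mean 8/16/32-bit unsigned integers; parse_spec, bad_fields and check_csv are hypothetical helpers, not part of FastBit.

```python
import csv

# Assumption: unsigned byte/short/int are 8/16/32-bit unsigned integers.
TYPE_MAX = {
    "unsigned byte": 2**8 - 1,
    "unsigned short": 2**16 - 1,
    "unsigned int": 2**32 - 1,
}

# Column spec copied from the ardea -m argument above.
SPEC = ("DPKTS:unsigned int,DOCTETS:unsigned int,FIRST:unsigned int,"
        "LAST:unsigned int,SRCADDR:unsigned int,DSTADDR:unsigned int,"
        "NETADDR:unsigned int,SRCPORT:unsigned short,DSTPORT:unsigned short,"
        "PROTO:unsigned byte,NATPORT:unsigned short,DSTAS:unsigned short")


def parse_spec(spec):
    """Turn 'NAME:TYPE,NAME:TYPE,...' into a list of (name, max_value)."""
    columns = []
    for item in spec.split(","):
        name, typ = item.split(":", 1)
        columns.append((name.strip(), TYPE_MAX[typ.strip()]))
    return columns


def bad_fields(row, columns):
    """Return names of fields that are not integers in [0, max]."""
    if len(row) != len(columns):
        return ["<expected %d fields, got %d>" % (len(columns), len(row))]
    bad = []
    for value, (name, hi) in zip(row, columns):
        try:
            n = int(value)
        except ValueError:
            bad.append(name)
            continue
        if not 0 <= n <= hi:
            bad.append(name)
    return bad


def check_csv(lines, columns):
    """Yield (row_number, bad_field_names) for every invalid CSV row.

    `lines` may be any iterable of text lines, e.g. an open file, so
    the data is streamed rather than loaded into memory.
    """
    for rownum, row in enumerate(csv.reader(lines), start=1):
        bad = bad_fields(row, columns)
        if bad:
            yield rownum, bad
```

Usage would be something like `with open("one2.1.csv", newline="") as f:` followed by `for rownum, bad in check_csv(f, parse_spec(SPEC)): print(rownum, bad)`. Since DSTAS is declared as unsigned short, any 32-bit AS number (above 65535) would be flagged by this check.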

After that I ran a simple search 10 times, like this:

time ibis -d tmp3.1 \
  -q "SELECT FIRST,LAST,PROTO,SRCADDR,SRCPORT,DSTADDR,DSTPORT,NETADDR,NATPORT,DSTAS,DOCTETS,DPKTS FROM tmp3.1 WHERE FIRST>1441054677 AND LAST<1441065957 AND SRCADDR=1481989497 or DSTADDR=1481989497" \
  -output res.txt

doQuaere -- "SELECT FIRST,LAST,PROTO,SRCADDR,SRCPORT,DSTADDR,DSTPORT,NETADDR,NATPORT,DSTAS,DOCTETS,DPKTS FROM tmp3.1 WHERE FIRST>1441054677 AND LAST<1441065957 AND SRCADDR=1481989497 or DSTADDR=1481989497" produced a table with 6309 rows and 12 columns

real    0m2.258s

The first run took longer because of index creation, but the others show
similar execution times (about 2.5 s).
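One thing worth keeping in mind when reading the WHERE clause: in standard SQL, AND binds more tightly than OR, so the clause as written groups as (FIRST>... AND LAST<... AND SRCADDR=...) OR DSTADDR=..., meaning the time window restricts only the SRCADDR branch. I have not verified that FastBit's query parser uses the same precedence. The sketch below uses Python's built-in sqlite3 module (not FastBit) on a few made-up rows, purely to show how the two groupings differ; count_matches is a hypothetical helper.

```python
import sqlite3

# Made-up rows with only the columns the WHERE clause touches:
# (FIRST, LAST, SRCADDR, DSTADDR)
ROWS = [
    (1441060000, 1441060100, 1481989497, 1),  # in window, SRCADDR matches
    (1441060000, 1441060100, 2, 1481989497),  # in window, DSTADDR matches
    (1000, 2000, 2, 1481989497),              # outside window, DSTADDR matches
    (1000, 2000, 1481989497, 3),              # outside window, SRCADDR matches
]

AS_WRITTEN = ("FIRST>1441054677 AND LAST<1441065957 "
              "AND SRCADDR=1481989497 or DSTADDR=1481989497")
PARENTHESIZED = ("FIRST>1441054677 AND LAST<1441065957 "
                 "AND (SRCADDR=1481989497 or DSTADDR=1481989497)")


def count_matches(where):
    """Count rows of the toy table that satisfy `where` under SQLite rules."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE flows "
                 "(FIRST INTEGER, LAST INTEGER, SRCADDR INTEGER, DSTADDR INTEGER)")
    conn.executemany("INSERT INTO flows VALUES (?, ?, ?, ?)", ROWS)
    (n,) = conn.execute("SELECT count(*) FROM flows WHERE " + where).fetchone()
    conn.close()
    return n


print("as written:   ", count_matches(AS_WRITTEN))
print("parenthesized:", count_matches(PARENTHESIZED))
```

Under SQLite's rules the unparenthesized form also matches the third row (outside the time window, DSTADDR matches), so the two forms return 3 and 2 rows respectively.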

The problem is that a linear search with flow-tools over the same data shows
that FastBit is only about five times faster:

time flow-cat ft_uncompressed3* | flow-nfilter -f nfilter.cfg -F F_TIME_IP | \
  flow-export -f 2 \
  -mDPKTS,DOCTETS,FIRST,LAST,SRCADDR,DSTADDR,NEXTHOP,SRCPORT,DSTPORT,PROT,SRC_AS,DST_AS \
  > res.txt
flow-export: Exported 6309 records

real    0m12.083s


What is worse: when I drop the file cache, the flow-tools search time does
not change, but the FastBit time jumps from 2 s to 9 s.

albatros2 fastbit # echo 3 > /proc/sys/vm/drop_caches && time ibis -d tmp3.1 \
  -q "SELECT FIRST,LAST,PROTO,SRCADDR,SRCPORT,DSTADDR,DSTPORT,NETADDR,NATPORT,DSTAS,DOCTETS,DPKTS FROM tmp3.1 WHERE FIRST>1441054677 AND LAST<1441065957 AND SRCADDR=1481989497 or DSTADDR=1481989497" \
  -output res.txt

doQuaere -- "SELECT FIRST,LAST,PROTO,SRCADDR,SRCPORT,DSTADDR,DSTPORT,NETADDR,NATPORT,DSTAS,DOCTETS,DPKTS FROM tmp3.1 WHERE FIRST>1441054677 AND LAST<1441065957 AND SRCADDR=1481989497 or DSTADDR=1481989497" produced a table with 6309 rows and 12 columns

real    0m9.744s
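To put numbers on the comparison, here are the ratios computed from the `real` times reported above. These are single runs, so the figures are only indicative; speedup is a trivial hypothetical helper.

```python
# Wall-clock ("real") times from the runs above, in seconds.
FLOW_TOOLS = 12.083    # flow-cat | flow-nfilter | flow-export
FASTBIT_WARM = 2.258   # ibis with a warm file cache
FASTBIT_COLD = 9.744   # ibis after echo 3 > /proc/sys/vm/drop_caches


def speedup(baseline, candidate):
    """How many times faster `candidate` is than `baseline`."""
    return baseline / candidate


for label, t in [("warm cache", FASTBIT_WARM), ("cold cache", FASTBIT_COLD)]:
    print("%-10s FastBit %.3f s -> %.2fx faster than flow-tools"
          % (label, t, speedup(FLOW_TOOLS, t)))
```

That works out to roughly 5.4x with a warm cache and only about 1.24x after the cache is dropped.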


I expected the results to be much better in either case. Maybe I am
missing something?


-- 
Signed, Oleg Gawriloff.

_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
