I'm curious to see roughly how much query time is spent reading selected
column data vs evaluating the condition.  Maybe try to select just a
single column with the same where clause.
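
Something along these lines might do it. This is only a sketch: it reuses
the directory, table name and WHERE clause from your message, picks FIRST
as the single column arbitrarily, and writes to res_one.txt and
res_all.txt, which are placeholder names. It runs ibis only if it is found
on PATH; otherwise it just prints the two commands.

```shell
#!/bin/bash
# Sketch: time the same WHERE clause with one selected column vs. all twelve.
# Directory, table and WHERE clause are copied from the quoted message;
# the output file names are placeholders.
DIR='tmp3.1'
WHERE='FIRST>1441054677 AND LAST<1441065957 AND SRCADDR=1481989497 or DSTADDR=1481989497'
ALL_COLS='FIRST,LAST,PROTO,SRCADDR,SRCPORT,DSTADDR,DSTPORT,NETADDR,NATPORT,DSTAS,DOCTETS,DPKTS'

# Build a query string for the given select list.
build_query() {
    printf 'SELECT %s FROM %s WHERE %s' "$1" "$DIR" "$WHERE"
}

Q_ONE=$(build_query 'FIRST')
Q_ALL=$(build_query "$ALL_COLS")

# Print the command, then run and time it if ibis is installed.
# $1 = query, $2 = output file
run_timed() {
    echo "ibis -d $DIR -q \"$1\" -output $2"
    if command -v ibis >/dev/null 2>&1; then
        time ibis -d "$DIR" -q "$1" -output "$2"
    fi
}

run_timed "$Q_ONE" res_one.txt
run_timed "$Q_ALL" res_all.txt
```

If the one-column run takes nearly as long as the full one, most of the
time is probably going into evaluating the condition; if it is much
faster, reading the selected columns is the larger part.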


On 10/21/15, 10:04 AM, "[email protected] on behalf of
Oleg Gawriloff" <[email protected] on behalf of
[email protected]> wrote:

>We are looking for a database for our ITAS (Internet Traffic Archive
>System). The requirements are pretty straightforward: all data are numeric,
>there is a very large number of rows (8G per day), and the only thing we
>need is fast search over a limited number of fields. After some research I
>found that FastBit is used in similar systems in the ntop and SolarWinds
>NTA projects, so I ran some tests to see whether it is a good fit.
>The results seem very strange to me:
>
>We have our test data, 131M rows. All data are numeric. I converted
>them to CSV from the binary format used by our netflow collector
>(flow-tools) and loaded them into FastBit using:
>
>ardea -d /var/tmp/backup/fastbit/tmp2.1 -m "DPKTS:unsigned
>int,DOCTETS:unsigned int,FIRST:unsigned int,LAST:unsigned
>int,SRCADDR:unsigned int,DSTADDR:unsigned int,NETADDR:unsigned
>int,SRCPORT:unsigned short,DSTPORT:unsigned short,PROTO:unsigned
>byte,NATPORT:unsigned short,DSTAS:unsigned short" -t one2.1.csv
>ardea read 131472840 rows from one3.1.csv
>
>ardea -- duration: 171.51 sec(CPU), 210.203 sec(elapsed)
>
>After that I ran a simple search 10 times, like this:
>
>time ibis -d tmp3.1 -q "SELECT
>FIRST,LAST,PROTO,SRCADDR,SRCPORT,DSTADDR,DSTPORT,NETADDR,NATPORT,DSTAS,DOCTETS,DPKTS
>FROM tmp3.1 WHERE FIRST>1441054677 AND LAST<1441065957 AND
>SRCADDR=1481989497 or DSTADDR=1481989497" -output res.txt
>
>doQuaere -- "SELECT
>FIRST,LAST,PROTO,SRCADDR,SRCPORT,DSTADDR,DSTPORT,NETADDR,NATPORT,DSTAS,DOCTETS,DPKTS
>FROM tmp3.1 WHERE FIRST>1441054677 AND LAST<1441065957 AND
>SRCADDR=1481989497 or DSTADDR=1481989497" produced a table with 6309 rows
>and 12 columns
>
>real    0m2.258s
>
>The first run took a long time because of index creation, but the others
>showed similar execution times (2.5 sec).
>
>The problem is that a linear search with flow-tools on the same data shows
>that FastBit is only 6 times faster.
>
>  time flow-cat ft_uncompressed3* | flow-nfilter -f nfilter.cfg -F
>F_TIME_IP |  flow-export -f 2
>-mDPKTS,DOCTETS,FIRST,LAST,SRCADDR,DSTADDR,NEXTHOP,SRCPORT,DSTPORT,PROT,SRC_AS,DST_AS > res.txt
>flow-export: Exported 6309 records
>
>real    0m12.083s
>
>
>What is worse, when I drop the file cache the flow-tools search time does
>not change, but the FastBit time jumps from 2 sec to 9 sec.
>
>albatros2 fastbit # echo 3 > /proc/sys/vm/drop_caches && time ibis -d
>tmp3.1 -q "SELECT
>FIRST,LAST,PROTO,SRCADDR,SRCPORT,DSTADDR,DSTPORT,NETADDR,NATPORT,DSTAS,DOCTETS,DPKTS
>FROM tmp3.1 WHERE FIRST>1441054677 AND LAST<1441065957 AND
>SRCADDR=1481989497 or DSTADDR=1481989497" -output res.txt
>
>doQuaere -- "SELECT
>FIRST,LAST,PROTO,SRCADDR,SRCPORT,DSTADDR,DSTPORT,NETADDR,NATPORT,DSTAS,DOCTETS,DPKTS
>FROM tmp3.1 WHERE FIRST>1441054677 AND LAST<1441065957 AND
>SRCADDR=1481989497 or DSTADDR=1481989497" produced a table with 6309 rows
>and 12 columns
>
>real    0m9.744s
>
>
>I expected the results to be much better in either case. Maybe I am
>missing something?
>
>
>-- 
>Signed, Oleg Gawriloff.
>
>_______________________________________________
>FastBit-users mailing list
>[email protected]
>https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
