Dear John,

We are using FastBit to store NetFlow records in IPFIX format, and we need to
work with large amounts of data.

I am using the latest SVN revision (432), though it needed a small change to
compile (adding "using std::isfinite;" to part.cpp and mensa.cpp).

The workflow is as follows:
1) Open the partition using
        ibis::part *part = new ibis::part("path/to/partition", NULL, true);
2) Create a table from this partition
        ibis::table *table = ibis::table::create(*part);
3) Perform a query
        table = table->select("col1, col2, ...", "col1 = 80 AND ..."); /* also delete the original table */
4) Use a cursor to print the result
        .....

The problem is the amount of allocated memory.
I had to raise the memory allocation limit in fileManager to allow
mmapping of large files. But this also means that FastBit is allowed to
allocate a large amount of physical memory. I believe there should be two
limits, one for physical memory and one for mapped files, because on a 64-bit
system I don't care that a 6GB file is mmapped, but I certainly don't want
FastBit to claim that much memory for itself.
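
To make the two-limit idea concrete, here is a minimal sketch of what I have in mind. The class MemoryBudget and all of its methods are hypothetical names made up for this illustration; they are not part of FastBit's fileManager, and the real bookkeeping there may work quite differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical illustration only: this is NOT FastBit's fileManager API.
// It tracks heap allocations and memory-mapped bytes against separate limits.
class MemoryBudget {
public:
    MemoryBudget(std::uint64_t heapLimit, std::uint64_t mapLimit)
        : heapLimit_(heapLimit), mapLimit_(mapLimit), heapUsed_(0), mapUsed_(0) {}

    // Reserve heap (physical) memory; refused if the heap limit would be exceeded.
    bool reserveHeap(std::uint64_t bytes) { return reserve(bytes, heapUsed_, heapLimit_); }

    // Reserve address space for a mapped file; counted against its own limit.
    bool reserveMapped(std::uint64_t bytes) { return reserve(bytes, mapUsed_, mapLimit_); }

    // Give back previously reserved bytes.
    void releaseHeap(std::uint64_t bytes) { assert(bytes <= heapUsed_); heapUsed_ -= bytes; }
    void releaseMapped(std::uint64_t bytes) { assert(bytes <= mapUsed_); mapUsed_ -= bytes; }

    std::uint64_t heapInUse() const { return heapUsed_; }
    std::uint64_t mappedInUse() const { return mapUsed_; }

private:
    // Overflow-safe check: 'used' never exceeds 'limit', so limit - used is valid.
    static bool reserve(std::uint64_t bytes, std::uint64_t &used, std::uint64_t limit) {
        if (bytes > limit - used) return false;
        used += bytes;
        return true;
    }

    std::uint64_t heapLimit_, mapLimit_, heapUsed_, mapUsed_;
};
```

With something like MemoryBudget(512ULL << 20, 8ULL << 30), mapping a 6GB file would be accepted while a 1GB heap allocation would be refused, which is the behaviour I would like to be able to configure.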

Nonetheless, I don't understand why it takes so much memory to perform
a simple select. Here are some measurements from valgrind:

Query 1:

Estimating between 26 and 26 records /* table->estimate() */
Created new table, MB in use: 0 /* this is after table::create(*part) */
Table filtered, MB in use: 4527 /* this comes from ibis::fileManager::bytesInUse(), after table->select() */

==19057== HEAP SUMMARY:
==19057==     in use at exit: 0 bytes in 0 blocks
==19057==   total heap usage: 3,165 allocs, 3,165 frees, 63,483,310 bytes allocated

Query 2:
Estimating between 1832 and 1832 records
Created new table, MB in use: 0
Table filtered, MB in use: 4130

==17557== HEAP SUMMARY:
==17557==     in use at exit: 0 bytes in 0 blocks
==17557==   total heap usage: 44,471 allocs, 44,471 frees, 614,111,365 bytes allocated

The data for queries 1 and 2:
~/Documents/devel/data/fi2/000000000001/1> ls -lh
total 4.3G
-rw-r--r-- 1 velan users 611M Oct 13 16:48 e0id1
-rw-r--r-- 1 velan users 153M Oct 13 16:48 e0id11
-rw-r--r-- 1 velan users 332M Oct 13 17:53 e0id11.idx
-rw-r--r-- 1 velan users 306M Oct 13 16:48 e0id12
-rw-r--r-- 1 velan users 379M Oct 14 09:03 e0id12.idx
-rw-r--r-- 1 velan users 611M Oct 13 16:48 e0id152
-rw-r--r-- 1 velan users 611M Oct 13 16:48 e0id153
-rw-r--r-- 1 velan users 611M Oct 13 16:48 e0id2
-rw-r--r-- 1 velan users  77M Oct 13 16:48 e0id4
-rw-r--r-- 1 velan users  23M Oct 13 17:54 e0id4.idx
-rw-r--r-- 1 velan users  77M Oct 13 16:48 e0id5
-rw-r--r-- 1 velan users  77M Oct 13 16:48 e0id6
-rw-r--r-- 1 velan users 153M Oct 13 16:48 e0id7
-rw-r--r-- 1 velan users 306M Oct 13 16:48 e0id8
-rw-r--r-- 1 velan users  870 Oct 13 16:48 -part.txt

The queries use indexed columns for filtering, and those indexes were
generated automatically.
------------------------------------------------------------------------------------------
Query 3:
Estimating between 3515 and 3515 records
Created new table, MB in use: 0
Table filtered, MB in use: 2715

==17773== HEAP SUMMARY:
==17773==     in use at exit: 0 bytes in 0 blocks
==17773==   total heap usage: 82,753 allocs, 82,753 frees, 2,500,927,652 bytes allocated


Data for query 3
~/Documents/devel/data/fi2/000000000002/1> ls -lh
total 2.8G
-rw-r--r-- 1 velan users 400M Oct 13 17:06 e0id1
-rw-r--r-- 1 velan users 100M Oct 13 17:06 e0id11
-rw-r--r-- 1 velan users 227M Oct 13 17:51 e0id11.idx
-rw-r--r-- 1 velan users 200M Oct 13 17:06 e0id12
-rw-r--r-- 1 velan users 251M Oct 14 09:02 e0id12.idx
-rw-r--r-- 1 velan users 400M Oct 13 17:06 e0id152
-rw-r--r-- 1 velan users 400M Oct 13 17:06 e0id153
-rw-r--r-- 1 velan users 400M Oct 13 17:06 e0id2
-rw-r--r-- 1 velan users  50M Oct 13 17:06 e0id4
-rw-r--r-- 1 velan users  15M Oct 13 17:51 e0id4.idx
-rw-r--r-- 1 velan users  50M Oct 13 17:06 e0id5
-rw-r--r-- 1 velan users  50M Oct 13 17:06 e0id6
-rw-r--r-- 1 velan users 100M Oct 13 17:06 e0id7
-rw-r--r-- 1 velan users 200M Oct 13 17:06 e0id8
-rw-r--r-- 1 velan users  870 Oct 13 17:06 -part.txt
The query uses indexed columns for filtering, and those indexes were
generated automatically.
------------------------------------------------------------------------------------------

Query 3 returns 3515 records but allocates 2.5GB of memory. This seems like
too much when all the data being processed amounts to 2.8GB altogether.
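
To put the query 3 numbers in proportion, the small sketch below computes the bytes allocated per returned record and the allocation as a fraction of the partition size. The function names are hypothetical helpers written for this message, and the partition size is only the rounded "2.8G" reported by ls -lh, so the fraction is approximate.

```cpp
#include <cstdint>

// Hypothetical helper: average number of bytes allocated per returned record.
double bytesPerRecord(std::uint64_t allocatedBytes, std::uint64_t records) {
    if (records == 0) return 0.0;  // avoid division by zero
    return static_cast<double>(allocatedBytes) / static_cast<double>(records);
}

// Hypothetical helper: allocated bytes as a fraction of the on-disk data size.
double allocatedFraction(std::uint64_t allocatedBytes, std::uint64_t dataBytes) {
    if (dataBytes == 0) return 0.0;  // avoid division by zero
    return static_cast<double>(allocatedBytes) / static_cast<double>(dataBytes);
}

// Figures taken from the valgrind summary and the ls -lh listing for query 3.
const std::uint64_t kQuery3Allocated = 2500927652ULL;  // "bytes allocated"
const std::uint64_t kQuery3Records = 3515ULL;          // records returned
// "total 2.8G" is rounded by ls, so this value is an approximation.
const std::uint64_t kQuery3DataBytes =
    static_cast<std::uint64_t>(2.8 * 1024.0 * 1024.0 * 1024.0);
```

Calling bytesPerRecord(kQuery3Allocated, kQuery3Records) gives roughly 0.7MB allocated per returned record, and allocatedFraction(kQuery3Allocated, kQuery3DataBytes) comes out at more than four fifths of the whole partition.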

If I am doing something wrong here, could you please advise me how to
improve my use of the FastBit API?
Should you require more information, please ask. Any advice would be most
welcome.

Yours sincerely,

Petr Velan
_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
