Hi, Petr,

Thanks for the update.

Regarding the memory usage, I need to rewrite the aggregation 
algorithms to reduce the memory usage.  The approach is 
well understood; the only problem is that I have not had the time to 
implement the changes.

Regarding the varying query answering time, the basic story here is 
that the first time you run a query, FastBit needs to read the 
indexes and possibly data from files.  Since FastBit holds on to as 
much data in memory as possible, the next time you ask a similar 
question, much of the data needed to answer the query is presumably 
in memory already.
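
To illustrate the caching behavior described above, here is a minimal 
C++ sketch.  It is not FastBit's fileManager: the class FileCache, the 
method fetch, and the file names and sizes are all hypothetical, made 
up only to show why a repeated query can skip the disk reads and why 
the oldest data is given up once the cache limit is reached.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a file cache; NOT FastBit's fileManager.
class FileCache {
public:
    explicit FileCache(std::size_t maxBytes) : maxBytes_(maxBytes) {}

    // Returns true on a cache hit (no disk read needed), false on a miss.
    bool fetch(const std::string& name, std::size_t bytes) {
        auto it = index_.find(name);
        if (it != index_.end()) {
            // Hit: mark the entry as most recently used.
            order_.splice(order_.begin(), order_, it->second);
            return true;
        }
        // Miss: give up the oldest entries until the new file fits.
        // (A file larger than maxBytes_ is still admitted once the
        // cache is empty; a real cache would need a policy for that.)
        while (!order_.empty() && used_ + bytes > maxBytes_) {
            const Entry& oldest = order_.back();
            assert(used_ >= oldest.bytes);
            used_ -= oldest.bytes;
            index_.erase(oldest.name);
            order_.pop_back();
        }
        order_.push_front(Entry{name, bytes});
        index_[name] = order_.begin();
        used_ += bytes;
        return false;
    }

    std::size_t bytesInUse() const { return used_; }

private:
    struct Entry { std::string name; std::size_t bytes; };
    std::size_t maxBytes_;
    std::size_t used_ = 0;
    std::list<Entry> order_;  // front = most recently used
    std::unordered_map<std::string, std::list<Entry>::iterator> index_;
};
```

In this sketch the first fetch of a file is a miss (the slow path) and 
a second fetch of the same file is a hit, unless something else has 
pushed it out of the cache in between.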

FastBit has some of the dumbest aggregation algorithms known to man; 
therefore, if your query involves large aggregations, it will be slow 
every time.
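
For readers unfamiliar with the term, here is a generic C++ sketch of 
a single-pass, hash-based aggregation.  It is only an illustration of 
the general technique, not FastBit's current code and not necessarily 
the rewrite mentioned above; GroupStats and aggregateByKey are 
hypothetical names.  The point is that memory grows with the number 
of distinct groups rather than with the number of rows.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Running aggregates kept per group (hypothetical example type).
struct GroupStats {
    std::uint64_t count = 0;
    double sum = 0.0;
};

// rows holds (group key, value) pairs, e.g. (port, bytes).
// Each row is folded into its group's accumulator in one pass,
// so no per-row copies are kept.
std::unordered_map<std::uint32_t, GroupStats>
aggregateByKey(const std::vector<std::pair<std::uint32_t, double>>& rows) {
    std::unordered_map<std::uint32_t, GroupStats> groups;
    for (const auto& row : rows) {
        GroupStats& g = groups[row.first];  // creates a zeroed entry if new
        g.count += 1;
        g.sum += row.second;
    }
    return groups;
}
```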

John



On 10/17/11 11:43 PM, Thorgrin wrote:
> Hi,
>
> I will just add another quick info to the topic.
>
> 1) The isfinite problem I reported occurred on gcc version 4.5.1
> 20101208 [gcc-4_5-branch revision 167585] (SUSE Linux)
>
> 2) I modified fileManager not to count mmapped memory toward the total
> cache size and it works (changed about 4 lines of code). Still, I don't
> understand how it was possible to query large amounts of data without
> this modification.
>
> 3) I noticed that when I run the same query a second time, it is way
> faster than the previous run (often more than 100x). This applies
> mostly to queries that return a small number of rows (it works for
> results as big as 7000 rows but not for 27000). Why is that so?
>
> Petr
>
> On 15 October 2011 10:37, Thorgrin <[email protected]
> <mailto:[email protected]>> wrote:
>
>     Hi John,
>
>     On 15 October 2011 06:02, K. John Wu <[email protected]
>     <mailto:[email protected]>> wrote:
>
>         Hi, Petr,
>
>         Thanks for your interest in FastBit.
>
>         Regarding isfinite, can you tell me what OS and compiler you
>         are using?  The function isfinite should be defined in math.h
>         which is included in the header.
>
>     I use openSUSE Linux 11.4 with the gcc compiler (g++) in the latest
>     version available in the distro (not sure which it is right now, I
>     can let you know on Monday). The problem is that the C++ math library
>     <cmath> is included, which includes C <math.h>, but has an #undef
>     of the isfinite macro inside. Then it defines its own isfinite
>     function inside the std namespace. Thus the need for
>     "using std::isfinite;".
>     I believe this header file is very similar to mine:
>     http://www.aoc.nrao.edu/php/tjuerges/ALMA/STL/html-3.4.6/cmath-source.html
>
>         FastBit attempts to hold on to things that it has read into
>         memory already until the memory is needed for something else.
>         This may explain why it is holding on to so much memory.  If
>         the memory is needed for the next set of tasks, FastBit will
>         give up the oldest data.
>
>
>     That might explain my experience right enough. The problem is that
>     when I try a query that returns even more records, it uses up all
>     the memory in my machine, making it unusable.
>     That is when I set ibis::fileManager::adjustCacheSize(10GB - a
>     lot); I do this because I need to work with large files, bigger
>     than my RAM. This allows me to mmap those files into
>     process virtual memory, but unfortunately it also allows FastBit
>     to allocate too much memory for itself.
>
>     It might be worth a try to modify fileManager not to account for
>     mmapped memory in its cache size, at least on 64-bit systems where
>     address space is really vast. Then the cache size might be set to
>     some reasonable value. Or, as I suggested earlier, use another
>     cache size variable for mmapped memory.
>
>         Please let us know if valgrind reports memory leaks or other
>         issues.
>
>     Valgrind reported memory leaks in SVN version 425, but in 432 it
>     does not seem to be an issue anymore.
>
>         John
>
>     Petr
>
>
>
>         On 10/14/11 2:32 AM, Thorgrin wrote:
>
>             Dear John,
>
>             we are using FastBit to store NetFlow records in IPFIX format
>             and we need to work with large amounts of data.
>
>             I am using the latest SVN version (432), though it needed some
>             tuning to compile (putting "using std::isfinite;" in part.cpp
>             and mensa.cpp).
>
>             The workflow is as follows:
>             1) Open the partition using
>                      ibis::part *part = new ibis::part("path/to/partition", NULL, true);
>             2) Create a table from this partition
>                      ibis::table *table = ibis::table::create(*part);
>             3) Perform a query
>                      table = table->select("col1, col2, ...", "col1 = 80 AND ...");
>                      /* also delete original table */
>             4) Use a cursor to print the result
>                      .....
>
>             The problem is with the amount of allocated memory.
>             I had to raise the limit for memory allocation in
>             fileManager to allow for mmapping of large files. But this
>             also means that FastBit is allowed to allocate a large amount
>             of physical memory. I believe that there should be two limits,
>             one for physical memory and one for mapped files, because on a
>             64-bit system I just don't care that a 6GB file is mmapped, but
>             I certainly don't want FastBit to claim so much memory for
>             itself.
>
>             Nonetheless, I don't understand why it takes so much
>             memory while performing a simple select. Here are some
>             measurements from valgrind:
>
>             Query 1:
>
>             Estimating between 26 and 26 records /* table->estimate() */
>             Created new table, MB in use: 0 /* this is after table::create(*part) */
>             Table filtered, MB in use: 4527 /* this comes from ibis::fileManager::bytesInUse(), after table->select() */
>
>             ==19057== HEAP SUMMARY:
>             ==19057==     in use at exit: 0 bytes in 0 blocks
>             ==19057==   total heap usage: 3,165 allocs, 3,165 frees, 63,483,310 bytes allocated
>
>             Query 2:
>             Estimating between 1832 and 1832 records
>             Created new table, MB in use: 0
>             Table filtered, MB in use: 4130
>
>             ==17557== HEAP SUMMARY:
>             ==17557==     in use at exit: 0 bytes in 0 blocks
>             ==17557==   total heap usage: 44,471 allocs, 44,471 frees, 614,111,365 bytes allocated
>
>             The data for queries 1 and 2:
>             ~/Documents/devel/data/fi2/000000000001/1> ls -lh
>             total 4.3G
>             -rw-r--r-- 1 velan users 611M Oct 13 16:48 e0id1
>             -rw-r--r-- 1 velan users 153M Oct 13 16:48 e0id11
>             -rw-r--r-- 1 velan users 332M Oct 13 17:53 e0id11.idx
>             -rw-r--r-- 1 velan users 306M Oct 13 16:48 e0id12
>             -rw-r--r-- 1 velan users 379M Oct 14 09:03 e0id12.idx
>             -rw-r--r-- 1 velan users 611M Oct 13 16:48 e0id152
>             -rw-r--r-- 1 velan users 611M Oct 13 16:48 e0id153
>             -rw-r--r-- 1 velan users 611M Oct 13 16:48 e0id2
>             -rw-r--r-- 1 velan users  77M Oct 13 16:48 e0id4
>             -rw-r--r-- 1 velan users  23M Oct 13 17:54 e0id4.idx
>             -rw-r--r-- 1 velan users  77M Oct 13 16:48 e0id5
>             -rw-r--r-- 1 velan users  77M Oct 13 16:48 e0id6
>             -rw-r--r-- 1 velan users 153M Oct 13 16:48 e0id7
>             -rw-r--r-- 1 velan users 306M Oct 13 16:48 e0id8
>             -rw-r--r-- 1 velan users  870 Oct 13 16:48 -part.txt
>
>             The query uses columns with indexes for filtering, and those
>             indexes were generated automatically.
>
>             ------------------------------------------------------------------------------------------
>             Query 3:
>             Estimating between 3515 and 3515 records
>             Created new table, MB in use: 0
>             Table filtered, MB in use: 2715
>
>             ==17773== HEAP SUMMARY:
>             ==17773==     in use at exit: 0 bytes in 0 blocks
>             ==17773==   total heap usage: 82,753 allocs, 82,753 frees, 2,500,927,652 bytes allocated
>
>
>             Data for query 3
>             ~/Documents/devel/data/fi2/000000000002/1> ls -lh
>             total 2.8G
>             -rw-r--r-- 1 velan users 400M Oct 13 17:06 e0id1
>             -rw-r--r-- 1 velan users 100M Oct 13 17:06 e0id11
>             -rw-r--r-- 1 velan users 227M Oct 13 17:51 e0id11.idx
>             -rw-r--r-- 1 velan users 200M Oct 13 17:06 e0id12
>             -rw-r--r-- 1 velan users 251M Oct 14 09:02 e0id12.idx
>             -rw-r--r-- 1 velan users 400M Oct 13 17:06 e0id152
>             -rw-r--r-- 1 velan users 400M Oct 13 17:06 e0id153
>             -rw-r--r-- 1 velan users 400M Oct 13 17:06 e0id2
>             -rw-r--r-- 1 velan users  50M Oct 13 17:06 e0id4
>             -rw-r--r-- 1 velan users  15M Oct 13 17:51 e0id4.idx
>             -rw-r--r-- 1 velan users  50M Oct 13 17:06 e0id5
>             -rw-r--r-- 1 velan users  50M Oct 13 17:06 e0id6
>             -rw-r--r-- 1 velan users 100M Oct 13 17:06 e0id7
>             -rw-r--r-- 1 velan users 200M Oct 13 17:06 e0id8
>             -rw-r--r-- 1 velan users  870 Oct 13 17:06 -part.txt
>             The query uses columns with indexes for filtering, and those
>             indexes were generated automatically.
>
>             ------------------------------------------------------------------------------------------
>
>             Query 3 returns 3515 records but allocates 2.5GB of
>             memory. This seems like too much when all the data being
>             processed amounts to 2.8GB altogether.
>
>             If I am doing something wrong here, could you please
>             advise me how to improve my interaction with the FastBit API?
>             Should you require more information, please ask; any
>             advice would be most welcome.
>
>             Yours sincerely,
>
>             Petr Velan
>
>
>             _______________________________________________
>             FastBit-users mailing list
>             [email protected]
>             <mailto:[email protected]>
>             https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
>
>
>