Hi, Petr,

Thanks for clarifying the use case. It looks like you cannot wait for everything to be done before releasing the ibis::part objects.

Regarding the memory usage, FastBit does lazy deletion: as long as no one needs new memory, the content read from files is kept in memory. The default maximum amount of memory to be used is half of the physical memory, which explains what you've observed. Once that limit is reached, ibis::fileManager::unload is called to remove the content of files that are not in active use. In your case, it sounds like there will be a lot of old files to be removed from memory.
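If you would rather keep the process near the ~1 GB you actually need, you can lower that ceiling yourself. Below is a minimal sketch; treat the exact names (ibis::fileManager::adjustCacheSize and the fileManager.maxBytes RC-file parameter) as assumptions written from memory and verify them against the fileManager.h that ships with SVN 532.

// Sketch: cap the file manager cache at roughly 1 GB instead of the
// default of half the physical memory.  adjustCacheSize is assumed to
// be available here; if your revision does not have it, setting
// "fileManager.maxBytes = 1000000000" in the RC file read by
// ibis::init should have the same effect.
#include <ibis.h>

int main() {
    ibis::init();  // reads the default RC file, if one is found

    // lower the memory ceiling before any partitions are opened
    ibis::fileManager::adjustCacheSize(1000000000ULL);  // ~1 GB

    // ... ingestion / reorder / index work goes here ...
    return 0;
}

With the ceiling at about 1 GB the old partitions should be unloaded much sooner, at the cost of re-reading files that would otherwise have stayed cached.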
Since there is no clear indication which thread is holding on to the mutex lock, we might need to create a multithreaded data generator that can mimic your data ingestion process. If you have a simple one that I can borrow, I would greatly appreciate it. Most likely, another copy of ibis::fileManager::getFile is holding on to ibis::fileManager::mutex. However, logically that should not be possible, because such a thread could only be waiting on a condition variable, in which case it should have yielded the mutex lock already. Anyway, something gnarly is going on here.
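In the meantime, to make sure I understand the ingestion process that would have to be mimicked, here is roughly what I picture each of your worker threads doing for one 5-minute partition. This is only a sketch against the public API - the ibis::part constructor arguments and the buildIndexes call are written from memory, so treat them as assumptions rather than a recipe:

// One worker thread's cycle for a freshly written partition directory:
// reorder the rows, release the ibis::part, then build the indexes
// through the ibis::table interface.
#include <ibis.h>

void process_partition(const char* datadir) {
    {   // scope the part so its destructor runs as soon as reorder ends
        ibis::part prt(datadir, static_cast<const char*>(0));
        prt.reorder();   // note: only the numeric columns are sorted
    }   // ~part() is where fileManager::removeCleaner gets called

    ibis::table* tbl = ibis::table::create(datadir);  // reopen the partition
    if (tbl != 0) {
        tbl->buildIndexes(static_cast<const char*>(0));  // default index spec
        delete tbl;      // release the table and its cached data
    }
}

If that matches what you do, the easiest ways to stay under the memory limit are still the two I suggested in my previous message below: run reorder from a single thread, or keep the ibis::part objects around until all reordering is finished.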
John

On 8/29/12 10:54 PM, Petr Velan wrote:
> Hi John,
>
> I still do not understand why there is a deadlock, or why access to different partitions is managed by the same mutex lock.
>
> Our use case is this: we have a process that collects data from the network and stores it in FastBit partitions. Each partition contains 5 minutes of data, approximately 300-400 MB. After the 5 minutes expire, a new thread is launched that creates an ibis::part, runs reorder, deletes the part, creates an ibis::table which is used to create indexes, and then deletes the table. After that the thread ends.
>
> Since there is data from multiple sources, there are multiple threads that store the data and reorder/index it.
>
> Two things are bothering me. The first is the deadlock: since the mutex should only synchronize, I wonder who really holds the lock when both threads are waiting for it.
> The second is that the memory used by the process grows constantly. After the parts and tables are deleted, I would expect the memory to be released as well, since it will not be needed for the next 5 minutes. Unfortunately, FastBit does not free the memory until it reaches 50% of total memory, which in our case is 6 GB. That is unfortunate, since all it should really need is about 1 GB of memory for the reorder in the worst case, and then the memory should be free for other processes to use. Is there any way to achieve this? The memory is consumed even without the reordering, only when building indexes.
>
> Thank you for the warning about strings; we plan to use them in the future, so we will have to do without the reorder in that case.
>
> Petr
>
> On 29 August 2012 18:28, K. John Wu <[email protected]> wrote:
>> Hi, Petr,
>>
>> From the stack traces, it looks like one thread is trying to free a data partition object while another one is trying to reorder the rows of, presumably, another data partition. The first mutex lock is invoked from the constructor of a storage object (ibis::fileManager::storage). This is invoked because the amount of data in memory (tracked by the file manager) is close to the prescribed maximum (maxBytes). The second mutex lock is invoked from a function called ibis::fileManager::removeCleaner (which is invoked by the destructor of an ibis::part object).
>>
>> Running out of memory seems to be the fundamental problem here. Presumably, you only need to do the reordering once and your datasets are quite large. I would suggest that you use only a single thread to reorder your data - this way all the memory will be devoted to a single reordering operation.
>>
>> If you really do have a lot of memory (or each data partition is relatively small) and want to do the reordering with multiple threads, then delay freeing the ibis::part objects until you are done with all reordering operations. The cleaner objects from each data partition will make sure each ibis::part object takes only a minimal amount of memory.
>>
>> A note of warning: the current code only sorts the numerical values; any strings or blobs will be left untouched. If your datasets have strings or blobs, they will not be coherent after calling the function reorder!
>>
>> John
>>
>>
>> On 8/29/12 4:57 AM, Petr Velan wrote:
>>> Hi John,
>>>
>>> thank you for all the work that you put into the FastBit library, it allows us to achieve great results!
>>>
>>> I've bumped into a little bug which might be very hard to reproduce or identify. I'm using two threads to reorder and index data that are already stored on disk. It was OK for a little while, but then it got stuck in a deadlock. Here are gdb traces from both threads, unfortunately without debugging symbols, so the specific files and lines are unknown.
>>>
>>> We are currently using the SVN version 532.
>>>
>>> (gdb) bt
>>> #0  0x00007f8983463054 in __lll_lock_wait () from /lib64/libpthread.so.0
>>> #1  0x00007f898345e388 in _L_lock_854 () from /lib64/libpthread.so.0
>>> #2  0x00007f898345e257 in pthread_mutex_lock () from /lib64/libpthread.so.0
>>> #3  0x00007f898271e074 in ibis::fileManager::storage::storage(unsigned long) () from /usr/lib64/libfastbit.so.0
>>> #4  0x00007f898271eb16 in ibis::fileManager::storage::enlarge(unsigned long) () from /usr/lib64/libfastbit.so.0
>>> #5  0x00007f898272214f in ibis::fileManager::roFile::doRead(char const*) () from /usr/lib64/libfastbit.so.0
>>> #6  0x00007f8982723b4b in ibis::fileManager::getFile(char const*, ibis::fileManager::storage**, ibis::fileManager::ACCESS_PREFERENCE) () from /usr/lib64/libfastbit.so.0
>>> #7  0x00007f898273406a in int ibis::fileManager::getFile<unsigned short>(char const*, ibis::array_t<unsigned short>&, ibis::fileManager::ACCESS_PREFERENCE) () from /usr/lib64/libfastbit.so.0
>>> #8  0x00007f8981f9f4a5 in ibis::column::actualMinMax(char const*, ibis::bitvector const&, double&, double&) const () from /usr/lib64/libfastbit.so.0
>>> #9  0x00007f8981fa3546 in ibis::column::computeMinMax() () from /usr/lib64/libfastbit.so.0
>>> #10 0x00007f89827beae6 in ibis::part::gatherSortKeys(ibis::array_t<char const*>&) () from /usr/lib64/libfastbit.so.0
>>> #11 0x00007f89827bfc56 in ibis::part::reorder() () from /usr/lib64/libfastbit.so.0
>>> #12 0x00007f8982c7e2af in reorder_index(void*) () from /usr/share/ipfixcol/plugins/ipfixcol-fastbit-output.so
>>> #13 0x00007f898345c851 in start_thread () from /lib64/libpthread.so.0
>>> #14 0x00007f89831aa6dd in next_line () from /lib64/libc.so.6
>>> #15 0x0000000000000000 in ?? ()
>>> (gdb)
>>>
>>>
>>> (gdb) bt
>>> #0  0x00007f8983463054 in __lll_lock_wait () from /lib64/libpthread.so.0
>>> #1  0x00007f898345e388 in _L_lock_854 () from /lib64/libpthread.so.0
>>> #2  0x00007f898345e257 in pthread_mutex_lock () from /lib64/libpthread.so.0
>>> #3  0x00007f898175a6aa in ibis::util::mutexLock::mutexLock(pthread_mutex_t*, char const*) () from /usr/lib64/libfastbit.so.0
>>> #4  0x00007f89827177d4 in ibis::fileManager::removeCleaner(ibis::fileManager::cleaner const*) () from /usr/lib64/libfastbit.so.0
>>> #5  0x00007f8981735952 in ibis::part::~part() () from /usr/lib64/libfastbit.so.0
>>> #6  0x00007f8981735c29 in ibis::part::~part() () from /usr/lib64/libfastbit.so.0
>>> #7  0x00007f8982c7e2cd in reorder_index(void*) () from /usr/share/ipfixcol/plugins/ipfixcol-fastbit-output.so
>>> #8  0x00007f898345c851 in start_thread () from /lib64/libpthread.so.0
>>> #9  0x00007f89831aa6dd in next_line () from /lib64/libc.so.6
>>> #10 0x0000000000000000 in ?? ()
>>> (gdb)
>>>
>>>
>>> Do you have any idea what might be going on?
>>>
>>> With regards,
>>> Petr Velan

_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
