Hi John,

I still do not understand why there is a deadlock, or why access to different partitions is managed by the same mutex lock.

Our use case is this: we have a process that collects data from the network and stores it in FastBit partitions. Each partition contains 5 minutes of data, approximately 300-400 MB. After the 5 minutes expire, a new thread is launched that creates an ibis::part, runs reorder, deletes the part, creates an ibis::table which is used to create the indexes, and then deletes the table. After that the thread ends. Since there is data from multiple sources, there are multiple threads that store the data and reorder/index it.

Two things are bothering me. The first is the deadlock: since the mutex should only synchronize, I wonder who really holds the lock when both threads are waiting for it.

The second is that the memory used by the process constantly grows. After the parts and tables are deleted, I would expect the memory to be released as well, since it will not be needed for the next 5 minutes. Unfortunately, FastBit does not free the memory until it reaches 50% of total memory, which in our case is 6GB. That is unfortunate, since what it should really need is about 1GB of memory for the reorder in the worst case, and then the memory should be free for other processes to use. Is there any way to achieve this? The memory is consumed even without the reordering, when only building indexes.

Thank you for the warning about strings. We plan to use them in the future, so we will have to do without the reorder in that case.

Petr

On 29 August 2012 18:28, K. John Wu <[email protected]> wrote:
> Hi, Petr,
>
> From the stack traces, it looks like one thread is trying to free a data
> partition object while another one is trying to reorder the rows of
> presumably another data partition. The first mutex lock is invoked
> from the constructor of a storage object (ibis::fileManager::storage).
> This is invoked because the amount of data in memory (tracked by the
> file manager) is close to the prescribed maximum (maxBytes).
> The second mutex lock is invoked from a function called
> ibis::fileManager::removeCleaner (which is invoked by the destructor
> of an ibis::part object).
>
> Running out of memory seems to be the fundamental problem here.
> Presumably, you only need to do reordering once and your datasets are
> quite large. I would suggest that you use only a single thread to
> reorder your data - this way all the memory will be devoted to a single
> reordering operation.
>
> If you really do have a lot of memory (or each data partition is
> relatively small) and want to do the reordering with multiple threads,
> then delay the operation of freeing the ibis::part objects until you
> are done with all reordering operations. The cleaner objects from
> each data partition will make sure each ibis::part object is taking
> only a minimal amount of memory.
>
> A note of warning: the current code only sorts the numerical values;
> any strings or blobs will be left untouched. If your datasets have
> strings or blobs, your datasets will not be coherent after calling the
> function reorder!
>
> John
>
>
> On 8/29/12 4:57 AM, Petr Velan wrote:
>> Hi John,
>>
>> thank you for all the work that you put into the FastBit library, it
>> allows us to achieve great results!
>>
>> I've bumped into a little bug which might be very hard to reproduce or
>> identify. I'm using two threads to reorder and index data that are
>> already stored on disk. It was OK for a little while, but then it
>> got stuck in a deadlock. Here are gdb traces from both threads,
>> unfortunately without debugging symbols, so the specific files
>> and lines are unknown.
>>
>> We are currently using the SVN version 532.
>>
>> (gdb) bt
>> #0 0x00007f8983463054 in __lll_lock_wait () from /lib64/libpthread.so.0
>> #1 0x00007f898345e388 in _L_lock_854 () from /lib64/libpthread.so.0
>> #2 0x00007f898345e257 in pthread_mutex_lock () from /lib64/libpthread.so.0
>> #3 0x00007f898271e074 in ibis::fileManager::storage::storage(unsigned long) () from /usr/lib64/libfastbit.so.0
>> #4 0x00007f898271eb16 in ibis::fileManager::storage::enlarge(unsigned long) () from /usr/lib64/libfastbit.so.0
>> #5 0x00007f898272214f in ibis::fileManager::roFile::doRead(char const*) () from /usr/lib64/libfastbit.so.0
>> #6 0x00007f8982723b4b in ibis::fileManager::getFile(char const*, ibis::fileManager::storage**, ibis::fileManager::ACCESS_PREFERENCE) () from /usr/lib64/libfastbit.so.0
>> #7 0x00007f898273406a in int ibis::fileManager::getFile<unsigned short>(char const*, ibis::array_t<unsigned short>&, ibis::fileManager::ACCESS_PREFERENCE) () from /usr/lib64/libfastbit.so.0
>> #8 0x00007f8981f9f4a5 in ibis::column::actualMinMax(char const*, ibis::bitvector const&, double&, double&) const () from /usr/lib64/libfastbit.so.0
>> #9 0x00007f8981fa3546 in ibis::column::computeMinMax() () from /usr/lib64/libfastbit.so.0
>> #10 0x00007f89827beae6 in ibis::part::gatherSortKeys(ibis::array_t<char const*>&) () from /usr/lib64/libfastbit.so.0
>> #11 0x00007f89827bfc56 in ibis::part::reorder() () from /usr/lib64/libfastbit.so.0
>> #12 0x00007f8982c7e2af in reorder_index(void*) () from /usr/share/ipfixcol/plugins/ipfixcol-fastbit-output.so
>> #13 0x00007f898345c851 in start_thread () from /lib64/libpthread.so.0
>> #14 0x00007f89831aa6dd in next_line () from /lib64/libc.so.6
>> #15 0x0000000000000000 in ?? ()
>> (gdb)
>>
>>
>> (gdb) bt
>> #0 0x00007f8983463054 in __lll_lock_wait () from /lib64/libpthread.so.0
>> #1 0x00007f898345e388 in _L_lock_854 () from /lib64/libpthread.so.0
>> #2 0x00007f898345e257 in pthread_mutex_lock () from /lib64/libpthread.so.0
>> #3 0x00007f898175a6aa in ibis::util::mutexLock::mutexLock(pthread_mutex_t*, char const*) () from /usr/lib64/libfastbit.so.0
>> #4 0x00007f89827177d4 in ibis::fileManager::removeCleaner(ibis::fileManager::cleaner const*) () from /usr/lib64/libfastbit.so.0
>> #5 0x00007f8981735952 in ibis::part::~part() () from /usr/lib64/libfastbit.so.0
>> #6 0x00007f8981735c29 in ibis::part::~part() () from /usr/lib64/libfastbit.so.0
>> #7 0x00007f8982c7e2cd in reorder_index(void*) () from /usr/share/ipfixcol/plugins/ipfixcol-fastbit-output.so
>> #8 0x00007f898345c851 in start_thread () from /lib64/libpthread.so.0
>> #9 0x00007f89831aa6dd in next_line () from /lib64/libc.so.6
>> #10 0x0000000000000000 in ?? ()
>> (gdb)
>>
>>
>> Do you have any idea what might be going on?
>>
>> With regards,
>> Petr Velan
>>
_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
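
John's first suggestion in the thread, using only a single thread for reordering so that all memory is devoted to one operation at a time, could be sketched roughly as below. This is only a minimal illustration in standard C++11, not FastBit code: SerialWorker is a hypothetical helper, and each submitted job stands in for whatever per-partition work (reorder, then index building) the application performs.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical helper (not part of FastBit): runs submitted jobs one at a
// time, in submission order, on a single background thread.
class SerialWorker {
public:
    SerialWorker() : done_(false), worker_([this] { run(); }) {}

    // Signals shutdown, then waits until all queued jobs have been run.
    ~SerialWorker() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            done_ = true;
        }
        cv_.notify_one();
        worker_.join();
    }

    // Queues a job; it runs after every job submitted before it.
    void submit(std::function<void()> job) {
        assert(job != nullptr);
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push(std::move(job));
        }
        cv_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return done_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // shutting down, nothing left
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // executed outside the queue lock
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> jobs_;
    bool done_;
    std::thread worker_;  // declared last so the other members exist first
};
```

With something like this, each collector thread would call submit() once per finished 5-minute partition instead of launching its own reorder thread, so at most one reorder runs at any moment.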

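
John's second suggestion in the thread, reordering with multiple threads but delaying the freeing of the partition objects until all reordering is done, might look like the sketch below. Part and Stats are stand-ins invented for illustration (Part is not ibis::part), and reorder_all_then_free is a hypothetical name. The only point shown is the ordering: join every thread first, then destroy the objects from one thread.

```cpp
#include <atomic>
#include <memory>
#include <thread>
#include <vector>

// Counters shared by all stand-in partitions.
struct Stats {
    std::atomic<int> reordered{0};    // reorder() calls completed
    std::atomic<int> freed_early{0};  // destructors run before all reorders
};

// Stand-in for a data partition object; NOT the FastBit ibis::part class.
class Part {
public:
    Part(Stats& stats, int total) : stats_(stats), total_(total) {}
    void reorder() { stats_.reordered.fetch_add(1); }
    ~Part() {
        // Record a destruction that happens while reorders are still pending.
        if (stats_.reordered.load() < total_) stats_.freed_early.fetch_add(1);
    }
private:
    Stats& stats_;
    int total_;
};

// Reorders n partitions on n threads and frees them only after every thread
// has been joined. Returns how many partitions were freed too early.
int reorder_all_then_free(int n) {
    Stats stats;
    std::vector<std::unique_ptr<Part>> parts;
    for (int i = 0; i < n; ++i) {
        parts.push_back(std::unique_ptr<Part>(new Part(stats, n)));
    }

    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i) {
        Part* p = parts[i].get();
        threads.emplace_back([p] { p->reorder(); });
    }
    for (auto& t : threads) t.join();  // wait for all reorders

    parts.clear();  // all destructors run here, on a single thread
    return stats.freed_early.load();
}
```

Because no destructor runs concurrently with a reorder in this arrangement, the two code paths seen in the backtraces (a destructor and a reorder) would never be active at the same time.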