Hi, Petr,

From the stack traces, it looks like one thread is trying to free a data
partition object while another one is trying to reorder the rows of
presumably another data partition.  The first mutex lock is invoked
from the constructor of a storage object (ibis::fileManager::storage).
That lock is taken because the amount of data in memory (tracked by the
file manager) is close to the prescribed maximum (maxBytes).  The
second mutex lock is invoked from the function
ibis::fileManager::removeCleaner, which is called by the destructor
of an ibis::part object.

Running out of memory seems to be the fundamental problem here.
Presumably, you only need to do the reordering once and your datasets
are quite large.  I would suggest that you use only a single thread to
reorder your data; this way all the memory will be devoted to a single
reordering operation.

If you really do have a lot of memory (or each data partition is
relatively small) and want to do the reordering with multiple threads,
then delay freeing the ibis::part objects until you are done with all
reordering operations.  The cleaner objects from each data partition
will make sure each ibis::part object takes only a minimal amount of
memory.
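
A rough C++ sketch of the multi-threaded variant, assuming a hypothetical
`Partition` type in place of ibis::part (no real FastBit calls are made;
`reorderInParallel` is likewise a made-up name).  All partition objects
are created up front, the worker threads only reorder, and no partition
is destroyed until every worker has been joined.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for ibis::part; NOT the FastBit API.
struct Partition {
    explicit Partition(std::string d) : dir(std::move(d)) {}
    long reorder() { reordered = true; return 0; }  // placeholder work
    std::string dir;
    bool reordered = false;
};

// Reorder all partitions using nThreads workers. The workers never destroy
// a Partition; every object stays alive until all workers have been joined,
// and ownership is then handed back to the caller, who frees them.
std::vector<std::unique_ptr<Partition>>
reorderInParallel(const std::vector<std::string>& dirs, std::size_t nThreads) {
    assert(nThreads > 0 && "need at least one worker thread");
    std::vector<std::unique_ptr<Partition>> parts;
    parts.reserve(dirs.size());
    for (const std::string& d : dirs)
        parts.push_back(std::make_unique<Partition>(d));

    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < nThreads; ++t) {
        // Static round-robin split: worker t handles parts t, t+n, t+2n, ...
        // Each element is touched by exactly one worker, and the vector
        // itself is not resized while the workers run.
        workers.emplace_back([&parts, t, nThreads] {
            for (std::size_t i = t; i < parts.size(); i += nThreads)
                parts[i]->reorder();
        });
    }
    for (std::thread& w : workers) w.join();
    return parts;  // nothing has been freed yet
}
```

Once the returned vector is cleared (or goes out of scope), all partition
objects are freed together, after the last reorder() call has finished.
The round-robin split is only the simplest static assignment; a shared
work queue would balance partitions of uneven size better.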

A word of warning: the current code only sorts the numerical values;
any strings or blobs are left untouched.  If your datasets have
strings or blobs, they will not be coherent after calling the
function reorder!
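
To make the warning concrete, a toy C++ example (not FastBit code;
`ToyTable` and the helper functions are made up for illustration).  A
coherent reorder must apply the same row permutation to every column;
permuting only the numeric column leaves each string attached to the
wrong row.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Toy two-column table: row i is the pair (num[i], str[i]).
struct ToyTable {
    std::vector<int> num;
    std::vector<std::string> str;
};

// Row permutation that sorts the key column in ascending order.
std::vector<std::size_t> sortPermutation(const std::vector<int>& key) {
    std::vector<std::size_t> perm(key.size());
    std::iota(perm.begin(), perm.end(), std::size_t{0});
    std::stable_sort(perm.begin(), perm.end(),
                     [&key](std::size_t a, std::size_t b) { return key[a] < key[b]; });
    return perm;
}

// Build a new column whose row i is row perm[i] of the input.
template <typename T>
std::vector<T> applyPermutation(const std::vector<T>& col,
                                const std::vector<std::size_t>& perm) {
    std::vector<T> out;
    out.reserve(perm.size());
    for (std::size_t i : perm) out.push_back(col[i]);
    return out;
}

// Mimics the behaviour described in the warning: only the numeric column
// is permuted, the string column stays as it was, so rows no longer match.
void reorderNumericOnly(ToyTable& t) {
    t.num = applyPermutation(t.num, sortPermutation(t.num));
}

// A coherent reorder: the same permutation is applied to every column.
void reorderAllColumns(ToyTable& t) {
    const std::vector<std::size_t> perm = sortPermutation(t.num);
    t.num = applyPermutation(t.num, perm);
    t.str = applyPermutation(t.str, perm);
}
```

With num = {3, 1, 2} and str = {"c", "a", "b"}, reorderAllColumns yields
{1, 2, 3} and {"a", "b", "c"}, while reorderNumericOnly yields {1, 2, 3}
and {"c", "a", "b"}, so the value 1 is now paired with "c".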

John


On 8/29/12 4:57 AM, Petr Velan wrote:
> Hi John,
> 
> thank you for all the work that you put into the FastBit library; it
> allows us to achieve great results!
> 
> I've bumped into a little bug which might be very hard to reproduce or
> identify. I'm using two threads to reorder and index data that are
> already stored on disk. It was OK for a little while, but then it got
> stuck in a deadlock. Here are gdb traces from both threads,
> unfortunately without debugging symbols, so the specific files
> and lines are unknown.
> 
> We are currently using the SVN version 532.
> 
> (gdb) bt
> #0  0x00007f8983463054 in __lll_lock_wait () from /lib64/libpthread.so.0
> #1  0x00007f898345e388 in _L_lock_854 () from /lib64/libpthread.so.0
> #2  0x00007f898345e257 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3  0x00007f898271e074 in ibis::fileManager::storage::storage(unsigned long) () from /usr/lib64/libfastbit.so.0
> #4  0x00007f898271eb16 in ibis::fileManager::storage::enlarge(unsigned long) () from /usr/lib64/libfastbit.so.0
> #5  0x00007f898272214f in ibis::fileManager::roFile::doRead(char const*) () from /usr/lib64/libfastbit.so.0
> #6  0x00007f8982723b4b in ibis::fileManager::getFile(char const*, ibis::fileManager::storage**, ibis::fileManager::ACCESS_PREFERENCE) () from /usr/lib64/libfastbit.so.0
> #7  0x00007f898273406a in int ibis::fileManager::getFile<unsigned short>(char const*, ibis::array_t<unsigned short>&, ibis::fileManager::ACCESS_PREFERENCE) () from /usr/lib64/libfastbit.so.0
> #8  0x00007f8981f9f4a5 in ibis::column::actualMinMax(char const*, ibis::bitvector const&, double&, double&) const () from /usr/lib64/libfastbit.so.0
> #9  0x00007f8981fa3546 in ibis::column::computeMinMax() () from /usr/lib64/libfastbit.so.0
> #10 0x00007f89827beae6 in ibis::part::gatherSortKeys(ibis::array_t<char const*>&) () from /usr/lib64/libfastbit.so.0
> #11 0x00007f89827bfc56 in ibis::part::reorder() () from /usr/lib64/libfastbit.so.0
> #12 0x00007f8982c7e2af in reorder_index(void*) () from /usr/share/ipfixcol/plugins/ipfixcol-fastbit-output.so
> #13 0x00007f898345c851 in start_thread () from /lib64/libpthread.so.0
> #14 0x00007f89831aa6dd in next_line () from /lib64/libc.so.6
> #15 0x0000000000000000 in ?? ()
> (gdb)
> 
> 
> (gdb) bt
> #0  0x00007f8983463054 in __lll_lock_wait () from /lib64/libpthread.so.0
> #1  0x00007f898345e388 in _L_lock_854 () from /lib64/libpthread.so.0
> #2  0x00007f898345e257 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3  0x00007f898175a6aa in ibis::util::mutexLock::mutexLock(pthread_mutex_t*, char const*) () from /usr/lib64/libfastbit.so.0
> #4  0x00007f89827177d4 in ibis::fileManager::removeCleaner(ibis::fileManager::cleaner const*) () from /usr/lib64/libfastbit.so.0
> #5  0x00007f8981735952 in ibis::part::~part() () from /usr/lib64/libfastbit.so.0
> #6  0x00007f8981735c29 in ibis::part::~part() () from /usr/lib64/libfastbit.so.0
> #7  0x00007f8982c7e2cd in reorder_index(void*) () from /usr/share/ipfixcol/plugins/ipfixcol-fastbit-output.so
> #8  0x00007f898345c851 in start_thread () from /lib64/libpthread.so.0
> #9  0x00007f89831aa6dd in next_line () from /lib64/libc.so.6
> #10 0x0000000000000000 in ?? ()
> (gdb)
> 
> 
> Do you have any idea what might be going on?
> 
> With regards,
> Petr Velan
> 
_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
