Hi, Petr,

Thanks for clarifying the use case.  It looks like you cannot wait
for everything to be done before releasing the ibis::part objects.
Regarding the memory usage, FastBit does lazy deletion - as long as
no one needs new memory, the existing content read from files is kept
in memory.  The default maximum amount of memory to use is half of
the physical memory, which explains what you've observed.  Once that
limit is reached, ibis::fileManager::unload is called to remove the
content of files that are not in active use.  In your case, it sounds
like there will be a lot of old files to be removed from memory.
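
If keeping up to half of the physical memory cached is too much for
your setup, you can try lowering the file manager's cap.  Here is a
minimal sketch, assuming the configuration parameter is named
fileManager.maxBytes and that your plugin can call ibis::init before
creating any FastBit objects - please double-check the parameter name
and the ibis::init signature against the version you are using:

    /* contents of fastbit.rc:
     *   fileManager.maxBytes = 1000000000
     * caps the file manager cache at roughly 1 GB
     */
    #include <ibis.h>

    int main(int argc, char** argv) {
        // read the RC file before any ibis::part/ibis::table is created,
        // so the file manager picks up the lower limit
        ibis::init("fastbit.rc");
        // ... ingestion, reorder and indexing code goes here ...
        return 0;
    }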

Since there is no clear indication of which thread is holding on to
the mutex lock, we might need to create a multithreaded data generator
that can mimic your data ingestion process.  If you have a simple one
that I could borrow, I would greatly appreciate it.
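
In case it helps, here is roughly what I have in mind - a minimal
sketch in which the column names, row counts, thread count and
directory layout are all placeholders, and the ibis::tablex calls are
from memory, so please check them against the headers of the version
you are using:

    #include <ibis.h>
    #include <pthread.h>
    #include <cstdio>
    #include <cstdlib>

    // each worker writes one partition directory (part0, part1, ...)
    // filled with random values, mimicking one 5-minute data slice
    static void* make_partition(void* arg) {
        long id = (long) arg;
        char dir[64];
        std::sprintf(dir, "part%ld", id);

        ibis::tablex* tx = ibis::tablex::create();
        tx->addColumn("srcport", ibis::USHORT);
        tx->addColumn("bytes", ibis::UINT);
        for (unsigned i = 0; i < 1000000; ++ i) {
            char row[64];
            std::sprintf(row, "%u, %u",
                         (unsigned) (std::rand() % 65536),
                         (unsigned) std::rand());
            tx->appendRow(row, ", ");
        }
        tx->write(dir, "gen", "generated test partition");
        delete tx;
        return 0;
    }

    int main() {
        ibis::init();
        pthread_t th[4];
        for (long i = 0; i < 4; ++ i)
            pthread_create(th + i, 0, make_partition, (void*) i);
        for (int i = 0; i < 4; ++ i)
            pthread_join(th[i], 0);
        return 0;
    }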

Most likely, another call to ibis::fileManager::getFile is holding on
to ibis::fileManager::mutex.  However, that should not be possible,
because such a thread could only be waiting on a condition variable,
in which case it should have yielded the mutex lock already.  Anyway,
something gnarly is going on here...
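
(For reference, the reasoning above relies on the standard condition
variable pattern - the names mtx, cond and ready below are only
illustrative, not the actual fileManager members:)

    #include <pthread.h>

    static pthread_mutex_t mtx  = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
    static int ready = 0;

    void wait_for_ready() {
        pthread_mutex_lock(&mtx);
        while (! ready)
            // pthread_cond_wait atomically releases mtx while the thread
            // sleeps, so a waiter should never be the one holding the lock
            pthread_cond_wait(&cond, &mtx);
        pthread_mutex_unlock(&mtx);
    }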

John


On 8/29/12 10:54 PM, Petr Velan wrote:
> Hi John,
> 
> I still do not understand why there is a deadlock, or why access to
> different partitions is managed by the same mutex lock.
> 
> Our use case is this:
> We have a process that collects data from the network and stores it
> in FastBit partitions. Each partition contains 5 minutes of data,
> approximately 300-400 MB. After the 5 minutes expire, a new thread
> is launched that creates an ibis::part, runs reorder, deletes the
> part, creates an ibis::table which is used to build the indexes, and
> then deletes the table. After that the thread ends.
> 
> Since there is data from multiple sources, there are multiple threads
> that store the data and reorder/index it.
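>
> In code, each of these per-partition threads does roughly the
> following (a simplified sketch; part_dir stands for the 5-minute
> partition directory, error handling is omitted, and the ibis::part
> constructor overload may differ in the FastBit version used):
>
>     #include <ibis.h>
>
>     void process_partition(const char* part_dir) {
>         // reorder the rows of the newly written partition on disk
>         ibis::part* p = new ibis::part(part_dir, static_cast<const char*>(0));
>         p->reorder();
>         delete p;
>
>         // then build the indexes through the table interface
>         ibis::table* t = ibis::table::create(part_dir);
>         t->buildIndexes(0);
>         delete t;
>     }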
> 
> What is bothering me are two things:
> The first is the deadlock: since the mutex should only synchronize
> access, I wonder who really holds the lock when both threads are
> waiting for it.
> The second is that the memory used by the process constantly grows.
> After the parts and tables are deleted, I would expect the memory to
> be released as well, since it will not be needed for the next 5
> minutes. Unfortunately, FastBit does not free the memory until it
> reaches 50% of total memory, which in our case is 6 GB. That is
> unfortunate, since it should really need only about 1 GB of memory
> for the reorder in the worst case, and after that the memory should
> be free for other processes to use. Is there any way to achieve
> this? The memory is consumed even without the reordering, when only
> building indexes.
> 
> Thank you for the warning about strings; we plan to use them in the
> future, so we will have to do without the reorder in that case.
> 
> Petr
> 
> On 29 August 2012 18:28, K. John Wu <[email protected]> wrote:
>> Hi, Petr,
>>
>> From the stack traces, it looks like one thread is trying to free a data
>> partition object while another one is trying to reorder the rows of
>> presumably another data partition.  The first mutex lock is invoked
>> from the constructor of a storage object (ibis::fileManager::storage).
>>  This is invoked because the amount of data in memory (tracked by the
>> file manager) is close to the prescribed maximum (maxBytes).  The
>> second mutex lock is invoked from a function called
>> ibis::fileManager::removeCleaner (which is invoked by the destructor
>> of an ibis::part object).
>>
>> Running out of memory seems to be the fundamental problem here.
>> Presumably, you only need to do the reordering once and your datasets
>> are quite large.  I would suggest that you use only a single thread
>> to reorder your data - this way all the memory will be devoted to a
>> single reordering operation.
>>
>> If you really do have a lot of memory (or each data partition is
>> relatively small) and want to do the reordering with multiple
>> threads, then delay freeing the ibis::part objects until you are
>> done with all the reordering operations.  The cleaner objects from
>> each data partition will make sure each ibis::part object takes up
>> only a minimal amount of memory.
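>>
>> A sketch of what I mean (dirs stands for whatever list of partition
>> directories you are processing, and the ibis::part constructor
>> overload may differ in your version):
>>
>>     #include <ibis.h>
>>     #include <string>
>>     #include <vector>
>>
>>     void reorder_all(const std::vector<std::string>& dirs) {
>>         // open every partition up front and keep the objects alive
>>         std::vector<ibis::part*> parts;
>>         for (size_t i = 0; i < dirs.size(); ++ i)
>>             parts.push_back(new ibis::part(dirs[i].c_str(),
>>                                            static_cast<const char*>(0)));
>>
>>         // reorder them all - this loop is where the work could be
>>         // farmed out to multiple threads instead
>>         for (size_t i = 0; i < parts.size(); ++ i)
>>             parts[i]->reorder();
>>
>>         // free the objects only after all reordering is finished
>>         for (size_t i = 0; i < parts.size(); ++ i)
>>             delete parts[i];
>>     }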
>>
>> A note of warning: the current code only sorts the numerical values;
>> any strings or blobs will be left untouched.  If your datasets have
>> strings or blobs, they will not be coherent after calling the
>> reorder function!
>>
>> John
>>
>>
>> On 8/29/12 4:57 AM, Petr Velan wrote:
>>> Hi John,
>>>
>>> thank you for all the work that you have put into the FastBit
>>> library; it allows us to achieve great results!
>>>
>>> I've bumped into a little bug which might be very hard to reproduce
>>> or identify. I'm using two threads to reorder and index data that
>>> are already stored on disk. It was OK for a little while, but then
>>> it got stuck in a deadlock. Here are gdb traces from both threads,
>>> unfortunately without debugging symbols, so the specific files and
>>> lines are unknown.
>>>
>>> We are currently using the SVN version 532.
>>>
>>> (gdb) bt
>>> #0  0x00007f8983463054 in __lll_lock_wait () from /lib64/libpthread.so.0
>>> #1  0x00007f898345e388 in _L_lock_854 () from /lib64/libpthread.so.0
>>> #2  0x00007f898345e257 in pthread_mutex_lock () from /lib64/libpthread.so.0
>>> #3  0x00007f898271e074 in ibis::fileManager::storage::storage(unsigned
>>> long) () from /usr/lib64/libfastbit.so.0
>>> #4  0x00007f898271eb16 in ibis::fileManager::storage::enlarge(unsigned
>>> long) () from /usr/lib64/libfastbit.so.0
>>> #5  0x00007f898272214f in ibis::fileManager::roFile::doRead(char
>>> const*) () from /usr/lib64/libfastbit.so.0
>>> #6  0x00007f8982723b4b in ibis::fileManager::getFile(char const*,
>>> ibis::fileManager::storage**, ibis::fileManager::ACCESS_PREFERENCE) ()
>>> from /usr/lib64/libfastbit.so.0
>>> #7  0x00007f898273406a in int ibis::fileManager::getFile<unsigned
>>> short>(char const*, ibis::array_t<unsigned short>&,
>>> ibis::fileManager::ACCESS_PREFERENCE) () from
>>> /usr/lib64/libfastbit.so.0
>>> #8  0x00007f8981f9f4a5 in ibis::column::actualMinMax(char const*,
>>> ibis::bitvector const&, double&, double&) const () from
>>> /usr/lib64/libfastbit.so.0
>>> #9  0x00007f8981fa3546 in ibis::column::computeMinMax() () from
>>> /usr/lib64/libfastbit.so.0
>>> #10 0x00007f89827beae6 in
>>> ibis::part::gatherSortKeys(ibis::array_t<char const*>&) () from
>>> /usr/lib64/libfastbit.so.0
>>> #11 0x00007f89827bfc56 in ibis::part::reorder() () from
>>> /usr/lib64/libfastbit.so.0
>>> #12 0x00007f8982c7e2af in reorder_index(void*) () from
>>> /usr/share/ipfixcol/plugins/ipfixcol-fastbit-output.so
>>> #13 0x00007f898345c851 in start_thread () from /lib64/libpthread.so.0
>>> #14 0x00007f89831aa6dd in next_line () from /lib64/libc.so.6
>>> #15 0x0000000000000000 in ?? ()
>>> (gdb)
>>>
>>>
>>> (gdb) bt
>>> #0  0x00007f8983463054 in __lll_lock_wait () from /lib64/libpthread.so.0
>>> #1  0x00007f898345e388 in _L_lock_854 () from /lib64/libpthread.so.0
>>> #2  0x00007f898345e257 in pthread_mutex_lock () from /lib64/libpthread.so.0
>>> #3  0x00007f898175a6aa in
>>> ibis::util::mutexLock::mutexLock(pthread_mutex_t*, char const*) ()
>>> from /usr/lib64/libfastbit.so.0
>>> #4  0x00007f89827177d4 in
>>> ibis::fileManager::removeCleaner(ibis::fileManager::cleaner const*) ()
>>> from /usr/lib64/libfastbit.so.0
>>> #5  0x00007f8981735952 in ibis::part::~part() () from 
>>> /usr/lib64/libfastbit.so.0
>>> #6  0x00007f8981735c29 in ibis::part::~part() () from 
>>> /usr/lib64/libfastbit.so.0
>>> #7  0x00007f8982c7e2cd in reorder_index(void*) () from
>>> /usr/share/ipfixcol/plugins/ipfixcol-fastbit-output.so
>>> #8  0x00007f898345c851 in start_thread () from /lib64/libpthread.so.0
>>> #9  0x00007f89831aa6dd in next_line () from /lib64/libc.so.6
>>> #10 0x0000000000000000 in ?? ()
>>> (gdb)
>>>
>>>
>>> Do you have any idea what might be going on?
>>>
>>> With regards,
>>> Petr Velan
>>>
