Hi John,

I managed to get the test program running, but I can't seem to
reproduce the deadlock. I'll look into it further when I have
some spare time.

One thing I noticed while using the test program: reorder does not
work on a read-only partition, but the documentation says that it
should, since it does not "change" the data. I'm not running the
latest SVN version, so maybe this has already been resolved and made
consistent. I just wanted to let you know.

Thanks,
Petr

On 4 September 2012 07:30, Kesheng Wu <[email protected]> wrote:
> Hi, Petr,
>
> Attached is a modification of the file tests/setqgen.cpp to mimic your
> use case.  So far, it seems to produce exactly the same output set (as
> a whole, not in the individual partitions) as produced by
> tests/setqgen.cpp.  It works OK on my laptop.  Would you mind taking a
> look and see if you can get it to behave more like what your program
> does?
>
> Thanks.
>
> John
>
>
> On Thu, Aug 30, 2012 at 11:45 PM, Petr Velan <[email protected]> wrote:
>> Hi John,
>>
>> The memory management in FastBit is good for a batch mode of operation.
>> However, I need to reduce the memory footprint, since other operations
>> might require a large amount of memory for a short time, so FastBit
>> cannot keep it allocated all the time.
>>
>> I know that there is a limit that caps usage at half of the
>> available memory and that it can be changed. I think it would be
>> good to have two limits: one to set the maximum that can be used and
>> another that would trigger the unload function. As a result, I could
>> set FastBit to keep 0.5GB at the ready and allow it to expand to
>> 10GB. When needed, FastBit would use up to 10GB of memory, but
>> afterwards it would free it and keep only 0.5GB for further use.
>> The default could still be to have both limits at half of the
>> available memory. What do you think?
>>
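The two-limit idea above can be sketched roughly as follows. This is only an illustration and not FastBit code: the class TwoLimitCache and its methods are hypothetical names, and sizes are plain counters (say, in MB) rather than real allocations. Usage may grow up to the hard limit while buffers are in use, and idle buffers are trimmed back to the soft limit as soon as they are released.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>

// Hypothetical two-limit cache; the names are illustrative, not FastBit API.
class TwoLimitCache {
public:
    TwoLimitCache(std::size_t softLimit, std::size_t hardLimit)
        : soft_(softLimit), hard_(hardLimit) {
        assert(softLimit <= hardLimit);
    }

    // Register a buffer of `bytes`. Idle buffers are evicted first if the
    // hard limit would be exceeded. Returns false if it still cannot fit.
    bool acquire(const std::string& name, std::size_t bytes) {
        if (bytes > hard_) return false;
        if (used_ + bytes > hard_) evictIdleDownTo(hard_ - bytes);
        if (used_ + bytes > hard_) return false;
        entries_.push_back({name, bytes, true});
        used_ += bytes;
        return true;
    }

    // Mark a buffer idle, then trim idle buffers until usage is at or
    // below the soft limit (or no idle buffers remain).
    void release(const std::string& name) {
        for (auto& e : entries_) {
            if (e.name == name) e.inUse = false;
        }
        evictIdleDownTo(soft_);
    }

    std::size_t used() const { return used_; }

private:
    struct Entry {
        std::string name;
        std::size_t bytes;
        bool inUse;
    };

    // Drop idle entries, oldest first, while usage is above `target`.
    void evictIdleDownTo(std::size_t target) {
        for (auto it = entries_.begin();
             it != entries_.end() && used_ > target;) {
            if (!it->inUse) {
                used_ -= it->bytes;
                it = entries_.erase(it);
            } else {
                ++it;
            }
        }
    }

    std::size_t soft_;
    std::size_t hard_;
    std::size_t used_ = 0;
    std::list<Entry> entries_;
};
```

With a soft limit of 512 and a hard limit of 10240 (MB), two 4000 MB buffers can be held at once, and releasing them brings usage back under 512 MB instead of leaving it near the hard limit.
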
>> We are currently trying to manually call
>> ibis::fileManager::instance().flushDir() to see if it helps to keep
>> the memory down, but I believe that the solution I described earlier
>> is much more generic.
>>
>> Unfortunately, I do not have any simple code to reproduce the
>> deadlock. I'll try to look into it, maybe compile FastBit with
>> debugging symbols to help us better understand what is going on.
>>
>> Petr
>>
>> On 30 August 2012 20:29, K. John Wu <[email protected]> wrote:
>>> Hi, Petr,
>>>
>>> Thanks for clarifying the use case.  Looks like you cannot wait for
>>> everything to be done before releasing the ibis::part objects.
>>> Regarding the memory usage, FastBit does lazy deletions - as long as
>>> no one needs new memory, the existing content read from files will be
>>> kept in memory.  The default maximum memory to be used is half of
>>> the physical memory - which explains what you've observed.  Once
>>> that limit is reached, ibis::fileManager::unload will be called to
>>> remove the content of files that are not in active use.  In your case,
>>> it sounds like there will be a lot of old files to be removed from memory.
>>>
>>> Since there is no clear indication which thread is holding on to the
>>> mutex lock, we might need to create a multithreaded data generator
>>> that can mimic your data ingestion process.  If you have a simple one
>>> that I can borrow, I would greatly appreciate it.
>>>
>>> Most likely, another copy of ibis::fileManager::getFile is holding on
>>> to the ibis::fileManager::mutex.  However, logically, that is not
>>> possible because that thread can only be waiting on a condition
>>> variable, in which case it should have yielded the mutex lock already.
>>> Anyway, something gnarly is going on here.
>>>
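The point that a thread waiting on a condition variable has already given up the mutex can be checked with a small standalone sketch. It uses std::mutex and std::condition_variable as stand-ins for the pthread primitives; the function name waiterYieldsMutex is hypothetical and none of this is FastBit code.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Returns true if a second thread could lock the mutex while the first
// thread was blocked in wait(). If wait() did not release the mutex,
// this function would hang instead of returning.
bool waiterYieldsMutex() {
    std::mutex m;
    std::condition_variable cv;
    bool waiting = false;  // set by the waiter while it holds m
    bool ready = false;    // set by this thread to let the waiter go
    bool acquiredWhileWaiting = false;

    std::thread waiter([&] {
        std::unique_lock<std::mutex> lock(m);
        waiting = true;
        cv.wait(lock, [&] { return ready; });  // releases m while blocked
        assert(lock.owns_lock());              // m is reacquired on return
    });

    for (;;) {
        std::unique_lock<std::mutex> lock(m);
        if (waiting) {
            // We hold m and the waiter has not finished, so the waiter
            // can only have released m from inside wait().
            acquiredWhileWaiting = true;
            ready = true;
            break;
        }
        lock.unlock();
        std::this_thread::yield();
    }
    cv.notify_one();
    waiter.join();
    return acquiredWhileWaiting;
}
```

If both threads in a real program are stuck in pthread_mutex_lock, this suggests the holder is somewhere other than a condition wait, for example a thread that returned or threw without unlocking.
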
>>> John
>>>
>>>
>>> On 8/29/12 10:54 PM, Petr Velan wrote:
>>>> Hi John,
>>>>
>>>> I still do not understand why there is a deadlock, or why
>>>> access to different partitions is managed by the same mutex lock.
>>>>
>>>> Our use case is this:
>>>> We have a process that collects data from the network and stores it in
>>>> FastBit partitions. Each partition contains 5 minutes of data,
>>>> approximately 300-400MB. After the 5 minutes expire, a new thread is
>>>> launched that creates an ibis::part, runs reorder, deletes the part,
>>>> creates an ibis::table which is used to create indexes, and then
>>>> deletes the table. After that the thread ends.
>>>>
>>>> Since there is data from multiple sources, there are multiple threads
>>>> that store the data and reorder/index it.
>>>>
>>>> Two things are bothering me:
>>>> First, the deadlock: since the mutex should only synchronize, I wonder
>>>> who really holds the lock when both threads are waiting for it.
>>>> Second, the memory used by the process constantly grows. After
>>>> the parts and tables are deleted, I would expect the memory to be
>>>> released as well, since it will not be needed for the next 5 minutes.
>>>> Unfortunately, FastBit does not free the memory until it reaches 50%
>>>> of total memory, which in our case is 6GB. That is
>>>> unfortunate, since all it should really need is about 1GB of memory
>>>> for reorder in the worst case, and then the memory should be free for
>>>> use by other processes. Is there any way to achieve this? The memory
>>>> is consumed even without the reordering, when only building indexes.
>>>>
>>>> Thank you for the warning about strings. We plan to use them in the
>>>> future, so we will have to do without the reorder in that case.
>>>>
>>>> Petr
>>>>
>>>> On 29 August 2012 18:28, K. John Wu <[email protected]> wrote:
>>>>> Hi, Petr,
>>>>>
>>>>> From the stack traces, it looks like one thread is trying to free a data
>>>>> partition object while another one is trying to reorder the rows of
>>>>> presumably another data partition.  The first mutex lock is invoked
>>>>> from the constructor of a storage object (ibis::fileManager::storage).
>>>>>  This is invoked because the amount of data in memory (tracked by the
>>>>> file manager) is close to the prescribed maximum (maxBytes).  The
>>>>> second mutex lock is invoked from a function called
>>>>> ibis::fileManager::removeCleaner (which is invoked by the destructor
>>>>> of an ibis::part object).
>>>>>
>>>>> Running out of memory seems to be the fundamental problem here.
>>>>> Presumably, you only need to do reordering once and your datasets are
>>>>> quite large.  I would suggest that you use only a single thread to
>>>>> reorder your data - this way all the memory will be devoted to a single
>>>>> reordering operation.
>>>>>
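One way to follow the single-thread suggestion while still accepting work from several producer threads is a one-worker job queue. The sketch below is generic C++ and not FastBit code: the class SerialWorker is a hypothetical name, and each submitted job is a placeholder for whatever reorder call the application makes.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Runs submitted jobs one at a time on a single background thread.
class SerialWorker {
public:
    SerialWorker() : worker_([this] { run(); }) {}

    // Drains any remaining jobs, then stops and joins the worker.
    ~SerialWorker() {
        {
            std::lock_guard<std::mutex> lock(m_);
            done_ = true;
        }
        cv_.notify_one();
        worker_.join();
    }

    // May be called from any thread.
    void submit(std::function<void()> job) {
        assert(job);  // an empty std::function cannot be run
        {
            std::lock_guard<std::mutex> lock(m_);
            jobs_.push(std::move(job));
        }
        cv_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return done_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // done_ is set and queue is drained
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // run outside the lock so submit() is never blocked by a job
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> jobs_;
    bool done_ = false;
    std::thread worker_;  // declared last so the other members exist first
};
```

Each data-collecting thread would call submit() with a closure that reorders its own partition, so at most one reordering operation is ever competing for memory.
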
>>>>> If you really do have a lot of memory (or each data partition is
>>>>> relatively small) and want to do the reordering with multiple threads,
>>>>> then delay the operation of freeing the ibis::part objects until you
>>>>> are done with all reordering operations.  The cleaner objects from
>>>>> each data partition will make sure each ibis::part object is taking
>>>>> only a minimal amount of memory.
>>>>>
>>>>> A note of warning: the current code only sorts the numerical values;
>>>>> any strings or blobs will be left untouched.  If your datasets have
>>>>> strings or blobs, your datasets will not be coherent after calling the
>>>>> function reorder!
>>>>>
>>>>> John
>>>>>
>>>>>
>>>>> On 8/29/12 4:57 AM, Petr Velan wrote:
>>>>>> Hi John,
>>>>>>
>>>>>> thank you for all the work that you put into the FastBit library, it
>>>>>> allows us to achieve great results!
>>>>>>
>>>>>> I've bumped into a little bug which might be very hard to reproduce or
>>>>>> identify. I'm using two threads to reorder and index data that is
>>>>>> already stored on disk. It was OK for a little while, but then it
>>>>>> got stuck in a deadlock. Here are gdb traces from both threads,
>>>>>> unfortunately without debugging symbols, so the specific files
>>>>>> and lines are unknown.
>>>>>>
>>>>>> We are currently using the SVN version 532.
>>>>>>
>>>>>> (gdb) bt
>>>>>> #0  0x00007f8983463054 in __lll_lock_wait () from /lib64/libpthread.so.0
>>>>>> #1  0x00007f898345e388 in _L_lock_854 () from /lib64/libpthread.so.0
>>>>>> #2  0x00007f898345e257 in pthread_mutex_lock () from /lib64/libpthread.so.0
>>>>>> #3  0x00007f898271e074 in ibis::fileManager::storage::storage(unsigned long) () from /usr/lib64/libfastbit.so.0
>>>>>> #4  0x00007f898271eb16 in ibis::fileManager::storage::enlarge(unsigned long) () from /usr/lib64/libfastbit.so.0
>>>>>> #5  0x00007f898272214f in ibis::fileManager::roFile::doRead(char const*) () from /usr/lib64/libfastbit.so.0
>>>>>> #6  0x00007f8982723b4b in ibis::fileManager::getFile(char const*, ibis::fileManager::storage**, ibis::fileManager::ACCESS_PREFERENCE) () from /usr/lib64/libfastbit.so.0
>>>>>> #7  0x00007f898273406a in int ibis::fileManager::getFile<unsigned short>(char const*, ibis::array_t<unsigned short>&, ibis::fileManager::ACCESS_PREFERENCE) () from /usr/lib64/libfastbit.so.0
>>>>>> #8  0x00007f8981f9f4a5 in ibis::column::actualMinMax(char const*, ibis::bitvector const&, double&, double&) const () from /usr/lib64/libfastbit.so.0
>>>>>> #9  0x00007f8981fa3546 in ibis::column::computeMinMax() () from /usr/lib64/libfastbit.so.0
>>>>>> #10 0x00007f89827beae6 in ibis::part::gatherSortKeys(ibis::array_t<char const*>&) () from /usr/lib64/libfastbit.so.0
>>>>>> #11 0x00007f89827bfc56 in ibis::part::reorder() () from /usr/lib64/libfastbit.so.0
>>>>>> #12 0x00007f8982c7e2af in reorder_index(void*) () from /usr/share/ipfixcol/plugins/ipfixcol-fastbit-output.so
>>>>>> #13 0x00007f898345c851 in start_thread () from /lib64/libpthread.so.0
>>>>>> #14 0x00007f89831aa6dd in next_line () from /lib64/libc.so.6
>>>>>> #15 0x0000000000000000 in ?? ()
>>>>>> (gdb)
>>>>>>
>>>>>>
>>>>>> (gdb) bt
>>>>>> #0  0x00007f8983463054 in __lll_lock_wait () from /lib64/libpthread.so.0
>>>>>> #1  0x00007f898345e388 in _L_lock_854 () from /lib64/libpthread.so.0
>>>>>> #2  0x00007f898345e257 in pthread_mutex_lock () from /lib64/libpthread.so.0
>>>>>> #3  0x00007f898175a6aa in ibis::util::mutexLock::mutexLock(pthread_mutex_t*, char const*) () from /usr/lib64/libfastbit.so.0
>>>>>> #4  0x00007f89827177d4 in ibis::fileManager::removeCleaner(ibis::fileManager::cleaner const*) () from /usr/lib64/libfastbit.so.0
>>>>>> #5  0x00007f8981735952 in ibis::part::~part() () from /usr/lib64/libfastbit.so.0
>>>>>> #6  0x00007f8981735c29 in ibis::part::~part() () from /usr/lib64/libfastbit.so.0
>>>>>> #7  0x00007f8982c7e2cd in reorder_index(void*) () from /usr/share/ipfixcol/plugins/ipfixcol-fastbit-output.so
>>>>>> #8  0x00007f898345c851 in start_thread () from /lib64/libpthread.so.0
>>>>>> #9  0x00007f89831aa6dd in next_line () from /lib64/libc.so.6
>>>>>> #10 0x0000000000000000 in ?? ()
>>>>>> (gdb)
>>>>>>
>>>>>>
>>>>>> Do you have any idea what might be going on?
>>>>>>
>>>>>> With regards,
>>>>>> Petr Velan
>>>>>>
_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
