Hi John,

The problem seems to be solved now; I see no more memory leaks. Thank
you for the quick fix.

Petr

On 20 September 2012 23:57, K. John Wu <[email protected]> wrote:
> Thanks, Petr and Michael,
>
> Please give SVN revision 579 a try when you get the chance.  The
> problem seems to be in the function ibis::array_t<T>::push_back.  It
> lost track of the storage object (actual).  This problem should be
> fixed with the current structure of the tests.  Let us know if you
> encounter any problems.
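A minimal sketch of the kind of bug described above, assuming a heavily simplified, hypothetical model of array_t and its fileManager::storage object (Storage, Array and Storage::live are illustrative names, not FastBit's code). It matches the valgrind trace in this thread, where the leaked block was allocated by the default array_t constructor: when push_back outgrows the buffer it swaps in a bigger storage object, and it has to release its reference to the old one before overwriting the pointer.

```cpp
#include <cstddef>

// Hypothetical stand-in for a reference-counted buffer (NOT FastBit's
// ibis::fileManager::storage). Storage::live counts objects still alive.
struct Storage {
    static int live;
    std::size_t capacity;
    unsigned* data;
    int refs = 0;
    explicit Storage(std::size_t cap) : capacity(cap), data(new unsigned[cap]) { ++live; }
    ~Storage() { delete[] data; --live; }
    Storage(const Storage&) = delete;
    Storage& operator=(const Storage&) = delete;
};
int Storage::live = 0;

// Hypothetical, much simplified array holding one reference to `actual`.
class Array {
public:
    // Like a default constructor that allocates an empty storage object.
    Array() : actual(new Storage(0)) { actual->refs = 1; }
    ~Array() { release(); }
    Array(const Array&) = delete;
    Array& operator=(const Array&) = delete;

    void push_back(unsigned v) {
        if (n == actual->capacity) {
            Storage* bigger = new Storage(n == 0 ? 4 : 2 * n);
            for (std::size_t i = 0; i < n; ++i) bigger->data[i] = actual->data[i];
            bigger->refs = 1;
            release();        // drop our reference to the old storage first;
            actual = bigger;  // overwriting `actual` without it leaks the old object
        }
        actual->data[n++] = v;
    }
    std::size_t size() const { return n; }
    unsigned operator[](std::size_t i) const { return actual->data[i]; }

private:
    void release() {
        if (--actual->refs == 0) delete actual;
    }
    Storage* actual;
    std::size_t n = 0;
};
```

Removing the release() call in push_back leaks every storage object that push_back replaces, starting with the empty one created by the constructor.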
>
> John
>
>
> On 9/20/12 8:12 AM, Petr Velan wrote:
>> Hi John,
>>
>> I just updated to the latest SVN revision and tried the reorder function
>> on a read-only partition. The reordering is now performed exactly as
>> described in the documentation, so the problem no longer exists.
>>
>> However, I ran into some memory leaks in the new revision. My previous
>> version was 3.0.10 (SVN 532) and there were no leaks in my use case.
>> Now I'm losing some memory. To reproduce the leak, it is enough to
>> create a partition in a simple program:
>> #include <fastbit/ibis.h>
>> #include <iostream>
>>
>> int main(int argc, char *argv[]) {
>>       ibis::gVerbose = 10;
>>       ibis::part part(argv[1], false);
>>       return 0;
>> }
>>
>> Attached below is the program log with verbose set to 10. You might
>> notice that the leak is 40 bytes long and that it is caused by a
>> fileManager::storage object, created in the array_t constructor, not
>> being deleted. In my case, the storage object (actual) of the first
>> constructed array_t is probably never freed. I've managed to trace the
>> problem to the member "ibis::bitvector amask" of ibis::part, but I
>> cannot find the exact cause. Most likely the fileManager::storage
>> object is replaced somewhere without being deallocated, but I cannot
>> find where.
>>
>> Yours sincerely,
>> Petr Velan
>>
>> velan@wall:~/Documents/devel/tmp/test> valgrind --leak-check=full ./test2 ../../data/000000000001/1/
>> ==25287== Memcheck, a memory error detector
>> ==25287== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
>> ==25287== Using Valgrind-3.6.1 and LibVEX; rerun with -h for copyright info
>> ==25287== Command: ./test2 ../../data/000000000001/1/
>> ==25287==
>>
>> FastBit ibis1.3.2.6
>> Log messages started on Thu Sep 20 17:05:57 2012
>> fileManager::storage(0x775b8e0, 0) initialization completed
>> array_t<j> constructed at 0x7fefffb20 with actual=0x775b8e0, m_begin=0
>> and actual->size()=0
>> bitvector (0x7fefffb10) constructed with m_vec at 0x7fefffb20
>> fileManager::ctor found the physical memory size to be 4083470336 bytes
>> fileManager initialization complete -- maxBytes=2041735168, maxOpenFiles=768
>> part::readMetaData -- opened ../../data/000000000001/1/-part.txt for reading
>> Name = "1"
>>
>> Description = "Generated by ipfixcol fastbit plugin"
>>
>> Number_of_columns = 11
>>
>> Number_of_rows = 499992
>>
>> Timestamp = 1329386614
>>
>> State = 1
>>
>> END HEADER
>>
>> fileManager::storage(0x775f3e0, 0) initialization completed
>> array_t<j> constructed at 0x775f2e0 with actual=0x775f3e0, m_begin=0
>> and actual->size()=0
>> bitvector (0x775f2d0) constructed with m_vec at 0x775f2e0
>> read info about column 1.e0id1 (ULONG)
>> part::readMetaData -- got column e0id1 from 
>> ../../data/000000000001/1/-part.txt
>> fileManager::storage(0x77605a0, 0) initialization completed
>> array_t<j> constructed at 0x77604a0 with actual=0x77605a0, m_begin=0
>> and actual->size()=0
>> bitvector (0x7760490) constructed with m_vec at 0x77604a0
>> read info about column 1.e0id11 (USHORT)
>> part::readMetaData -- got column e0id11 from 
>> ../../data/000000000001/1/-part.txt
>> fileManager::storage(0x7761770, 0) initialization completed
>> array_t<j> constructed at 0x7761670 with actual=0x7761770, m_begin=0
>> and actual->size()=0
>> bitvector (0x7761660) constructed with m_vec at 0x7761670
>> read info about column 1.e0id12 (UINT)
>> part::readMetaData -- got column e0id12 from 
>> ../../data/000000000001/1/-part.txt
>> fileManager::storage(0x7762930, 0) initialization completed
>> array_t<j> constructed at 0x7762830 with actual=0x7762930, m_begin=0
>> and actual->size()=0
>> bitvector (0x7762820) constructed with m_vec at 0x7762830
>> read info about column 1.e0id152 (ULONG)
>> part::readMetaData -- got column e0id152 from
>> ../../data/000000000001/1/-part.txt
>> fileManager::storage(0x7763b00, 0) initialization completed
>> array_t<j> constructed at 0x7763a00 with actual=0x7763b00, m_begin=0
>> and actual->size()=0
>> bitvector (0x77639f0) constructed with m_vec at 0x7763a00
>> read info about column 1.e0id153 (ULONG)
>> part::readMetaData -- got column e0id153 from
>> ../../data/000000000001/1/-part.txt
>> fileManager::storage(0x7764cd0, 0) initialization completed
>> array_t<j> constructed at 0x7764bd0 with actual=0x7764cd0, m_begin=0
>> and actual->size()=0
>> bitvector (0x7764bc0) constructed with m_vec at 0x7764bd0
>> read info about column 1.e0id2 (ULONG)
>> part::readMetaData -- got column e0id2 from 
>> ../../data/000000000001/1/-part.txt
>> fileManager::storage(0x7765e90, 0) initialization completed
>> array_t<j> constructed at 0x7765d90 with actual=0x7765e90, m_begin=0
>> and actual->size()=0
>> bitvector (0x7765d80) constructed with m_vec at 0x7765d90
>> read info about column 1.e0id4 (UBYTE)
>> part::readMetaData -- got column e0id4 from 
>> ../../data/000000000001/1/-part.txt
>> fileManager::storage(0x7767050, 0) initialization completed
>> array_t<j> constructed at 0x7766f50 with actual=0x7767050, m_begin=0
>> and actual->size()=0
>> bitvector (0x7766f40) constructed with m_vec at 0x7766f50
>> read info about column 1.e0id5 (UBYTE)
>> part::readMetaData -- got column e0id5 from 
>> ../../data/000000000001/1/-part.txt
>> fileManager::storage(0x7768210, 0) initialization completed
>> array_t<j> constructed at 0x7768110 with actual=0x7768210, m_begin=0
>> and actual->size()=0
>> bitvector (0x7768100) constructed with m_vec at 0x7768110
>> read info about column 1.e0id6 (UBYTE)
>> part::readMetaData -- got column e0id6 from 
>> ../../data/000000000001/1/-part.txt
>> fileManager::storage(0x77693d0, 0) initialization completed
>> array_t<j> constructed at 0x77692d0 with actual=0x77693d0, m_begin=0
>> and actual->size()=0
>> bitvector (0x77692c0) constructed with m_vec at 0x77692d0
>> read info about column 1.e0id7 (USHORT)
>> part::readMetaData -- got column e0id7 from 
>> ../../data/000000000001/1/-part.txt
>> fileManager::storage(0x776a590, 0) initialization completed
>> array_t<j> constructed at 0x776a490 with actual=0x776a590, m_begin=0
>> and actual->size()=0
>> bitvector (0x776a480) constructed with m_vec at 0x776a490
>> read info about column 1.e0id8 (UINT)
>> part::readMetaData -- got column e0id8 from 
>> ../../data/000000000001/1/-part.txt
>> part[1]::gainReadAccess -- pthread_rwlock_rdlock(0x7fefffbb8) for readRIDs
>> fileManager::storage(0x776b9a0, 0) initialization completed
>> array_t<N4ibis5rid_tE> constructed at 0x776b940 with actual=0x776b9a0,
>> m_begin=0 and actual->size()=0
>> part[1]::readRIDs -- the file manager failed to read file
>> "../../data/000000000001/1/-rids".  There is no RIDs.
>> part[1]::releaseAccess -- pthread_rwlock_unlock(0x7fefffbb8) for readRIDs
>> Warning -- failed to read the content of
>> ../../data/000000000001/1/-part.msk, fileManager::getFile returned
>> -101
>> fileManager::storage(0x776d040, 0x776d0b0) added 12 bytes to increase
>> totalBytes to 12
>> fileManager::storage(0x776d040, 0x776d0b0) initialization completed
>> with 12 elements
>> fileManager::flushFile will do nothing because
>> "../../data/000000000001/1/-part.msk" is not tracked by the file
>> manager
>> part::init -- mask for partition 1 has 499992 set bits out of 499992
>> Constructed a part named 1
>> activeDir = "../../data/000000000001/1"
>> part: 1 (Generated by ipfixcol fastbit plugin) with 499992 rows, 11 columns
>> Column list:
>> e0id1:  (ULONG) [28, 1.44872e+09]
>> e0id11:  (USHORT) [0, 65535]
>> e0id12:  (UINT) [1.0466e+08, 4.02607e+09]
>> e0id152:  (ULONG) [1.26981e+12, 1.26982e+12]
>> e0id153:  (ULONG) [1.26981e+12, 1.26982e+12]
>> e0id2:  (ULONG) [1, 1.3616e+06]
>> e0id4:  (UBYTE) [1, 41]
>> e0id5:  (UBYTE) [0, 0]
>> e0id6:  (UBYTE) [0, 31]
>> e0id7:  (USHORT) [0, 65535]
>> e0id8:  (UINT) [1.0466e+08, 3.75763e+09]
>>
>> part[1]::gainWriteAccess -- pthread_rwlock_wrlock(0x7fefffbb8) for ~part
>> clearing data partition 1
>> column[1.e0id1]::writeLock -- pthread_rwlock_wrlock(0x775f340) for ~column
>> clearing column 1.e0id1
>> column[1.e0id1]::writeLock -- pthread_rwlock_unlock(0x775f340) for ~column
>> bitvector (0x775f2d0) clear the content of bitvector with m_vec at 0x775f2e0
>> array_t<j>::freeMemory this=0x775f2e0 actual=0x775f3e0 and m_begin=0
>> (active references: 0, past references: 1)
>> fileManager::storage(0x775f3e0, 0) cleared
>> column[1.e0id11]::writeLock -- pthread_rwlock_wrlock(0x7760500) for ~column
>> clearing column 1.e0id11
>> column[1.e0id11]::writeLock -- pthread_rwlock_unlock(0x7760500) for ~column
>> bitvector (0x7760490) clear the content of bitvector with m_vec at 0x77604a0
>> array_t<j>::freeMemory this=0x77604a0 actual=0x77605a0 and m_begin=0
>> (active references: 0, past references: 1)
>> fileManager::storage(0x77605a0, 0) cleared
>> column[1.e0id12]::writeLock -- pthread_rwlock_wrlock(0x77616d0) for ~column
>> clearing column 1.e0id12
>> column[1.e0id12]::writeLock -- pthread_rwlock_unlock(0x77616d0) for ~column
>> bitvector (0x7761660) clear the content of bitvector with m_vec at 0x7761670
>> array_t<j>::freeMemory this=0x7761670 actual=0x7761770 and m_begin=0
>> (active references: 0, past references: 1)
>> fileManager::storage(0x7761770, 0) cleared
>> column[1.e0id152]::writeLock -- pthread_rwlock_wrlock(0x7762890) for ~column
>> clearing column 1.e0id152
>> column[1.e0id152]::writeLock -- pthread_rwlock_unlock(0x7762890) for ~column
>> bitvector (0x7762820) clear the content of bitvector with m_vec at 0x7762830
>> array_t<j>::freeMemory this=0x7762830 actual=0x7762930 and m_begin=0
>> (active references: 0, past references: 1)
>> fileManager::storage(0x7762930, 0) cleared
>> column[1.e0id153]::writeLock -- pthread_rwlock_wrlock(0x7763a60) for ~column
>> clearing column 1.e0id153
>> column[1.e0id153]::writeLock -- pthread_rwlock_unlock(0x7763a60) for ~column
>> bitvector (0x77639f0) clear the content of bitvector with m_vec at 0x7763a00
>> array_t<j>::freeMemory this=0x7763a00 actual=0x7763b00 and m_begin=0
>> (active references: 0, past references: 1)
>> fileManager::storage(0x7763b00, 0) cleared
>> column[1.e0id2]::writeLock -- pthread_rwlock_wrlock(0x7764c30) for ~column
>> clearing column 1.e0id2
>> column[1.e0id2]::writeLock -- pthread_rwlock_unlock(0x7764c30) for ~column
>> bitvector (0x7764bc0) clear the content of bitvector with m_vec at 0x7764bd0
>> array_t<j>::freeMemory this=0x7764bd0 actual=0x7764cd0 and m_begin=0
>> (active references: 0, past references: 1)
>> fileManager::storage(0x7764cd0, 0) cleared
>> column[1.e0id4]::writeLock -- pthread_rwlock_wrlock(0x7765df0) for ~column
>> clearing column 1.e0id4
>> column[1.e0id4]::writeLock -- pthread_rwlock_unlock(0x7765df0) for ~column
>> bitvector (0x7765d80) clear the content of bitvector with m_vec at 0x7765d90
>> array_t<j>::freeMemory this=0x7765d90 actual=0x7765e90 and m_begin=0
>> (active references: 0, past references: 1)
>> fileManager::storage(0x7765e90, 0) cleared
>> column[1.e0id5]::writeLock -- pthread_rwlock_wrlock(0x7766fb0) for ~column
>> clearing column 1.e0id5
>> column[1.e0id5]::writeLock -- pthread_rwlock_unlock(0x7766fb0) for ~column
>> bitvector (0x7766f40) clear the content of bitvector with m_vec at 0x7766f50
>> array_t<j>::freeMemory this=0x7766f50 actual=0x7767050 and m_begin=0
>> (active references: 0, past references: 1)
>> fileManager::storage(0x7767050, 0) cleared
>> column[1.e0id6]::writeLock -- pthread_rwlock_wrlock(0x7768170) for ~column
>> clearing column 1.e0id6
>> column[1.e0id6]::writeLock -- pthread_rwlock_unlock(0x7768170) for ~column
>> bitvector (0x7768100) clear the content of bitvector with m_vec at 0x7768110
>> array_t<j>::freeMemory this=0x7768110 actual=0x7768210 and m_begin=0
>> (active references: 0, past references: 1)
>> fileManager::storage(0x7768210, 0) cleared
>> column[1.e0id7]::writeLock -- pthread_rwlock_wrlock(0x7769330) for ~column
>> clearing column 1.e0id7
>> column[1.e0id7]::writeLock -- pthread_rwlock_unlock(0x7769330) for ~column
>> bitvector (0x77692c0) clear the content of bitvector with m_vec at 0x77692d0
>> array_t<j>::freeMemory this=0x77692d0 actual=0x77693d0 and m_begin=0
>> (active references: 0, past references: 1)
>> fileManager::storage(0x77693d0, 0) cleared
>> column[1.e0id8]::writeLock -- pthread_rwlock_wrlock(0x776a4f0) for ~column
>> clearing column 1.e0id8
>> column[1.e0id8]::writeLock -- pthread_rwlock_unlock(0x776a4f0) for ~column
>> bitvector (0x776a480) clear the content of bitvector with m_vec at 0x776a490
>> array_t<j>::freeMemory this=0x776a490 actual=0x776a590 and m_begin=0
>> (active references: 0, past references: 1)
>> fileManager::storage(0x776a590, 0) cleared
>> part[1]::releaseAccess -- pthread_rwlock_unlock(0x7fefffbb8) for ~part
>> array_t<N4ibis5rid_tE>::freeMemory this=0x776b940 actual=0x776b9a0 and
>> m_begin=0 (active references: 0, past references: 1)
>> fileManager::storage(0x776b9a0, 0) cleared
>> bitvector (0x7fefffb10) clear the content of bitvector with m_vec at 
>> 0x7fefffb20
>> array_t<j>::freeMemory this=0x7fefffb20 actual=0x776d040 and
>> m_begin=0x776d0b0 (active references: 0, past references: 1)
>> fileManager::storage(0x776d040, 0x776d0b0) removed 12 bytes to
>> decrease totalBytes to 0
>> fileManager::storage(0x776d040, 0x776d0b0) cleared
>> fileManager::clear has nothing to do
>> fileManager decommissioned
>> ==25287==
>> ==25287== HEAP SUMMARY:
>> ==25287==     in use at exit: 40 bytes in 1 blocks
>> ==25287==   total heap usage: 466 allocs, 465 frees, 119,596 bytes allocated
>> ==25287==
>> ==25287== 40 bytes in 1 blocks are definitely lost in loss record 1 of 1
>> ==25287==    at 0x4C292C7: operator new(unsigned long) (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
>> ==25287==    by 0x5B2F832: ibis::array_t<unsigned int>::array_t() (array_t.cpp:36)
>> ==25287==    by 0x604C5A9: ibis::bitvector::bitvector() (bitvector.cpp:33)
>> ==25287==    by 0x517990B: ibis::part::part(char const*, bool) (part.cpp:262)
>> ==25287==    by 0x40306B: main (test2.cpp:9)
>> ==25287==
>> ==25287== LEAK SUMMARY:
>> ==25287==    definitely lost: 40 bytes in 1 blocks
>> ==25287==    indirectly lost: 0 bytes in 0 blocks
>> ==25287==      possibly lost: 0 bytes in 0 blocks
>> ==25287==    still reachable: 0 bytes in 0 blocks
>> ==25287==         suppressed: 0 bytes in 0 blocks
>> ==25287==
>> ==25287== For counts of detected and suppressed errors, rerun with: -v
>> ==25287== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 6 from 6)
>>
>>
>> On 20 September 2012 07:45, K. John Wu <[email protected]> wrote:
>>> Hi, Petr,
>>>
>>> I just updated examples/thula.cpp to exercise the option of reordering
>>> read-only data partitions.  It seems to work in a small test I tried.
>>> The new code is in SVN as revision 577.
>>>
>>> BTW, if you invoke thula with a data directory and an orderby clause,
>>> but without a where clause or select clause, then it will perform the
>>> reordering on the data partitions in the given directory.
>>>
>>> At least on the surface, the following command line (run in the tests
>>> directory after 'make check') seems to work fine.
>>>
>>> ../examples/thula -d tmp/t2 -o c -v
>>>
>>>
>>> Would you mind giving it a try?
>>>
>>> Thanks.
>>>
>>> John
>>>
>>>
>>> On 9/19/12 2:41 AM, Petr Velan wrote:
>>>> Hi John,
>>>>
>>>> I managed to get the test program running, but I cannot seem to
>>>> reproduce the deadlock. I'll look further into it when I have
>>>> some spare time.
>>>>
>>>> One thing I noticed while using the test program: reorder does not
>>>> work on a read-only partition, although the documentation says that it
>>>> should, since it does not "change" the data. I'm not running the latest
>>>> SVN version, so maybe this has already been resolved. I just wanted to
>>>> let you know.
>>>>
>>>> Thanks,
>>>> Petr
>>>>
>>>> On 4 September 2012 07:30, Kesheng Wu <[email protected]> wrote:
>>>>> Hi, Petr,
>>>>>
>>>>> Attached is a modification of the file tests/setqgen.cpp to mimic your
>>>>> use case.  So far, it seems to produce exactly the same output set (as
>>>>> a whole, not in the individual partitions) as produced by
>>>>> tests/setqgen.cpp.  It works OK on my laptop.  Would you mind taking a
>>>>> look and seeing if you can get it to behave more like what your program
>>>>> does?
>>>>>
>>>>> Thanks.
>>>>>
>>>>> John
>>>>>
>>>>>
>>>>> On Thu, Aug 30, 2012 at 11:45 PM, Petr Velan <[email protected]> wrote:
>>>>>> Hi John,
>>>>>>
>>>>>> The memory management in FastBit works well for batch-mode operation;
>>>>>> however, I need to reduce the memory footprint, since other operations
>>>>>> might require a large amount of memory for a short time, so FastBit
>>>>>> cannot keep it allocated all the time.
>>>>>>
>>>>>> I know that there is a limit that allows FastBit to use at most half of
>>>>>> the available memory and that it can be changed. I think it would be a
>>>>>> good thing to have two limits: one setting the maximum that can be used,
>>>>>> and another that would trigger the unload function. I could then set
>>>>>> FastBit to keep 0.5GB at the ready and allow it to expand to 10GB. When
>>>>>> needed, FastBit would use up to 10GB of memory, but afterwards it would
>>>>>> free it and keep only 0.5GB for further use. The default could still be
>>>>>> to have both limits at half of the available memory. What do you think?
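A minimal sketch of the two-limit policy proposed above, assuming a toy cache keyed by file name (TwoLimitCache, acquire and release are hypothetical names, not FastBit's fileManager API). The hard limit caps total usage; content no longer in active use is unloaded, least recently released first, whenever usage exceeds the soft limit.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>

// Hypothetical sketch of the proposed policy; NOT FastBit's fileManager.
class TwoLimitCache {
public:
    TwoLimitCache(std::size_t softLimit, std::size_t hardLimit)
        : soft_(softLimit), hard_(hardLimit) {}

    // Load (or reuse) a file's content and pin it. Returns false if the
    // request cannot fit under the hard limit even after unloading idle files.
    bool acquire(const std::string& name, std::size_t bytes) {
        auto it = entries_.find(name);
        if (it != entries_.end()) {              // already cached: pin it again
            if (it->second.refs++ == 0) idle_.remove(name);
            return true;
        }
        if (bytes > hard_) return false;
        unloadIdleUntil(hard_ - bytes);          // make room under the hard limit
        if (total_ + bytes > hard_) return false;  // everything left is in use
        entries_.emplace(name, Entry{bytes, 1});
        total_ += bytes;
        return true;
    }

    // Unpin a file; idle content is kept only while usage stays under the soft limit.
    void release(const std::string& name) {
        auto it = entries_.find(name);
        if (it == entries_.end() || it->second.refs == 0) return;  // not pinned
        if (--it->second.refs == 0) idle_.push_back(name);         // newest idle at the back
        unloadIdleUntil(soft_);
    }

    std::size_t totalBytes() const { return total_; }

private:
    struct Entry { std::size_t bytes; int refs; };

    // Unload least-recently-released idle entries until total_ <= target.
    void unloadIdleUntil(std::size_t target) {
        while (total_ > target && !idle_.empty()) {
            auto victim = entries_.find(idle_.front());
            total_ -= victim->second.bytes;
            entries_.erase(victim);
            idle_.pop_front();
        }
    }

    std::size_t soft_, hard_, total_ = 0;
    std::unordered_map<std::string, Entry> entries_;
    std::list<std::string> idle_;
};
```

Setting both limits to the same value reproduces the lazy behavior John describes in his reply: release() then never unloads anything, and idle content is dropped only when a new request would exceed the limit.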
>>>>>>
>>>>>> We are currently trying to manually call
>>>>>> ibis::fileManager::instance().flushDir() to see if it helps to keep
>>>>>> the memory down, but I believe that the solution I described earlier
>>>>>> is much more generic.
>>>>>>
>>>>>> Unfortunately, I do not have any simple code to reproduce the
>>>>>> deadlock. I'll try to look into it, and maybe compile FastBit with
>>>>>> debugging symbols to help us better understand what is going on.
>>>>>>
>>>>>> Petr
>>>>>>
>>>>>> On 30 August 2012 20:29, K. John Wu <[email protected]> wrote:
>>>>>>> Hi, Petr,
>>>>>>>
>>>>>>> Thanks for clarifying the use case.  It looks like you cannot wait for
>>>>>>> everything to be done before releasing the ibis::part objects.
>>>>>>> Regarding the memory usage, FastBit deletes lazily: as long as
>>>>>>> no one needs new memory, the existing content read from files is
>>>>>>> kept in memory.  The default maximum memory to be used is half of
>>>>>>> the physical memory, which explains what you've observed.  Once
>>>>>>> that limit is reached, ibis::fileManager::unload is called to
>>>>>>> remove the content of files that are not in active use.  In your case,
>>>>>>> it sounds like there will be a lot of old files to remove from
>>>>>>> memory.
>>>>>>>
>>>>>>> Since there is no clear indication of which thread is holding on to the
>>>>>>> mutex lock, we might need to create a multithreaded data generator
>>>>>>> that can mimic your data ingestion process.  If you have a simple one
>>>>>>> that I can borrow, I would greatly appreciate it.
>>>>>>>
>>>>>>> Most likely, another invocation of ibis::fileManager::getFile is holding
>>>>>>> on to the ibis::fileManager::mutex.  However, logically, that is not
>>>>>>> possible because that thread can only be waiting on a condition
>>>>>>> variable, in which case it should have already released the mutex lock.
>>>>>>> Anyway, something gnarly is going on here.
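John's argument relies on a standard property of condition variables: pthread_cond_wait, and C++'s std::condition_variable::wait, which has the same semantics, atomically releases the mutex while the thread sleeps and reacquires it before returning. The small self-contained check below (the function name is illustrative) demonstrates that property; it needs to be built with thread support, e.g. -pthread.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Returns true if a thread blocked in cv.wait() lets another thread take the
// same mutex, and then wakes up once the predicate becomes true.
bool waitReleasesMutex() {
    std::mutex m;
    std::condition_variable cv;
    bool waiting = false;  // set by the waiter while it holds m
    bool ready = false;    // predicate the waiter blocks on
    bool woke = false;

    std::thread waiter([&] {
        std::unique_lock<std::mutex> lk(m);
        waiting = true;
        // wait() unlocks m while blocked and re-locks it before returning.
        cv.wait(lk, [&] { return ready; });
        woke = true;  // m is held again here
    });

    // The waiter set `waiting` while holding m, so once we can lock m and see
    // waiting == true, the waiter must have released m inside wait().
    for (;;) {
        {
            std::lock_guard<std::mutex> lk(m);
            if (waiting) {
                ready = true;  // change the predicate while holding m
                break;
            }
        }
        std::this_thread::yield();
    }
    cv.notify_one();
    waiter.join();  // join() makes the waiter's write to `woke` visible here
    return woke;
}
```

If a waiting thread did keep the mutex, the loop above would never observe waiting == true, which is why a thread parked in a condition-variable wait cannot be the one holding fileManager::mutex.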
>>>>>>>
>>>>>>> John
>>>>>>>
>>>>>>>
>>>>>>> On 8/29/12 10:54 PM, Petr Velan wrote:
>>>>>>>> Hi John,
>>>>>>>>
>>>>>>>> I still do not understand why there is a deadlock, or why access to
>>>>>>>> different partitions is managed by the same mutex lock.
>>>>>>>>
>>>>>>>> Our use case is this:
>>>>>>>> We have a process that collects data from the network and stores it in
>>>>>>>> FastBit partitions. Each partition contains 5 minutes of data,
>>>>>>>> approximately 300-400MB. When the 5 minutes expire, a new thread is
>>>>>>>> launched that creates an ibis::part, runs reorder, deletes the part,
>>>>>>>> creates an ibis::table that is used to build the indexes, and then
>>>>>>>> deletes the table. After that the thread ends.
>>>>>>>>
>>>>>>>> Since there is data from multiple sources, there are multiple threads
>>>>>>>> that store the data and reorder/index it.
>>>>>>>>
>>>>>>>> Two things are bothering me.
>>>>>>>> First, the deadlock: since the mutex should only synchronize, I wonder
>>>>>>>> who really holds the lock when both threads are waiting for it.
>>>>>>>> Second, the memory used by the process grows constantly. After the
>>>>>>>> parts and tables are deleted, I would expect the memory to be
>>>>>>>> released as well, since it will not be needed for the next 5 minutes.
>>>>>>>> Unfortunately, FastBit does not free the memory until usage reaches 50%
>>>>>>>> of total memory, which in our case is 6GB. That is unfortunate, since
>>>>>>>> it should really need only about 1GB of memory for reorder in the worst
>>>>>>>> case, and the rest should then be free for other processes to use. Is
>>>>>>>> there any way to achieve this? The memory is consumed even without
>>>>>>>> reordering, when only building indexes.
>>>>>>>>
>>>>>>>> Thank you for the warning about strings. We plan to use them in the
>>>>>>>> future, so we will have to do without reorder in that case.
>>>>>>>>
>>>>>>>> Petr
>>>>>>>>
>>>>>>>> On 29 August 2012 18:28, K. John Wu <[email protected]> wrote:
>>>>>>>>> Hi, Petr,
>>>>>>>>>
>>>>>>>>> From the stack traces, it looks like one thread is trying to free a data
>>>>>>>>> partition object while another one is trying to reorder the rows of
>>>>>>>>> presumably another data partition.  The first mutex lock is requested
>>>>>>>>> from the constructor of a storage object (ibis::fileManager::storage).
>>>>>>>>> This happens because the amount of data in memory (tracked by the
>>>>>>>>> file manager) is close to the prescribed maximum (maxBytes).  The
>>>>>>>>> second mutex lock is requested from a function called
>>>>>>>>> ibis::fileManager::removeCleaner (which is invoked by the destructor
>>>>>>>>> of an ibis::part object).
>>>>>>>>>
>>>>>>>>> Running out of memory seems to be the fundamental problem here.
>>>>>>>>> Presumably, you only need to do the reordering once and your datasets
>>>>>>>>> are quite large.  I would suggest that you use only a single thread to
>>>>>>>>> reorder your data; this way all the memory will be devoted to a single
>>>>>>>>> reordering operation.
>>>>>>>>>
>>>>>>>>> If you really do have a lot of memory (or each data partition is
>>>>>>>>> relatively small) and want to do the reordering with multiple threads,
>>>>>>>>> then delay freeing the ibis::part objects until you are done with all
>>>>>>>>> reordering operations.  The cleaner objects of each data partition
>>>>>>>>> will make sure each ibis::part object takes only a minimal amount of
>>>>>>>>> memory.
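A sketch of the second suggestion, assuming one worker thread per partition: all partitions are reordered concurrently, and none is destroyed until every worker has been joined. The template should work with any type that has a reorder() member, such as ibis::part; reorderAllThenRelease and ProbePart are hypothetical names, and ProbePart is only a test double that records how many reorders had finished when each destructor ran.

```cpp
#include <atomic>
#include <memory>
#include <thread>
#include <vector>

// Reorder every partition on its own thread, but destroy none of them until
// all threads have been joined; the destructors then run on the caller's thread.
template <class P>
void reorderAllThenRelease(std::vector<std::unique_ptr<P>>& parts) {
    std::vector<std::thread> workers;
    workers.reserve(parts.size());
    for (auto& p : parts) {
        P* raw = p.get();
        workers.emplace_back([raw] { raw->reorder(); });
    }
    for (auto& t : workers) t.join();  // every reorder has finished here
    parts.clear();                     // only now run the destructors
}

// Hypothetical test double standing in for ibis::part.
struct ProbePart {
    std::atomic<int>* done;     // shared count of finished reorders
    int* minDoneAtDestruction;  // smallest count seen by any destructor
    ProbePart(std::atomic<int>* d, int* m) : done(d), minDoneAtDestruction(m) {}
    void reorder() { ++*done; }
    ~ProbePart() {
        int d = done->load();
        if (d < *minDoneAtDestruction) *minDoneAtDestruction = d;
    }
};
```

With N partitions every destructor observes N finished reorders, i.e. no partition object is freed while another thread is still reordering.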
>>>>>>>>>
>>>>>>>>> A word of warning: the current code only sorts the numerical values;
>>>>>>>>> any strings or blobs are left untouched.  If your datasets have
>>>>>>>>> strings or blobs, they will not be coherent after calling the
>>>>>>>>> function reorder!
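The coherence problem can be illustrated with a generic sketch (not FastBit's reorder implementation): a row reordering stays consistent only if the permutation computed from the sort keys is applied to every column, strings and blobs included. sortPermutation and applyPermutation are illustrative names.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Row order that sorts the key column; stable, so equal keys keep their order.
template <class K>
std::vector<std::size_t> sortPermutation(const std::vector<K>& key) {
    std::vector<std::size_t> perm(key.size());
    std::iota(perm.begin(), perm.end(), std::size_t{0});
    std::stable_sort(perm.begin(), perm.end(),
                     [&](std::size_t a, std::size_t b) { return key[a] < key[b]; });
    return perm;
}

// Apply the same permutation to any column, numeric or string.
template <class T>
std::vector<T> applyPermutation(const std::vector<T>& col,
                                const std::vector<std::size_t>& perm) {
    std::vector<T> out;
    out.reserve(perm.size());
    for (std::size_t i : perm) out.push_back(col[i]);
    return out;
}
```

For keys {30, 10, 20} and names {"c", "a", "b"}, the permutation is {1, 2, 0}; applying it to both columns gives {10, 20, 30} and {"a", "b", "c"}. Sorting only the key column would leave the names as {"c", "a", "b"}, pairing each key with the wrong string.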
>>>>>>>>>
>>>>>>>>> John
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 8/29/12 4:57 AM, Petr Velan wrote:
>>>>>>>>>> Hi John,
>>>>>>>>>>
>>>>>>>>>> Thank you for all the work that you put into the FastBit library; it
>>>>>>>>>> allows us to achieve great results!
>>>>>>>>>>
>>>>>>>>>> I've run into a small bug that might be very hard to reproduce or
>>>>>>>>>> identify. I'm using two threads to reorder and index data that is
>>>>>>>>>> already stored on disk. It worked fine for a while, but then it got
>>>>>>>>>> stuck in a deadlock. Here are gdb backtraces from both threads,
>>>>>>>>>> unfortunately without debugging symbols, so the specific files
>>>>>>>>>> and lines are unknown.
>>>>>>>>>>
>>>>>>>>>> We are currently using SVN revision 532.
>>>>>>>>>>
>>>>>>>>>> (gdb) bt
>>>>>>>>>> #0  0x00007f8983463054 in __lll_lock_wait () from /lib64/libpthread.so.0
>>>>>>>>>> #1  0x00007f898345e388 in _L_lock_854 () from /lib64/libpthread.so.0
>>>>>>>>>> #2  0x00007f898345e257 in pthread_mutex_lock () from /lib64/libpthread.so.0
>>>>>>>>>> #3  0x00007f898271e074 in ibis::fileManager::storage::storage(unsigned long) () from /usr/lib64/libfastbit.so.0
>>>>>>>>>> #4  0x00007f898271eb16 in ibis::fileManager::storage::enlarge(unsigned long) () from /usr/lib64/libfastbit.so.0
>>>>>>>>>> #5  0x00007f898272214f in ibis::fileManager::roFile::doRead(char const*) () from /usr/lib64/libfastbit.so.0
>>>>>>>>>> #6  0x00007f8982723b4b in ibis::fileManager::getFile(char const*, ibis::fileManager::storage**, ibis::fileManager::ACCESS_PREFERENCE) () from /usr/lib64/libfastbit.so.0
>>>>>>>>>> #7  0x00007f898273406a in int ibis::fileManager::getFile<unsigned short>(char const*, ibis::array_t<unsigned short>&, ibis::fileManager::ACCESS_PREFERENCE) () from /usr/lib64/libfastbit.so.0
>>>>>>>>>> #8  0x00007f8981f9f4a5 in ibis::column::actualMinMax(char const*, ibis::bitvector const&, double&, double&) const () from /usr/lib64/libfastbit.so.0
>>>>>>>>>> #9  0x00007f8981fa3546 in ibis::column::computeMinMax() () from /usr/lib64/libfastbit.so.0
>>>>>>>>>> #10 0x00007f89827beae6 in ibis::part::gatherSortKeys(ibis::array_t<char const*>&) () from /usr/lib64/libfastbit.so.0
>>>>>>>>>> #11 0x00007f89827bfc56 in ibis::part::reorder() () from /usr/lib64/libfastbit.so.0
>>>>>>>>>> #12 0x00007f8982c7e2af in reorder_index(void*) () from /usr/share/ipfixcol/plugins/ipfixcol-fastbit-output.so
>>>>>>>>>> #13 0x00007f898345c851 in start_thread () from /lib64/libpthread.so.0
>>>>>>>>>> #14 0x00007f89831aa6dd in next_line () from /lib64/libc.so.6
>>>>>>>>>> #15 0x0000000000000000 in ?? ()
>>>>>>>>>> (gdb)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> (gdb) bt
>>>>>>>>>> #0  0x00007f8983463054 in __lll_lock_wait () from /lib64/libpthread.so.0
>>>>>>>>>> #1  0x00007f898345e388 in _L_lock_854 () from /lib64/libpthread.so.0
>>>>>>>>>> #2  0x00007f898345e257 in pthread_mutex_lock () from /lib64/libpthread.so.0
>>>>>>>>>> #3  0x00007f898175a6aa in ibis::util::mutexLock::mutexLock(pthread_mutex_t*, char const*) () from /usr/lib64/libfastbit.so.0
>>>>>>>>>> #4  0x00007f89827177d4 in ibis::fileManager::removeCleaner(ibis::fileManager::cleaner const*) () from /usr/lib64/libfastbit.so.0
>>>>>>>>>> #5  0x00007f8981735952 in ibis::part::~part() () from /usr/lib64/libfastbit.so.0
>>>>>>>>>> #6  0x00007f8981735c29 in ibis::part::~part() () from /usr/lib64/libfastbit.so.0
>>>>>>>>>> #7  0x00007f8982c7e2cd in reorder_index(void*) () from /usr/share/ipfixcol/plugins/ipfixcol-fastbit-output.so
>>>>>>>>>> #8  0x00007f898345c851 in start_thread () from /lib64/libpthread.so.0
>>>>>>>>>> #9  0x00007f89831aa6dd in next_line () from /lib64/libc.so.6
>>>>>>>>>> #10 0x0000000000000000 in ?? ()
>>>>>>>>>> (gdb)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Do you have any idea what might be going on?
>>>>>>>>>>
>>>>>>>>>> With regards,
>>>>>>>>>> Petr Velan
>>>>>>>>>>
_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
