Thanks, Petr and Michael,

Please give SVN Revision 579 a try when you get the chance.  The
problem seems to be in the function ibis::array_t<T>::push_back.  It
lost track of the storage object (actual).  This problem should be
fixed with the current structure of the tests.  Let us know if you
encounter any problems.
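
As a rough illustration of this kind of bug (a minimal sketch, not the
actual FastBit code; Storage, Array, push_back_leaky, push_back_fixed and
leaked_after are all hypothetical names), a push_back that adopts a new
storage object without releasing the old one leaves exactly one storage
object behind:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a storage object; counts live instances.
struct Storage {
    static int live;              // number of Storage objects currently alive
    std::vector<unsigned> data;
    Storage() { ++live; }
    ~Storage() { --live; }
};
int Storage::live = 0;

// Hypothetical array that owns its storage through a raw pointer.
class Array {
public:
    Array() : actual(new Storage) {}
    ~Array() { delete actual; }
    Array(const Array&) = delete;
    Array& operator=(const Array&) = delete;

    // Buggy variant: adopts a new storage and loses track of the old one.
    void push_back_leaky(unsigned v) {
        Storage* bigger = new Storage;
        bigger->data = actual->data;
        bigger->data.push_back(v);
        actual = bigger;          // the old storage is never deleted
    }

    // Fixed variant: releases the old storage before adopting the new one.
    void push_back_fixed(unsigned v) {
        Storage* bigger = new Storage;
        bigger->data = actual->data;
        bigger->data.push_back(v);
        delete actual;
        actual = bigger;
    }

    std::size_t size() const { return actual->data.size(); }

private:
    Storage* actual;
};

// Returns how many Storage objects survive the destruction of an Array.
int leaked_after(bool use_fixed) {
    int before = Storage::live;
    {
        Array a;
        if (use_fixed) a.push_back_fixed(1); else a.push_back_leaky(1);
    }
    return Storage::live - before;
}
```

In this sketch leaked_after(false) reports one surviving storage object
and leaked_after(true) reports none, which has the same shape as a single
unreleased block in a leak report.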

John


On 9/20/12 8:12 AM, Petr Velan wrote:
> Hi John,
> 
> I just updated to the latest SVN revision and tried the reorder function
> on a read-only partition. The reordering is now performed exactly as
> described in the documentation, so the problem no longer exists.
> 
> However, I ran into a problem with some memory leaks in the new
> revision. My previous version was 3.0.10 (SVN 532) and there were no
> leaks in my use case. Now I'm losing some memory. To simulate the
> memory loss, you only need to create a partition in a simple program:
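> #include <fastbit/ibis.h>
> #include <iostream>
> 
> int main(int argc, char *argv[]) {
>       if (argc < 2) {   // the partition directory is required
>             std::cerr << "usage: " << argv[0] << " <partition-dir>\n";
>             return 1;
>       }
>       ibis::gVerbose = 10;   // verbose logging, as in the log below
>       ibis::part part(argv[1], false);   // destroyed when main returns
>       return 0;
> }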
> 
> Attached below is the program log with verbose set to 10. You might
> notice that the leak is 40 bytes long, and that it is caused by
> fileManager::storage class created in array_t constructor not being
> deleted. In my case, the first constructed array_t's actual is probably
> not removed. I've managed to trace the problem to "ibis::bitvector
> amask" of ibis::part, but I cannot seem to find the problem. Most
> likely the fileManager is replaced somewhere without proper
> deallocation, but I cannot find it.
> 
> Yours sincerely,
> Petr Velan
> 
> velan@wall:~/Documents/devel/tmp/test> valgrind --leak-check=full
> ./test2 ../../data/000000000001/1/
> ==25287== Memcheck, a memory error detector
> ==25287== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
> ==25287== Using Valgrind-3.6.1 and LibVEX; rerun with -h for copyright info
> ==25287== Command: ./test2 ../../data/000000000001/1/
> ==25287==
> 
> FastBit ibis1.3.2.6
> Log messages started on Thu Sep 20 17:05:57 2012
> fileManager::storage(0x775b8e0, 0) initialization completed
> array_t<j> constructed at 0x7fefffb20 with actual=0x775b8e0, m_begin=0
> and actual->size()=0
> bitvector (0x7fefffb10) constructed with m_vec at 0x7fefffb20
> fileManager::ctor found the physical memory size to be 4083470336 bytes
> fileManager initialization complete -- maxBytes=2041735168, maxOpenFiles=768
> part::readMetaData -- opened ../../data/000000000001/1/-part.txt for reading
> Name = "1"
> 
> Description = "Generated by ipfixcol fastbit plugin"
> 
> Number_of_columns = 11
> 
> Number_of_rows = 499992
> 
> Timestamp = 1329386614
> 
> State = 1
> 
> END HEADER
> 
> fileManager::storage(0x775f3e0, 0) initialization completed
> array_t<j> constructed at 0x775f2e0 with actual=0x775f3e0, m_begin=0
> and actual->size()=0
> bitvector (0x775f2d0) constructed with m_vec at 0x775f2e0
> read info about column 1.e0id1 (ULONG)
> part::readMetaData -- got column e0id1 from 
> ../../data/000000000001/1/-part.txt
> fileManager::storage(0x77605a0, 0) initialization completed
> array_t<j> constructed at 0x77604a0 with actual=0x77605a0, m_begin=0
> and actual->size()=0
> bitvector (0x7760490) constructed with m_vec at 0x77604a0
> read info about column 1.e0id11 (USHORT)
> part::readMetaData -- got column e0id11 from 
> ../../data/000000000001/1/-part.txt
> fileManager::storage(0x7761770, 0) initialization completed
> array_t<j> constructed at 0x7761670 with actual=0x7761770, m_begin=0
> and actual->size()=0
> bitvector (0x7761660) constructed with m_vec at 0x7761670
> read info about column 1.e0id12 (UINT)
> part::readMetaData -- got column e0id12 from 
> ../../data/000000000001/1/-part.txt
> fileManager::storage(0x7762930, 0) initialization completed
> array_t<j> constructed at 0x7762830 with actual=0x7762930, m_begin=0
> and actual->size()=0
> bitvector (0x7762820) constructed with m_vec at 0x7762830
> read info about column 1.e0id152 (ULONG)
> part::readMetaData -- got column e0id152 from
> ../../data/000000000001/1/-part.txt
> fileManager::storage(0x7763b00, 0) initialization completed
> array_t<j> constructed at 0x7763a00 with actual=0x7763b00, m_begin=0
> and actual->size()=0
> bitvector (0x77639f0) constructed with m_vec at 0x7763a00
> read info about column 1.e0id153 (ULONG)
> part::readMetaData -- got column e0id153 from
> ../../data/000000000001/1/-part.txt
> fileManager::storage(0x7764cd0, 0) initialization completed
> array_t<j> constructed at 0x7764bd0 with actual=0x7764cd0, m_begin=0
> and actual->size()=0
> bitvector (0x7764bc0) constructed with m_vec at 0x7764bd0
> read info about column 1.e0id2 (ULONG)
> part::readMetaData -- got column e0id2 from 
> ../../data/000000000001/1/-part.txt
> fileManager::storage(0x7765e90, 0) initialization completed
> array_t<j> constructed at 0x7765d90 with actual=0x7765e90, m_begin=0
> and actual->size()=0
> bitvector (0x7765d80) constructed with m_vec at 0x7765d90
> read info about column 1.e0id4 (UBYTE)
> part::readMetaData -- got column e0id4 from 
> ../../data/000000000001/1/-part.txt
> fileManager::storage(0x7767050, 0) initialization completed
> array_t<j> constructed at 0x7766f50 with actual=0x7767050, m_begin=0
> and actual->size()=0
> bitvector (0x7766f40) constructed with m_vec at 0x7766f50
> read info about column 1.e0id5 (UBYTE)
> part::readMetaData -- got column e0id5 from 
> ../../data/000000000001/1/-part.txt
> fileManager::storage(0x7768210, 0) initialization completed
> array_t<j> constructed at 0x7768110 with actual=0x7768210, m_begin=0
> and actual->size()=0
> bitvector (0x7768100) constructed with m_vec at 0x7768110
> read info about column 1.e0id6 (UBYTE)
> part::readMetaData -- got column e0id6 from 
> ../../data/000000000001/1/-part.txt
> fileManager::storage(0x77693d0, 0) initialization completed
> array_t<j> constructed at 0x77692d0 with actual=0x77693d0, m_begin=0
> and actual->size()=0
> bitvector (0x77692c0) constructed with m_vec at 0x77692d0
> read info about column 1.e0id7 (USHORT)
> part::readMetaData -- got column e0id7 from 
> ../../data/000000000001/1/-part.txt
> fileManager::storage(0x776a590, 0) initialization completed
> array_t<j> constructed at 0x776a490 with actual=0x776a590, m_begin=0
> and actual->size()=0
> bitvector (0x776a480) constructed with m_vec at 0x776a490
> read info about column 1.e0id8 (UINT)
> part::readMetaData -- got column e0id8 from 
> ../../data/000000000001/1/-part.txt
> part[1]::gainReadAccess -- pthread_rwlock_rdlock(0x7fefffbb8) for readRIDs
> fileManager::storage(0x776b9a0, 0) initialization completed
> array_t<N4ibis5rid_tE> constructed at 0x776b940 with actual=0x776b9a0,
> m_begin=0 and actual->size()=0
> part[1]::readRIDs -- the file manager failed to read file
> "../../data/000000000001/1/-rids".  There is no RIDs.
> part[1]::releaseAccess -- pthread_rwlock_unlock(0x7fefffbb8) for readRIDs
> Warning -- failed to read the content of
> ../../data/000000000001/1/-part.msk, fileManager::getFile returned
> -101
> fileManager::storage(0x776d040, 0x776d0b0) added 12 bytes to increase
> totalBytes to 12
> fileManager::storage(0x776d040, 0x776d0b0) initialization completed
> with 12 elements
> fileManager::flushFile will do nothing because
> "../../data/000000000001/1/-part.msk" is not tracked by the file
> manager
> part::init -- mask for partition 1 has 499992 set bits out of 499992
> Constructed a part named 1
> activeDir = "../../data/000000000001/1"
> part: 1 (Generated by ipfixcol fastbit plugin) with 499992 rows, 11 columns
> Column list:
> e0id1:  (ULONG) [28, 1.44872e+09]
> e0id11:  (USHORT) [0, 65535]
> e0id12:  (UINT) [1.0466e+08, 4.02607e+09]
> e0id152:  (ULONG) [1.26981e+12, 1.26982e+12]
> e0id153:  (ULONG) [1.26981e+12, 1.26982e+12]
> e0id2:  (ULONG) [1, 1.3616e+06]
> e0id4:  (UBYTE) [1, 41]
> e0id5:  (UBYTE) [0, 0]
> e0id6:  (UBYTE) [0, 31]
> e0id7:  (USHORT) [0, 65535]
> e0id8:  (UINT) [1.0466e+08, 3.75763e+09]
> 
> part[1]::gainWriteAccess -- pthread_rwlock_wrlock(0x7fefffbb8) for ~part
> clearing data partition 1
> column[1.e0id1]::writeLock -- pthread_rwlock_wrlock(0x775f340) for ~column
> clearing column 1.e0id1
> column[1.e0id1]::writeLock -- pthread_rwlock_unlock(0x775f340) for ~column
> bitvector (0x775f2d0) clear the content of bitvector with m_vec at 0x775f2e0
> array_t<j>::freeMemory this=0x775f2e0 actual=0x775f3e0 and m_begin=0
> (active references: 0, past references: 1)
> fileManager::storage(0x775f3e0, 0) cleared
> column[1.e0id11]::writeLock -- pthread_rwlock_wrlock(0x7760500) for ~column
> clearing column 1.e0id11
> column[1.e0id11]::writeLock -- pthread_rwlock_unlock(0x7760500) for ~column
> bitvector (0x7760490) clear the content of bitvector with m_vec at 0x77604a0
> array_t<j>::freeMemory this=0x77604a0 actual=0x77605a0 and m_begin=0
> (active references: 0, past references: 1)
> fileManager::storage(0x77605a0, 0) cleared
> column[1.e0id12]::writeLock -- pthread_rwlock_wrlock(0x77616d0) for ~column
> clearing column 1.e0id12
> column[1.e0id12]::writeLock -- pthread_rwlock_unlock(0x77616d0) for ~column
> bitvector (0x7761660) clear the content of bitvector with m_vec at 0x7761670
> array_t<j>::freeMemory this=0x7761670 actual=0x7761770 and m_begin=0
> (active references: 0, past references: 1)
> fileManager::storage(0x7761770, 0) cleared
> column[1.e0id152]::writeLock -- pthread_rwlock_wrlock(0x7762890) for ~column
> clearing column 1.e0id152
> column[1.e0id152]::writeLock -- pthread_rwlock_unlock(0x7762890) for ~column
> bitvector (0x7762820) clear the content of bitvector with m_vec at 0x7762830
> array_t<j>::freeMemory this=0x7762830 actual=0x7762930 and m_begin=0
> (active references: 0, past references: 1)
> fileManager::storage(0x7762930, 0) cleared
> column[1.e0id153]::writeLock -- pthread_rwlock_wrlock(0x7763a60) for ~column
> clearing column 1.e0id153
> column[1.e0id153]::writeLock -- pthread_rwlock_unlock(0x7763a60) for ~column
> bitvector (0x77639f0) clear the content of bitvector with m_vec at 0x7763a00
> array_t<j>::freeMemory this=0x7763a00 actual=0x7763b00 and m_begin=0
> (active references: 0, past references: 1)
> fileManager::storage(0x7763b00, 0) cleared
> column[1.e0id2]::writeLock -- pthread_rwlock_wrlock(0x7764c30) for ~column
> clearing column 1.e0id2
> column[1.e0id2]::writeLock -- pthread_rwlock_unlock(0x7764c30) for ~column
> bitvector (0x7764bc0) clear the content of bitvector with m_vec at 0x7764bd0
> array_t<j>::freeMemory this=0x7764bd0 actual=0x7764cd0 and m_begin=0
> (active references: 0, past references: 1)
> fileManager::storage(0x7764cd0, 0) cleared
> column[1.e0id4]::writeLock -- pthread_rwlock_wrlock(0x7765df0) for ~column
> clearing column 1.e0id4
> column[1.e0id4]::writeLock -- pthread_rwlock_unlock(0x7765df0) for ~column
> bitvector (0x7765d80) clear the content of bitvector with m_vec at 0x7765d90
> array_t<j>::freeMemory this=0x7765d90 actual=0x7765e90 and m_begin=0
> (active references: 0, past references: 1)
> fileManager::storage(0x7765e90, 0) cleared
> column[1.e0id5]::writeLock -- pthread_rwlock_wrlock(0x7766fb0) for ~column
> clearing column 1.e0id5
> column[1.e0id5]::writeLock -- pthread_rwlock_unlock(0x7766fb0) for ~column
> bitvector (0x7766f40) clear the content of bitvector with m_vec at 0x7766f50
> array_t<j>::freeMemory this=0x7766f50 actual=0x7767050 and m_begin=0
> (active references: 0, past references: 1)
> fileManager::storage(0x7767050, 0) cleared
> column[1.e0id6]::writeLock -- pthread_rwlock_wrlock(0x7768170) for ~column
> clearing column 1.e0id6
> column[1.e0id6]::writeLock -- pthread_rwlock_unlock(0x7768170) for ~column
> bitvector (0x7768100) clear the content of bitvector with m_vec at 0x7768110
> array_t<j>::freeMemory this=0x7768110 actual=0x7768210 and m_begin=0
> (active references: 0, past references: 1)
> fileManager::storage(0x7768210, 0) cleared
> column[1.e0id7]::writeLock -- pthread_rwlock_wrlock(0x7769330) for ~column
> clearing column 1.e0id7
> column[1.e0id7]::writeLock -- pthread_rwlock_unlock(0x7769330) for ~column
> bitvector (0x77692c0) clear the content of bitvector with m_vec at 0x77692d0
> array_t<j>::freeMemory this=0x77692d0 actual=0x77693d0 and m_begin=0
> (active references: 0, past references: 1)
> fileManager::storage(0x77693d0, 0) cleared
> column[1.e0id8]::writeLock -- pthread_rwlock_wrlock(0x776a4f0) for ~column
> clearing column 1.e0id8
> column[1.e0id8]::writeLock -- pthread_rwlock_unlock(0x776a4f0) for ~column
> bitvector (0x776a480) clear the content of bitvector with m_vec at 0x776a490
> array_t<j>::freeMemory this=0x776a490 actual=0x776a590 and m_begin=0
> (active references: 0, past references: 1)
> fileManager::storage(0x776a590, 0) cleared
> part[1]::releaseAccess -- pthread_rwlock_unlock(0x7fefffbb8) for ~part
> array_t<N4ibis5rid_tE>::freeMemory this=0x776b940 actual=0x776b9a0 and
> m_begin=0 (active references: 0, past references: 1)
> fileManager::storage(0x776b9a0, 0) cleared
> bitvector (0x7fefffb10) clear the content of bitvector with m_vec at 
> 0x7fefffb20
> array_t<j>::freeMemory this=0x7fefffb20 actual=0x776d040 and
> m_begin=0x776d0b0 (active references: 0, past references: 1)
> fileManager::storage(0x776d040, 0x776d0b0) removed 12 bytes to
> decrease totalBytes to 0
> fileManager::storage(0x776d040, 0x776d0b0) cleared
> fileManager::clear has nothing to do
> fileManager decommissioned
> ==25287==
> ==25287== HEAP SUMMARY:
> ==25287==     in use at exit: 40 bytes in 1 blocks
> ==25287==   total heap usage: 466 allocs, 465 frees, 119,596 bytes allocated
> ==25287==
> ==25287== 40 bytes in 1 blocks are definitely lost in loss record 1 of 1
> ==25287==    at 0x4C292C7: operator new(unsigned long) (in
> /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==25287==    by 0x5B2F832: ibis::array_t<unsigned int>::array_t()
> (array_t.cpp:36)
> ==25287==    by 0x604C5A9: ibis::bitvector::bitvector() (bitvector.cpp:33)
> ==25287==    by 0x517990B: ibis::part::part(char const*, bool) (part.cpp:262)
> ==25287==    by 0x40306B: main (test2.cpp:9)
> ==25287==
> ==25287== LEAK SUMMARY:
> ==25287==    definitely lost: 40 bytes in 1 blocks
> ==25287==    indirectly lost: 0 bytes in 0 blocks
> ==25287==      possibly lost: 0 bytes in 0 blocks
> ==25287==    still reachable: 0 bytes in 0 blocks
> ==25287==         suppressed: 0 bytes in 0 blocks
> ==25287==
> ==25287== For counts of detected and suppressed errors, rerun with: -v
> ==25287== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 6 from 6)
> 
> 
> On 20 September 2012 07:45, K. John Wu <[email protected]> wrote:
>> Hi, Petr,
>>
>> I just updated examples/thula.cpp to exercise the option of reordering
>> read-only data partitions.  It seems to work in a small test I tried.
>>  The new code is in SVN as revision 577.
>>
>> BTW, if you invoke thula with a data directory and an orderby clause,
>> but without a where clause or select clause, then it will perform the
>> reordering on the data partitions in the given directory.
>>
>> At least on the surface, the following command line (run in tests
>> directory after 'make check') seems to work fine.
>>
>> ../example/thula -d tmp/t2 -o c -v
>>
>>
>> Would you mind giving it a try?
>>
>> Thanks.
>>
>> John
>>
>>
>> On 9/19/12 2:41 AM, Petr Velan wrote:
>>> Hi John,
>>>
>>> I managed to get the test program running, but I cannot seem to be
>>> able to reproduce the deadlock. I'll look further into it when I have
>>> some spare time.
>>>
>>> One thing I noticed using the test program, the reorder does not work
>>> on read-only partition, but the documentation says that it should,
>>> since it does not "change" the data. I'm not running the latest SVN
>>> version, so maybe it is somehow resolved and consistent now. I just
>>> wanted to let you know.
>>>
>>> Thanks,
>>> Petr
>>>
>>> On 4 September 2012 07:30, Kesheng Wu <[email protected]> wrote:
>>>> Hi, Petr,
>>>>
>>>> Attached is a modification of the file tests/setqgen.cpp to mimic your
>>>> use case.  So far, it seems to produce exactly the same output set (as
>>>> a whole, not in the individual partitions) as produced by
>>>> tests/setqgen.cpp.  It works OK on my laptop.  Would you mind taking a
>>>> look and see if you can get it to behave more like what your program
>>>> does?
>>>>
>>>> Thanks.
>>>>
>>>> John
>>>>
>>>>
>>>> On Thu, Aug 30, 2012 at 11:45 PM, Petr Velan <[email protected]> wrote:
>>>>> Hi John,
>>>>>
>>>>> The memory management in FastBit is good for batch mode of operation.
>>>>> However, I need to reduce the memory footprint, since other operations
>>>>> might require a large amount of memory for a short time, so FastBit
>>>>> cannot keep it allocated all the time.
>>>>>
>>>>> I know that there is a limit that allows FastBit to use at most half
>>>>> of the available memory and that it can be changed. I think it would
>>>>> be a good thing to have two limits: one to set the maximum that can be
>>>>> used and another that would trigger the unload function. The result
>>>>> would be that I would set FastBit to keep 0.5GB at the ready and allow
>>>>> it to expand to 10GB. So when needed, FastBit would use up to 10GB of
>>>>> memory, but after that it would free it and keep only 0.5GB for
>>>>> further use. The default could still be to have both limits at half
>>>>> of the available memory. What do you think?
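
A minimal sketch of the two-limit idea above, assuming a hypothetical
TwoLimitCache class (this is not an existing FastBit interface, and it
treats every cached entry as not in active use): the cache may grow up to
a hard limit, and trim() shrinks it back to the soft limit once a burst
of work is over.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>

// Hypothetical cache with a soft limit (kept at the ready) and a hard
// limit (absolute maximum). Sizes are plain byte counts.
class TwoLimitCache {
public:
    TwoLimitCache(std::size_t softBytes, std::size_t hardBytes)
        : soft_(softBytes), hard_(hardBytes), total_(0) {}

    // Record `bytes` of content for `name`, evicting the oldest entries
    // if needed. Returns false if the item alone exceeds the hard limit.
    bool add(const std::string& name, std::size_t bytes) {
        if (bytes > hard_) return false;
        while (total_ + bytes > hard_ && !entries_.empty()) evictOldest();
        entries_.push_back(Entry{name, bytes});
        total_ += bytes;
        return true;
    }

    // Release the oldest entries until usage is at or below the soft limit.
    void trim() {
        while (total_ > soft_ && !entries_.empty()) evictOldest();
    }

    std::size_t totalBytes() const { return total_; }

private:
    struct Entry {
        std::string name;
        std::size_t bytes;
    };

    // Drop the entry that was added first.
    void evictOldest() {
        total_ -= entries_.front().bytes;
        entries_.pop_front();
    }

    std::size_t soft_, hard_, total_;
    std::list<Entry> entries_;
};
```

With the numbers from the proposal, the soft limit would correspond to
0.5GB and the hard limit to 10GB; setting both to the same value gives
single-limit behavior.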
>>>>>
>>>>> We are currently trying to manually call
>>>>> ibis::fileManager::instance().flushDir() to see if it helps to keep
>>>>> the memory down, but I believe that the solution I described earlier
>>>>> is much more generic.
>>>>>
>>>>> Unfortunately, I do not have any simple code to reproduce the
>>>>> deadlock. I'll try to look into it, maybe compile FastBit with
>>>>> debugging symbols to help us better understand what is going on.
>>>>>
>>>>> Petr
>>>>>
>>>>> On 30 August 2012 20:29, K. John Wu <[email protected]> wrote:
>>>>>> Hi, Petr,
>>>>>>
>>>>>> Thanks for clarifying the use case.  It looks like you cannot wait for
>>>>>> everything to be done before releasing the ibis::part objects.
>>>>>> Regarding the memory usage, FastBit does lazy deletions - as long as
>>>>>> no one needs new memory, the existing content read from files will be
>>>>>> kept in memory.  The default maximum memory to be used is half of
>>>>>> the physical memory - which explains what you've observed.  Once
>>>>>> reaching that limit, ibis::fileManager::unload will be called to
>>>>>> remove the content of files that are not in active use.  In your case,
>>>>>> it sounds like there will be a lot of old files to be removed from
>>>>>> memory.
>>>>>>
>>>>>> Since there is no clear indication which thread is holding on to the
>>>>>> mutex lock, we might need to create a multithreaded data generator
>>>>>> that can mimic your data ingestion process.  If you have a simple one
>>>>>> that I can borrow, I would greatly appreciate it.
>>>>>>
>>>>>> Most likely, another copy of ibis::fileManager::getFile is holding on
>>>>>> to the ibis::fileManager::mutex.  However, logically, that is not
>>>>>> possible because that thread can only be waiting on a condition
>>>>>> variable, in which case it should have yielded the mutex lock already.
>>>>>> Anyway, something gnarly is going on here.
>>>>>>
>>>>>> John
>>>>>>
>>>>>>
>>>>>> On 8/29/12 10:54 PM, Petr Velan wrote:
>>>>>>> Hi John,
>>>>>>>
>>>>>>> I still do not understand why there is a deadlock, or why access
>>>>>>> to different partitions is managed by the same mutex lock.
>>>>>>>
>>>>>>> Our use case is this:
>>>>>>> We have a process that collects data from the network and stores it in
>>>>>>> FastBit partitions. Each partition contains 5 minutes of data,
>>>>>>> approximately 300-400MB. After the 5 minutes expire, a new thread is
>>>>>>> launched that creates an ibis::part, runs reorder, deletes the part,
>>>>>>> creates an ibis::table which is used to create indexes, and then
>>>>>>> deletes the table. After that the thread ends.
>>>>>>>
>>>>>>> Since there is data from multiple sources, there are multiple threads
>>>>>>> that store the data and reorder/index it.
>>>>>>>
>>>>>>> Two things are bothering me.
>>>>>>> First, the deadlock: since the mutex should only synchronize, I wonder
>>>>>>> who really holds the lock when both threads are waiting for it.
>>>>>>> Second, the memory used by the process constantly grows. After
>>>>>>> the parts and tables are deleted, I would expect the memory to be
>>>>>>> released as well, since for next 5 minutes, it will not be needed.
>>>>>>> Unfortunately, FastBit does not free the memory until it reaches 50%
>>>>>>> of total memory, which in our case is 6GB. That is kind of
>>>>>>> unfortunate, since what it should really need is about 1GB of memory
>>>>>>> for reorder in the worst case and then the memory should be free to
>>>>>>> use by other processes. Is there any way to achieve this? The memory
>>>>>>> is consumed even without the reordering, only when building indexes.
>>>>>>>
>>>>>>> Thank you for the warning about strings. We plan to use them in the
>>>>>>> future, so we will have to do without the reorder in that case.
>>>>>>>
>>>>>>> Petr
>>>>>>>
>>>>>>> On 29 August 2012 18:28, K. John Wu <[email protected]> wrote:
>>>>>>>> Hi, Petr,
>>>>>>>>
>>>>>>>> From the stack traces, it looks like one thread is trying to free a data
>>>>>>>> partition object while another one is trying to reorder the rows of
>>>>>>>> presumably another data partition.  The first mutex lock is invoked
>>>>>>>> from the constructor of a storage object (ibis::fileManager::storage).
>>>>>>>>  This is invoked because the amount of data in memory (tracked by the
>>>>>>>> file manager) is close to the prescribed maximum (maxBytes).  The
>>>>>>>> second mutex lock is invoked from a function called
>>>>>>>> ibis::fileManager::removeCleaner (which is invoked by the destructor
>>>>>>>> of an ibis::part object).
>>>>>>>>
>>>>>>>> Running out of memory seems to be the fundamental problem here.
>>>>>>>> Presumably, you only need to do reordering once and your datasets are
>>>>>>>> quite large.  I would suggest that you use only a single thread to
>>>>>>>> reorder your data - this way all the memory will be devoted to a single
>>>>>>>> reordering operation.
>>>>>>>>
>>>>>>>> If you really do have a lot of memory (or each data partition is
>>>>>>>> relatively small) and want to do the reordering with multiple threads,
>>>>>>>> then delay the operation of freeing the ibis::part objects until you
>>>>>>>> are done with all reordering operations.  The cleaner objects from
>>>>>>>> each data partition will make sure each ibis::part object is taking
>>>>>>>> only a minimal amount of memory.
>>>>>>>>
>>>>>>>> A note of warning: the current code only sorts the numerical values,
>>>>>>>> any strings or blobs will be left untouched.  If your datasets have
>>>>>>>> strings or blobs, your datasets will not be coherent after calling the
>>>>>>>> function reorder!
>>>>>>>>
>>>>>>>> John
>>>>>>>>
>>>>>>>>
>>>>>>>> On 8/29/12 4:57 AM, Petr Velan wrote:
>>>>>>>>> Hi John,
>>>>>>>>>
>>>>>>>>> thank you for all the work that you put into the FastBit library, it
>>>>>>>>> allows us to achieve great results!
>>>>>>>>>
>>>>>>>>> I've bumped into a little bug which might be very hard to reproduce or
>>>>>>>>> identify. I'm using two threads to reorder and index data that are
>>>>>>>>> already stored on disk. It was OK for a little while, but then it
>>>>>>>>> got stuck in a deadlock. Here are gdb traces from both threads,
>>>>>>>>> unfortunately without debugging symbols, so that the specific files
>>>>>>>>> and lines are unknown.
>>>>>>>>>
>>>>>>>>> We are currently using the SVN version 532.
>>>>>>>>>
>>>>>>>>> (gdb) bt
>>>>>>>>> #0  0x00007f8983463054 in __lll_lock_wait () from 
>>>>>>>>> /lib64/libpthread.so.0
>>>>>>>>> #1  0x00007f898345e388 in _L_lock_854 () from /lib64/libpthread.so.0
>>>>>>>>> #2  0x00007f898345e257 in pthread_mutex_lock () from 
>>>>>>>>> /lib64/libpthread.so.0
>>>>>>>>> #3  0x00007f898271e074 in ibis::fileManager::storage::storage(unsigned
>>>>>>>>> long) () from /usr/lib64/libfastbit.so.0
>>>>>>>>> #4  0x00007f898271eb16 in ibis::fileManager::storage::enlarge(unsigned
>>>>>>>>> long) () from /usr/lib64/libfastbit.so.0
>>>>>>>>> #5  0x00007f898272214f in ibis::fileManager::roFile::doRead(char
>>>>>>>>> const*) () from /usr/lib64/libfastbit.so.0
>>>>>>>>> #6  0x00007f8982723b4b in ibis::fileManager::getFile(char const*,
>>>>>>>>> ibis::fileManager::storage**, ibis::fileManager::ACCESS_PREFERENCE) ()
>>>>>>>>> from /usr/lib64/libfastbit.so.0
>>>>>>>>> #7  0x00007f898273406a in int ibis::fileManager::getFile<unsigned
>>>>>>>>> short>(char const*, ibis::array_t<unsigned short>&,
>>>>>>>>> ibis::fileManager::ACCESS_PREFERENCE) () from
>>>>>>>>> /usr/lib64/libfastbit.so.0
>>>>>>>>> #8  0x00007f8981f9f4a5 in ibis::column::actualMinMax(char const*,
>>>>>>>>> ibis::bitvector const&, double&, double&) const () from
>>>>>>>>> /usr/lib64/libfastbit.so.0
>>>>>>>>> #9  0x00007f8981fa3546 in ibis::column::computeMinMax() () from
>>>>>>>>> /usr/lib64/libfastbit.so.0
>>>>>>>>> #10 0x00007f89827beae6 in
>>>>>>>>> ibis::part::gatherSortKeys(ibis::array_t<char const*>&) () from
>>>>>>>>> /usr/lib64/libfastbit.so.0
>>>>>>>>> #11 0x00007f89827bfc56 in ibis::part::reorder() () from
>>>>>>>>> /usr/lib64/libfastbit.so.0
>>>>>>>>> #12 0x00007f8982c7e2af in reorder_index(void*) () from
>>>>>>>>> /usr/share/ipfixcol/plugins/ipfixcol-fastbit-output.so
>>>>>>>>> #13 0x00007f898345c851 in start_thread () from /lib64/libpthread.so.0
>>>>>>>>> #14 0x00007f89831aa6dd in next_line () from /lib64/libc.so.6
>>>>>>>>> #15 0x0000000000000000 in ?? ()
>>>>>>>>> (gdb)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> (gdb) bt
>>>>>>>>> #0  0x00007f8983463054 in __lll_lock_wait () from 
>>>>>>>>> /lib64/libpthread.so.0
>>>>>>>>> #1  0x00007f898345e388 in _L_lock_854 () from /lib64/libpthread.so.0
>>>>>>>>> #2  0x00007f898345e257 in pthread_mutex_lock () from 
>>>>>>>>> /lib64/libpthread.so.0
>>>>>>>>> #3  0x00007f898175a6aa in
>>>>>>>>> ibis::util::mutexLock::mutexLock(pthread_mutex_t*, char const*) ()
>>>>>>>>> from /usr/lib64/libfastbit.so.0
>>>>>>>>> #4  0x00007f89827177d4 in
>>>>>>>>> ibis::fileManager::removeCleaner(ibis::fileManager::cleaner const*) ()
>>>>>>>>> from /usr/lib64/libfastbit.so.0
>>>>>>>>> #5  0x00007f8981735952 in ibis::part::~part() () from 
>>>>>>>>> /usr/lib64/libfastbit.so.0
>>>>>>>>> #6  0x00007f8981735c29 in ibis::part::~part() () from 
>>>>>>>>> /usr/lib64/libfastbit.so.0
>>>>>>>>> #7  0x00007f8982c7e2cd in reorder_index(void*) () from
>>>>>>>>> /usr/share/ipfixcol/plugins/ipfixcol-fastbit-output.so
>>>>>>>>> #8  0x00007f898345c851 in start_thread () from /lib64/libpthread.so.0
>>>>>>>>> #9  0x00007f89831aa6dd in next_line () from /lib64/libc.so.6
>>>>>>>>> #10 0x0000000000000000 in ?? ()
>>>>>>>>> (gdb)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Do you have any idea what might be going on?
>>>>>>>>>
>>>>>>>>> With regards,
>>>>>>>>> Petr Velan
>>>>>>>>>
_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
