Hi John,

The problem seems to be solved now; I see no more memory leaks. Thank you for the quick fix.

Petr

On 20 September 2012 23:57, K. John Wu <[email protected]> wrote:
> Thanks, Petr and Michael,
>
> Please give SVN Revision 579 a try when you get the chance. The
> problem seems to be in the function ibis::array_t<T>::push_back. It
> lost track of the storage object (actual). This problem should be
> fixed with the current structure of the tests. Let us know if you
> encounter any problems.
>
> John
>
>
> On 9/20/12 8:12 AM, Petr Velan wrote:
>> Hi John,
>>
>> I just updated to the latest SVN revision and tried the reorder function
>> on a read-only partition. The reordering is now performed exactly as
>> described in the documentation, so the problem no longer exists.
>>
>> However, I ran into a problem with some memory leaks in the new
>> revision. My previous version was 3.0.10 (SVN 532) and there were no
>> leaks in my use case. Now I'm losing some memory. To simulate the
>> memory loss, you only need to create a partition in a simple program:
>> #include <fastbit/ibis.h>
>> #include <iostream>
>>
>> int main(int argc, char *argv[]) {
>>     ibis::gVerbose = 10;
>>     ibis::part part(argv[1], false);
>>     return 0;
>> }
>>
>> Attached below is the program log with verbose set to 10. You might
>> notice that the leak is 40 bytes long, and that it is caused by a
>> fileManager::storage object created in the array_t constructor not being
>> deleted. In my case, the first constructed array_t's actual is probably
>> not removed. I've managed to trace the problem to "ibis::bitvector
>> amask" of ibis::part, but I cannot seem to find the problem. Most
>> likely the fileManager is replaced somewhere without proper
>> deallocation, but I cannot find it.
>>
>> Yours sincerely,
>> Petr Velan
>>
>> velan@wall:~/Documents/devel/tmp/test> valgrind --leak-check=full
>> ./test2 ../../data/000000000001/1/
>> ==25287== Memcheck, a memory error detector
>> ==25287== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
>> ==25287== Using Valgrind-3.6.1 and LibVEX; rerun with -h for copyright info
>> ==25287== Command: ./test2 ../../data/000000000001/1/
>> ==25287==
>>
>> FastBit ibis1.3.2.6
>> Log messages started on Thu Sep 20 17:05:57 2012
>> fileManager::storage(0x775b8e0, 0) initialization completed
>> array_t<j> constructed at 0x7fefffb20 with actual=0x775b8e0, m_begin=0
>> and actual->size()=0
>> bitvector (0x7fefffb10) constructed with m_vec at 0x7fefffb20
>> fileManager::ctor found the physical memory size to be 4083470336 bytes
>> fileManager initialization complete -- maxBytes=2041735168, maxOpenFiles=768
>> part::readMetaData -- opened ../../data/000000000001/1/-part.txt for reading
>> Name = "1"
>>
>> Description = "Generated by ipfixcol fastbit plugin"
>>
>> Number_of_columns = 11
>>
>> Number_of_rows = 499992
>>
>> Timestamp = 1329386614
>>
>> State = 1
>>
>> END HEADER
>>
>> fileManager::storage(0x775f3e0, 0) initialization completed
>> array_t<j> constructed at 0x775f2e0 with actual=0x775f3e0, m_begin=0
>> and actual->size()=0
>> bitvector (0x775f2d0) constructed with m_vec at 0x775f2e0
>> read info about column 1.e0id1 (ULONG)
>> part::readMetaData -- got column e0id1 from
>> ../../data/000000000001/1/-part.txt
>> fileManager::storage(0x77605a0, 0) initialization completed
>> array_t<j> constructed at 0x77604a0 with actual=0x77605a0, m_begin=0
>> and actual->size()=0
>> bitvector (0x7760490) constructed with m_vec at 0x77604a0
>> read info about column 1.e0id11 (USHORT)
>> part::readMetaData -- got column e0id11 from
>> ../../data/000000000001/1/-part.txt
>> fileManager::storage(0x7761770, 0) initialization completed
>> array_t<j> constructed at 0x7761670 with actual=0x7761770, m_begin=0
>> and actual->size()=0
>> bitvector (0x7761660) constructed with m_vec at 0x7761670
>> read info about column 1.e0id12 (UINT)
>> part::readMetaData -- got column e0id12 from
>> ../../data/000000000001/1/-part.txt
>> fileManager::storage(0x7762930, 0) initialization completed
>> array_t<j> constructed at 0x7762830 with actual=0x7762930, m_begin=0
>> and actual->size()=0
>> bitvector (0x7762820) constructed with m_vec at 0x7762830
>> read info about column 1.e0id152 (ULONG)
>> part::readMetaData -- got column e0id152 from
>> ../../data/000000000001/1/-part.txt
>> fileManager::storage(0x7763b00, 0) initialization completed
>> array_t<j> constructed at 0x7763a00 with actual=0x7763b00, m_begin=0
>> and actual->size()=0
>> bitvector (0x77639f0) constructed with m_vec at 0x7763a00
>> read info about column 1.e0id153 (ULONG)
>> part::readMetaData -- got column e0id153 from
>> ../../data/000000000001/1/-part.txt
>> fileManager::storage(0x7764cd0, 0) initialization completed
>> array_t<j> constructed at 0x7764bd0 with actual=0x7764cd0, m_begin=0
>> and actual->size()=0
>> bitvector (0x7764bc0) constructed with m_vec at 0x7764bd0
>> read info about column 1.e0id2 (ULONG)
>> part::readMetaData -- got column e0id2 from
>> ../../data/000000000001/1/-part.txt
>> fileManager::storage(0x7765e90, 0) initialization completed
>> array_t<j> constructed at 0x7765d90 with actual=0x7765e90, m_begin=0
>> and actual->size()=0
>> bitvector (0x7765d80) constructed with m_vec at 0x7765d90
>> read info about column 1.e0id4 (UBYTE)
>> part::readMetaData -- got column e0id4 from
>> ../../data/000000000001/1/-part.txt
>> fileManager::storage(0x7767050, 0) initialization completed
>> array_t<j> constructed at 0x7766f50 with actual=0x7767050, m_begin=0
>> and actual->size()=0
>> bitvector (0x7766f40) constructed with m_vec at 0x7766f50
>> read info about column 1.e0id5 (UBYTE)
>> part::readMetaData -- got column e0id5 from
>> ../../data/000000000001/1/-part.txt
>> fileManager::storage(0x7768210, 0) initialization completed
>> array_t<j> constructed at 0x7768110 with actual=0x7768210, m_begin=0
>> and actual->size()=0
>> bitvector (0x7768100) constructed with m_vec at 0x7768110
>> read info about column 1.e0id6 (UBYTE)
>> part::readMetaData -- got column e0id6 from
>> ../../data/000000000001/1/-part.txt
>> fileManager::storage(0x77693d0, 0) initialization completed
>> array_t<j> constructed at 0x77692d0 with actual=0x77693d0, m_begin=0
>> and actual->size()=0
>> bitvector (0x77692c0) constructed with m_vec at 0x77692d0
>> read info about column 1.e0id7 (USHORT)
>> part::readMetaData -- got column e0id7 from
>> ../../data/000000000001/1/-part.txt
>> fileManager::storage(0x776a590, 0) initialization completed
>> array_t<j> constructed at 0x776a490 with actual=0x776a590, m_begin=0
>> and actual->size()=0
>> bitvector (0x776a480) constructed with m_vec at 0x776a490
>> read info about column 1.e0id8 (UINT)
>> part::readMetaData -- got column e0id8 from
>> ../../data/000000000001/1/-part.txt
>> part[1]::gainReadAccess -- pthread_rwlock_rdlock(0x7fefffbb8) for readRIDs
>> fileManager::storage(0x776b9a0, 0) initialization completed
>> array_t<N4ibis5rid_tE> constructed at 0x776b940 with actual=0x776b9a0,
>> m_begin=0 and actual->size()=0
>> part[1]::readRIDs -- the file manager failed to read file
>> "../../data/000000000001/1/-rids". There is no RIDs.
>> part[1]::releaseAccess -- pthread_rwlock_unlock(0x7fefffbb8) for readRIDs
>> Warning -- failed to read the content of
>> ../../data/000000000001/1/-part.msk, fileManager::getFile returned
>> -101
>> fileManager::storage(0x776d040, 0x776d0b0) added 12 bytes to increase
>> totalBytes to 12
>> fileManager::storage(0x776d040, 0x776d0b0) initialization completed
>> with 12 elements
>> fileManager::flushFile will do nothing because
>> "../../data/000000000001/1/-part.msk" is not tracked by the file
>> manager
>> part::init -- mask for partition 1 has 499992 set bits out of 499992
>> Constructed a part named 1
>> activeDir = "../../data/000000000001/1"
>> part: 1 (Generated by ipfixcol fastbit plugin) with 499992 rows, 11 columns
>> Column list:
>> e0id1: (ULONG) [28, 1.44872e+09]
>> e0id11: (USHORT) [0, 65535]
>> e0id12: (UINT) [1.0466e+08, 4.02607e+09]
>> e0id152: (ULONG) [1.26981e+12, 1.26982e+12]
>> e0id153: (ULONG) [1.26981e+12, 1.26982e+12]
>> e0id2: (ULONG) [1, 1.3616e+06]
>> e0id4: (UBYTE) [1, 41]
>> e0id5: (UBYTE) [0, 0]
>> e0id6: (UBYTE) [0, 31]
>> e0id7: (USHORT) [0, 65535]
>> e0id8: (UINT) [1.0466e+08, 3.75763e+09]
>>
>> part[1]::gainWriteAccess -- pthread_rwlock_wrlock(0x7fefffbb8) for ~part
>> clearing data partition 1
>> column[1.e0id1]::writeLock -- pthread_rwlock_wrlock(0x775f340) for ~column
>> clearing column 1.e0id1
>> column[1.e0id1]::writeLock -- pthread_rwlock_unlock(0x775f340) for ~column
>> bitvector (0x775f2d0) clear the content of bitvector with m_vec at 0x775f2e0
>> array_t<j>::freeMemory this=0x775f2e0 actual=0x775f3e0 and m_begin=0
>> (active references: 0, past references: 1)
>> fileManager::storage(0x775f3e0, 0) cleared
>> column[1.e0id11]::writeLock -- pthread_rwlock_wrlock(0x7760500) for ~column
>> clearing column 1.e0id11
>> column[1.e0id11]::writeLock -- pthread_rwlock_unlock(0x7760500) for ~column
>> bitvector (0x7760490) clear the content of bitvector with m_vec at 0x77604a0
>> array_t<j>::freeMemory this=0x77604a0 actual=0x77605a0 and m_begin=0
>> (active references: 0, past references: 1)
>> fileManager::storage(0x77605a0, 0) cleared
>> column[1.e0id12]::writeLock -- pthread_rwlock_wrlock(0x77616d0) for ~column
>> clearing column 1.e0id12
>> column[1.e0id12]::writeLock -- pthread_rwlock_unlock(0x77616d0) for ~column
>> bitvector (0x7761660) clear the content of bitvector with m_vec at 0x7761670
>> array_t<j>::freeMemory this=0x7761670 actual=0x7761770 and m_begin=0
>> (active references: 0, past references: 1)
>> fileManager::storage(0x7761770, 0) cleared
>> column[1.e0id152]::writeLock -- pthread_rwlock_wrlock(0x7762890) for ~column
>> clearing column 1.e0id152
>> column[1.e0id152]::writeLock -- pthread_rwlock_unlock(0x7762890) for ~column
>> bitvector (0x7762820) clear the content of bitvector with m_vec at 0x7762830
>> array_t<j>::freeMemory this=0x7762830 actual=0x7762930 and m_begin=0
>> (active references: 0, past references: 1)
>> fileManager::storage(0x7762930, 0) cleared
>> column[1.e0id153]::writeLock -- pthread_rwlock_wrlock(0x7763a60) for ~column
>> clearing column 1.e0id153
>> column[1.e0id153]::writeLock -- pthread_rwlock_unlock(0x7763a60) for ~column
>> bitvector (0x77639f0) clear the content of bitvector with m_vec at 0x7763a00
>> array_t<j>::freeMemory this=0x7763a00 actual=0x7763b00 and m_begin=0
>> (active references: 0, past references: 1)
>> fileManager::storage(0x7763b00, 0) cleared
>> column[1.e0id2]::writeLock -- pthread_rwlock_wrlock(0x7764c30) for ~column
>> clearing column 1.e0id2
>> column[1.e0id2]::writeLock -- pthread_rwlock_unlock(0x7764c30) for ~column
>> bitvector (0x7764bc0) clear the content of bitvector with m_vec at 0x7764bd0
>> array_t<j>::freeMemory this=0x7764bd0 actual=0x7764cd0 and m_begin=0
>> (active references: 0, past references: 1)
>> fileManager::storage(0x7764cd0, 0) cleared
>> column[1.e0id4]::writeLock -- pthread_rwlock_wrlock(0x7765df0) for ~column
>> clearing column 1.e0id4
>> column[1.e0id4]::writeLock -- pthread_rwlock_unlock(0x7765df0) for ~column
>> bitvector (0x7765d80) clear the content of bitvector with m_vec at 0x7765d90
>> array_t<j>::freeMemory this=0x7765d90 actual=0x7765e90 and m_begin=0
>> (active references: 0, past references: 1)
>> fileManager::storage(0x7765e90, 0) cleared
>> column[1.e0id5]::writeLock -- pthread_rwlock_wrlock(0x7766fb0) for ~column
>> clearing column 1.e0id5
>> column[1.e0id5]::writeLock -- pthread_rwlock_unlock(0x7766fb0) for ~column
>> bitvector (0x7766f40) clear the content of bitvector with m_vec at 0x7766f50
>> array_t<j>::freeMemory this=0x7766f50 actual=0x7767050 and m_begin=0
>> (active references: 0, past references: 1)
>> fileManager::storage(0x7767050, 0) cleared
>> column[1.e0id6]::writeLock -- pthread_rwlock_wrlock(0x7768170) for ~column
>> clearing column 1.e0id6
>> column[1.e0id6]::writeLock -- pthread_rwlock_unlock(0x7768170) for ~column
>> bitvector (0x7768100) clear the content of bitvector with m_vec at 0x7768110
>> array_t<j>::freeMemory this=0x7768110 actual=0x7768210 and m_begin=0
>> (active references: 0, past references: 1)
>> fileManager::storage(0x7768210, 0) cleared
>> column[1.e0id7]::writeLock -- pthread_rwlock_wrlock(0x7769330) for ~column
>> clearing column 1.e0id7
>> column[1.e0id7]::writeLock -- pthread_rwlock_unlock(0x7769330) for ~column
>> bitvector (0x77692c0) clear the content of bitvector with m_vec at 0x77692d0
>> array_t<j>::freeMemory this=0x77692d0 actual=0x77693d0 and m_begin=0
>> (active references: 0, past references: 1)
>> fileManager::storage(0x77693d0, 0) cleared
>> column[1.e0id8]::writeLock -- pthread_rwlock_wrlock(0x776a4f0) for ~column
>> clearing column 1.e0id8
>> column[1.e0id8]::writeLock -- pthread_rwlock_unlock(0x776a4f0) for ~column
>> bitvector (0x776a480) clear the content of bitvector with m_vec at 0x776a490
>> array_t<j>::freeMemory this=0x776a490 actual=0x776a590 and m_begin=0
>> (active references: 0, past references: 1)
>> fileManager::storage(0x776a590, 0) cleared
>> part[1]::releaseAccess -- pthread_rwlock_unlock(0x7fefffbb8) for ~part
>> array_t<N4ibis5rid_tE>::freeMemory this=0x776b940 actual=0x776b9a0 and
>> m_begin=0 (active references: 0, past references: 1)
>> fileManager::storage(0x776b9a0, 0) cleared
>> bitvector (0x7fefffb10) clear the content of bitvector with m_vec at
>> 0x7fefffb20
>> array_t<j>::freeMemory this=0x7fefffb20 actual=0x776d040 and
>> m_begin=0x776d0b0 (active references: 0, past references: 1)
>> fileManager::storage(0x776d040, 0x776d0b0) removed 12 bytes to
>> decrease totalBytes to 0
>> fileManager::storage(0x776d040, 0x776d0b0) cleared
>> fileManager::clear has nothing to do
>> fileManager decommissioned
>> ==25287==
>> ==25287== HEAP SUMMARY:
>> ==25287== in use at exit: 40 bytes in 1 blocks
>> ==25287== total heap usage: 466 allocs, 465 frees, 119,596 bytes allocated
>> ==25287==
>> ==25287== 40 bytes in 1 blocks are definitely lost in loss record 1 of 1
>> ==25287== at 0x4C292C7: operator new(unsigned long) (in
>> /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
>> ==25287== by 0x5B2F832: ibis::array_t<unsigned int>::array_t()
>> (array_t.cpp:36)
>> ==25287== by 0x604C5A9: ibis::bitvector::bitvector() (bitvector.cpp:33)
>> ==25287== by 0x517990B: ibis::part::part(char const*, bool) (part.cpp:262)
>> ==25287== by 0x40306B: main (test2.cpp:9)
>> ==25287==
>> ==25287== LEAK SUMMARY:
>> ==25287== definitely lost: 40 bytes in 1 blocks
>> ==25287== indirectly lost: 0 bytes in 0 blocks
>> ==25287== possibly lost: 0 bytes in 0 blocks
>> ==25287== still reachable: 0 bytes in 0 blocks
>> ==25287== suppressed: 0 bytes in 0 blocks
>> ==25287==
>> ==25287== For counts of detected and suppressed errors, rerun with: -v
>> ==25287== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 6 from 6)
>>
>>
>> On 20 September 2012 07:45, K. John Wu <[email protected]> wrote:
>>> Hi, Petr,
>>>
>>> I just updated examples/thula.cpp to exercise the option of reordering
>>> read-only data partitions. It seems to work in a small test I tried.
>>> The new code is in SVN as revision 577.
>>>
>>> BTW, if you invoke thula with a data directory and an orderby clause,
>>> but without a where clause or select clause, then it will perform the
>>> reordering on the data partitions in the given directory.
>>>
>>> At least on the surface, the following command line (run in tests
>>> directory after 'make check') seems to work fine.
>>>
>>> ../example/thula -d tmp/t2 -o c -v
>>>
>>>
>>> Would you mind giving it a try?
>>>
>>> Thanks.
>>>
>>> John
>>>
>>>
>>> On 9/19/12 2:41 AM, Petr Velan wrote:
>>>> Hi John,
>>>>
>>>> I managed to get the test program running, but I cannot seem to be
>>>> able to reproduce the deadlock. I'll look further into it when I have
>>>> some spare time.
>>>>
>>>> One thing I noticed using the test program: the reorder does not work
>>>> on a read-only partition, but the documentation says that it should,
>>>> since it does not "change" the data. I'm not running the latest SVN
>>>> version, so maybe it is somehow resolved and consistent now. I just
>>>> wanted to let you know.
>>>>
>>>> Thanks,
>>>> Petr
>>>>
>>>> On 4 September 2012 07:30, Kesheng Wu <[email protected]> wrote:
>>>>> Hi, Petr,
>>>>>
>>>>> Attached is a modification of the file tests/setqgen.cpp to mimic your
>>>>> use case. So far, it seems to produce exactly the same output set (as
>>>>> a whole, not in the individual partitions) as produced by
>>>>> tests/setqgen.cpp. It works OK on my laptop. Would you mind taking a
>>>>> look and seeing if you can get it to behave more like what your program
>>>>> does?
>>>>>
>>>>> Thanks.
>>>>>
>>>>> John
>>>>>
>>>>>
>>>>> On Thu, Aug 30, 2012 at 11:45 PM, Petr Velan <[email protected]> wrote:
>>>>>> Hi John,
>>>>>>
>>>>>> The memory management in FastBit is good for batch mode of operation,
>>>>>> however I need to reduce the memory footprint, since there might be
>>>>>> some other operations that might require a large amount of memory for
>>>>>> a short time, so FastBit cannot have it allocated all the
>>>>>> time.
>>>>>>
>>>>>> I know that there is a limit that allows the use of at most half of
>>>>>> the available memory and that it can be changed. I think it would be a
>>>>>> good thing to have two limits: one to set the maximum that can be used and
>>>>>> another that would trigger the unload function. The result would be that
>>>>>> I would set FastBit to have 0.5GB at the ready and allow it to expand to
>>>>>> 10GB. So when needed, FastBit would use up to 10GB of memory, but
>>>>>> after that it would free it and keep only 0.5GB for further use.
>>>>>> The default could still be to have both limits at half of available
>>>>>> memory. What do you think?
>>>>>>
>>>>>> We are currently trying to manually call
>>>>>> ibis::fileManager::instance().flushDir() to see if it helps to keep
>>>>>> the memory down, but I believe that the solution I described earlier
>>>>>> is much more generic.
>>>>>>
>>>>>> Unfortunately, I do not have any simple code to reproduce the
>>>>>> deadlock. I'll try to look into it, maybe compile FastBit with
>>>>>> debugging symbols to help us better understand what is going on.
>>>>>>
>>>>>> Petr
>>>>>>
>>>>>> On 30 August 2012 20:29, K. John Wu <[email protected]> wrote:
>>>>>>> Hi, Petr,
>>>>>>>
>>>>>>> Thanks for clarifying the use case. Looks like you cannot wait for
>>>>>>> everything to be done before releasing the ibis::part objects.
>>>>>>> Regarding the memory usage, FastBit does lazy deletions - as long as
>>>>>>> no one needs new memory, the existing content read from files will be
>>>>>>> kept in memory. The default maximum memory to be used is half of
>>>>>>> the physical memory - which explains what you've observed. Once
>>>>>>> that limit is reached, ibis::fileManager::unload will be called to
>>>>>>> remove the content of files that are not in active use. In your case,
>>>>>>> it sounds like there will be a lot of old files to be removed from
>>>>>>> memory.
>>>>>>>
>>>>>>> Since there is no clear indication which thread is holding on to the
>>>>>>> mutex lock, we might need to create a multithreaded data generator
>>>>>>> that can mimic your data ingestion process. If you have a simple one
>>>>>>> that I can borrow, I would greatly appreciate it.
>>>>>>>
>>>>>>> Most likely, another copy of ibis::fileManager::getFile is holding on
>>>>>>> to the ibis::fileManager::mutex. However, logically, that is not
>>>>>>> possible because that thread can only be waiting on a condition
>>>>>>> variable, in which case it should have yielded the mutex lock already.
>>>>>>> Anyway, something gnarly is going on here.
>>>>>>>
>>>>>>> John
>>>>>>>
>>>>>>>
>>>>>>> On 8/29/12 10:54 PM, Petr Velan wrote:
>>>>>>>> Hi John,
>>>>>>>>
>>>>>>>> I still do not understand why there is a deadlock, or why
>>>>>>>> access to different partitions is managed by the same mutex lock.
>>>>>>>>
>>>>>>>> Our use case is this:
>>>>>>>> We have a process that collects data from the network and stores it in
>>>>>>>> FastBit partitions. Each partition contains 5 minutes of data,
>>>>>>>> approximately 300-400MB. After the 5 minutes expire, a new thread is
>>>>>>>> launched that creates an ibis::part, runs reorder, deletes the part,
>>>>>>>> creates an ibis::table which is used to create indexes and then deletes
>>>>>>>> the table. After that the thread ends.
>>>>>>>>
>>>>>>>> Since there is data from multiple sources, there are multiple threads
>>>>>>>> that store the data and reorder/index it.
>>>>>>>>
>>>>>>>> Two things are bothering me:
>>>>>>>> First, the deadlock: since the mutex should only synchronize, I wonder who
>>>>>>>> really holds the lock when both threads are waiting for it.
>>>>>>>> Second, the memory used by the process constantly grows. After
>>>>>>>> the parts and tables are deleted, I would expect the memory to be
>>>>>>>> released as well, since it will not be needed for the next 5 minutes.
>>>>>>>> Unfortunately, FastBit does not free the memory until it reaches 50%
>>>>>>>> of total memory, which in our case is 6GB. That is kind of
>>>>>>>> unfortunate, since what it should really need is about 1GB of memory
>>>>>>>> for reorder in the worst case and then the memory should be free to
>>>>>>>> use by other processes. Is there any way to achieve this? The memory
>>>>>>>> is consumed even without the reordering, when only building indexes.
>>>>>>>>
>>>>>>>> Thank you for the warning about strings; we plan to use them in the
>>>>>>>> future, so we will have to do without the reorder in that case.
>>>>>>>>
>>>>>>>> Petr
>>>>>>>>
>>>>>>>> On 29 August 2012 18:28, K. John Wu <[email protected]> wrote:
>>>>>>>>> Hi, Petr,
>>>>>>>>>
>>>>>>>>> From the stack traces, it looks like one thread is trying to free a data
>>>>>>>>> partition object while another one is trying to reorder the rows of
>>>>>>>>> presumably another data partition. The first mutex lock is invoked
>>>>>>>>> from the constructor of a storage object (ibis::fileManager::storage).
>>>>>>>>> This is invoked because the amount of data in memory (tracked by the
>>>>>>>>> file manager) is close to the prescribed maximum (maxBytes). The
>>>>>>>>> second mutex lock is invoked from a function called
>>>>>>>>> ibis::fileManager::removeCleaner (which is invoked by the destructor
>>>>>>>>> of an ibis::part object).
>>>>>>>>>
>>>>>>>>> Running out of memory seems to be the fundamental problem here.
>>>>>>>>> Presumably, you only need to do reordering once and your datasets are
>>>>>>>>> quite large. I would suggest that you use only a single thread to
>>>>>>>>> reorder your data - this way all the memory will be devoted to a single
>>>>>>>>> reordering operation.
>>>>>>>>>
>>>>>>>>> If you really do have a lot of memory (or each data partition is
>>>>>>>>> relatively small) and want to do the reordering with multiple threads,
>>>>>>>>> then delay the operation of freeing the ibis::part objects until you
>>>>>>>>> are done with all reordering operations. The cleaner objects from
>>>>>>>>> each data partition will make sure each ibis::part object is taking
>>>>>>>>> only a minimal amount of memory.
>>>>>>>>>
>>>>>>>>> A note of warning: the current code only sorts the numerical values;
>>>>>>>>> any strings or blobs will be left untouched. If your datasets have
>>>>>>>>> strings or blobs, your datasets will not be coherent after calling the
>>>>>>>>> function reorder!
>>>>>>>>>
>>>>>>>>> John
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 8/29/12 4:57 AM, Petr Velan wrote:
>>>>>>>>>> Hi John,
>>>>>>>>>>
>>>>>>>>>> thank you for all the work that you put into the FastBit library, it
>>>>>>>>>> allows us to achieve great results!
>>>>>>>>>>
>>>>>>>>>> I've bumped into a little bug which might be very hard to reproduce
>>>>>>>>>> or identify. I'm using two threads to reorder and index data that are
>>>>>>>>>> already stored on disk. It was OK for a little while, but then it
>>>>>>>>>> got stuck in a deadlock. Here are gdb traces from both threads,
>>>>>>>>>> unfortunately without debugging symbols, so the specific files
>>>>>>>>>> and lines are unknown.
>>>>>>>>>>
>>>>>>>>>> We are currently using the SVN version 532.
>>>>>>>>>>
>>>>>>>>>> (gdb) bt
>>>>>>>>>> #0 0x00007f8983463054 in __lll_lock_wait () from
>>>>>>>>>> /lib64/libpthread.so.0
>>>>>>>>>> #1 0x00007f898345e388 in _L_lock_854 () from /lib64/libpthread.so.0
>>>>>>>>>> #2 0x00007f898345e257 in pthread_mutex_lock () from
>>>>>>>>>> /lib64/libpthread.so.0
>>>>>>>>>> #3 0x00007f898271e074 in
>>>>>>>>>> ibis::fileManager::storage::storage(unsigned
>>>>>>>>>> long) () from /usr/lib64/libfastbit.so.0
>>>>>>>>>> #4 0x00007f898271eb16 in
>>>>>>>>>> ibis::fileManager::storage::enlarge(unsigned
>>>>>>>>>> long) () from /usr/lib64/libfastbit.so.0
>>>>>>>>>> #5 0x00007f898272214f in ibis::fileManager::roFile::doRead(char
>>>>>>>>>> const*) () from /usr/lib64/libfastbit.so.0
>>>>>>>>>> #6 0x00007f8982723b4b in ibis::fileManager::getFile(char const*,
>>>>>>>>>> ibis::fileManager::storage**, ibis::fileManager::ACCESS_PREFERENCE)
>>>>>>>>>> ()
>>>>>>>>>> from /usr/lib64/libfastbit.so.0
>>>>>>>>>> #7 0x00007f898273406a in int ibis::fileManager::getFile<unsigned
>>>>>>>>>> short>(char const*, ibis::array_t<unsigned short>&,
>>>>>>>>>> ibis::fileManager::ACCESS_PREFERENCE) () from
>>>>>>>>>> /usr/lib64/libfastbit.so.0
>>>>>>>>>> #8 0x00007f8981f9f4a5 in ibis::column::actualMinMax(char const*,
>>>>>>>>>> ibis::bitvector const&, double&, double&) const () from
>>>>>>>>>> /usr/lib64/libfastbit.so.0
>>>>>>>>>> #9 0x00007f8981fa3546 in ibis::column::computeMinMax() () from
>>>>>>>>>> /usr/lib64/libfastbit.so.0
>>>>>>>>>> #10 0x00007f89827beae6 in
>>>>>>>>>> ibis::part::gatherSortKeys(ibis::array_t<char const*>&) () from
>>>>>>>>>> /usr/lib64/libfastbit.so.0
>>>>>>>>>> #11 0x00007f89827bfc56 in ibis::part::reorder() () from
>>>>>>>>>> /usr/lib64/libfastbit.so.0
>>>>>>>>>> #12 0x00007f8982c7e2af in reorder_index(void*) () from
>>>>>>>>>> /usr/share/ipfixcol/plugins/ipfixcol-fastbit-output.so
>>>>>>>>>> #13 0x00007f898345c851 in start_thread () from /lib64/libpthread.so.0
>>>>>>>>>> #14 0x00007f89831aa6dd in next_line () from /lib64/libc.so.6
>>>>>>>>>> #15 0x0000000000000000 in ?? ()
>>>>>>>>>> (gdb)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> (gdb) bt
>>>>>>>>>> #0 0x00007f8983463054 in __lll_lock_wait () from
>>>>>>>>>> /lib64/libpthread.so.0
>>>>>>>>>> #1 0x00007f898345e388 in _L_lock_854 () from /lib64/libpthread.so.0
>>>>>>>>>> #2 0x00007f898345e257 in pthread_mutex_lock () from
>>>>>>>>>> /lib64/libpthread.so.0
>>>>>>>>>> #3 0x00007f898175a6aa in
>>>>>>>>>> ibis::util::mutexLock::mutexLock(pthread_mutex_t*, char const*) ()
>>>>>>>>>> from /usr/lib64/libfastbit.so.0
>>>>>>>>>> #4 0x00007f89827177d4 in
>>>>>>>>>> ibis::fileManager::removeCleaner(ibis::fileManager::cleaner const*)
>>>>>>>>>> ()
>>>>>>>>>> from /usr/lib64/libfastbit.so.0
>>>>>>>>>> #5 0x00007f8981735952 in ibis::part::~part() () from
>>>>>>>>>> /usr/lib64/libfastbit.so.0
>>>>>>>>>> #6 0x00007f8981735c29 in ibis::part::~part() () from
>>>>>>>>>> /usr/lib64/libfastbit.so.0
>>>>>>>>>> #7 0x00007f8982c7e2cd in reorder_index(void*) () from
>>>>>>>>>> /usr/share/ipfixcol/plugins/ipfixcol-fastbit-output.so
>>>>>>>>>> #8 0x00007f898345c851 in start_thread () from /lib64/libpthread.so.0
>>>>>>>>>> #9 0x00007f89831aa6dd in next_line () from /lib64/libc.so.6
>>>>>>>>>> #10 0x0000000000000000 in ?? ()
>>>>>>>>>> (gdb)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Do you have any idea what might be going on?
>>>>>>>>>>
>>>>>>>>>> With regards,
>>>>>>>>>> Petr Velan
>>>>>>>>>>
_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
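
The two-limit idea from Petr's 30 August message (a hard maximum, plus a lower threshold that triggers unloading once memory is no longer in use) could be sketched roughly as below. This is only an illustration under stated assumptions: the class TwoLimitBudget and all of its members are hypothetical names, not part of FastBit, and the sketch tracks byte counts only rather than real file contents or ibis::fileManager state.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical two-limit memory budget; NOT a FastBit class.
// keepBytes: how much idle (cached) data may stay resident.
// maxBytes:  hard ceiling on active + idle bytes.
class TwoLimitBudget {
public:
    TwoLimitBudget(std::size_t keepBytes, std::size_t maxBytes)
        : keep_(keepBytes), max_(maxBytes), active_(0), idle_(0) {
        assert(keep_ <= max_);
    }

    // Reserve n bytes for active use. Idle bytes are evicted first if
    // room is needed; fails if active data alone would exceed maxBytes.
    bool acquire(std::size_t n) {
        if (n > max_ - active_) return false;
        std::size_t roomForIdle = max_ - active_ - n;
        if (idle_ > roomForIdle) idle_ = roomForIdle;  // evict idle data
        active_ += n;
        return true;
    }

    // Mark n active bytes as no longer in use. They stay cached only
    // while the total is within keepBytes. Returns the bytes evicted.
    std::size_t release(std::size_t n) {
        n = std::min(n, active_);
        active_ -= n;
        idle_ += n;
        std::size_t total = active_ + idle_;
        std::size_t excess = total > keep_ ? total - keep_ : 0;
        std::size_t evicted = std::min(excess, idle_);
        idle_ -= evicted;
        return evicted;
    }

    std::size_t total() const { return active_ + idle_; }
    std::size_t idle() const { return idle_; }

private:
    std::size_t keep_, max_, active_, idle_;
};
```

With keepBytes set to 512 and maxBytes to 10240 (think of the units as MB), a reorder that acquires 10000 and then releases it would leave only 512 resident, which is the behaviour the proposal asks for. Setting both limits to the same value reproduces a single-limit policy where nothing is evicted until the maximum is reached.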
