I've attached an alternative diff that disables the optimization in ibis::category::fillIndex for the case where every entry has the same value. This also resolves my deadlock issues.
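For anyone who wants to see the locking behaviour in isolation, here is a minimal sketch of the pattern discussed in this thread: a method that already holds a pthread mutex calls another method that locks the same mutex. This is not FastBit code. GuardedColumn, prepare and readMask are hypothetical names standing in for the column, the prepare path and getNullMask. The sketch uses PTHREAD_MUTEX_ERRORCHECK so that the second lock reports EDEADLK instead of hanging, and PTHREAD_MUTEX_RECURSIVE to show the re-lock succeeding.

```cpp
#include <pthread.h>
#include <cassert>
#include <cerrno>

// Hypothetical stand-in for a column whose state is guarded by a pthread
// mutex; the mutex type is chosen at construction time.
class GuardedColumn {
public:
    explicit GuardedColumn(int mutexType) {
        pthread_mutexattr_t attr;
        int rc = pthread_mutexattr_init(&attr);
        assert(rc == 0);
        rc = pthread_mutexattr_settype(&attr, mutexType);
        assert(rc == 0);
        rc = pthread_mutex_init(&mutex_, &attr);
        assert(rc == 0);
        (void)rc;  // silence unused warning when NDEBUG is defined
        pthread_mutexattr_destroy(&attr);
    }
    ~GuardedColumn() { pthread_mutex_destroy(&mutex_); }

    GuardedColumn(const GuardedColumn&) = delete;
    GuardedColumn& operator=(const GuardedColumn&) = delete;

    // Plays the role of the inner call: it takes the lock on its own and
    // returns the result of pthread_mutex_lock (0 on success).
    int readMask() {
        int rc = pthread_mutex_lock(&mutex_);
        if (rc == 0) {
            pthread_mutex_unlock(&mutex_);
        }
        return rc;
    }

    // Plays the role of the outer call: it holds the lock while calling
    // readMask(), so the same thread tries to lock the mutex twice.
    int prepare() {
        if (pthread_mutex_lock(&mutex_) != 0) {
            return -1;
        }
        int rc = readMask();
        pthread_mutex_unlock(&mutex_);
        return rc;
    }

private:
    pthread_mutex_t mutex_;
};
```

Calling prepare() on a GuardedColumn built with PTHREAD_MUTEX_ERRORCHECK returns EDEADLK, while one built with PTHREAD_MUTEX_RECURSIVE returns 0. With a default mutex the second lock would typically block forever, which matches the pthread_mutex_lock frame in the stack trace quoted below.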
On 11/1/14, 8:36 PM, "Enns, Steven" <[email protected]> wrote:
>Attached is my proposed fix using recursive mutex that is confirmed to
>resolve deadlock.
>
>The deadlock only seems to occur when the column contains a single
>distinct value, so dictionary is of size 1, and the following conditional
>runs in ibis::category::fillIndex:
>
>if (dic.size() == 1) { // assume every entry has the given value
>    rlc = new ibis::direkte(this, 1, thePart->nRows());
>}
>
>
>On 11/1/14, 4:21 PM, "Enns, Steven" <[email protected]> wrote:
>
>>I believe I have identified the cause of deadlock.
>>ibis::category::prepare acquires ibis::column::mutex. Then it calls
>>ibis::category::fillRows, which constructs ibis::direkte::direkte, which
>>attempts to acquire the column mutex again. Perhaps ibis::column::mutex
>>should be initialized with PTHREAD_MUTEX_RECURSIVE?
>>
>>
>>On 11/1/14, 3:53 PM, "Enns, Steven" <[email protected]> wrote:
>>
>>>Hey Sean,
>>>
>>>What specifically was wrong with your index data? I am experiencing the
>>>same issue.
>>>
>>>Thanks,
>>>Steve
>>>
>>>On 3/15/14, 4:08 PM, "Sean McNamara" <[email protected]>
>>>wrote:
>>>
>>>>John-
>>>>
>>>>There was an issue with the index data that we had generated. After
>>>>rebuilding the indexes /w 1.3.9 everything works great!
>>>>
>>>>Sorry for the false alarm.
>>>>
>>>>Thanks,
>>>>
>>>>Sean
>>>>
>>>>
>>>>________________________________________
>>>>From: [email protected]
>>>>[[email protected]] on behalf of K. John Wu
>>>>[[email protected]]
>>>>Sent: Saturday, March 15, 2014 1:52 AM
>>>>To: FastBit Users
>>>>Subject: Re: [FastBit-users] fastbit query hangs on FUTEX_WAIT_PRIVATE
>>>>
>>>>Hi, Sean,
>>>>
>>>>Please check out SVN revision 706 and give it a try. Let us know if
>>>>you continue to encounter problems.
>>>>
>>>>John
>>>>
>>>>PS: You can use the following command line to check out the latest
>>>>code from SVN
>>>>
>>>>svn checkout https://codeforge.lbl.gov/anonscm/fastbit
>>>>
>>>>
>>>>On 3/14/14, 2:26 PM, Sean McNamara wrote:
>>>>> Hey John-
>>>>>
>>>>> I just tried the 1.3.9 release and I still see the same issue. The
>>>>> stacktrace is pasted below. I believe it is getting stuck in
>>>>> column.cpp line 737: ibis::util::mutexLock lock(&mutex,
>>>>> "column::getNullMask");
>>>>>
>>>>> I can only reproduce the issue on ubuntu. If I can get the issue
>>>>> reproduced on my mac /w generated data I will send it your way in case
>>>>> you wouldn't mind examining. Btw- I am building with c++0x instead of
>>>>> c++0x11 on ubuntu since it has an older gcc that doesn't support 0x11.
>>>>>
>>>>> Thanks again,
>>>>>
>>>>> Sean
>>>>>
>>>>>
>>>>> #0 0x00007ffff59d989c in __lll_lock_wait () from
>>>>> /lib/x86_64-linux-gnu/libpthread.so.0
>>>>> (gdb) bt
>>>>> #0 0x00007ffff59d989c in __lll_lock_wait () from
>>>>> /lib/x86_64-linux-gnu/libpthread.so.0
>>>>> #1 0x00007ffff59d5065 in _L_lock_858 () from
>>>>> /lib/x86_64-linux-gnu/libpthread.so.0
>>>>> #2 0x00007ffff59d4eba in pthread_mutex_lock () from
>>>>> /lib/x86_64-linux-gnu/libpthread.so.0
>>>>> #3 0x00007ffff708e613 in ibis::column::getNullMask(ibis::bitvector&)
>>>>> const () from /usr/local/lib/libfastbit.so.0
>>>>> #4 0x00007ffff79573d0 in ibis::direkte::direkte(ibis::column const*,
>>>>> unsigned int, unsigned int) ()
>>>>> from /usr/local/lib/libfastbit.so.0
>>>>> #5 0x00007ffff77f2ec3 in ibis::category::fillIndex(char const*) const
>>>>> () from /usr/local/lib/libfastbit.so.0
>>>>> #6 0x00007ffff77f6c68 in ibis::category::prepareMembers() const ()
>>>>> from /usr/local/lib/libfastbit.so.0
>>>>> #7 0x00007ffff77fbd85 in ibis::category::getDictionary() const ()
>>>>> from /usr/local/lib/libfastbit.so.0
>>>>> #8 0x00007ffff6fe085d in ibis::bord::bord(char const*, char const*,
>>>>> ibis::selectClause const&, std::vector<ibis::part const*,
>>>>> std::allocator<ibis::part const*> > const&) () from
>>>>> /usr/local/lib/libfastbit.so.0
>>>>> #9 0x00007ffff78b0c69 in ibis::filter::sift2(ibis::selectClause
>>>>> const&, std::vector<ibis::part const*, std::allocator<ibis::part
>>>>> const*> > const&, ibis::whereClause const&) () from
>>>>> /usr/local/lib/libfastbit.so.0
>>>>> #10 0x00007ffff78b8c28 in ibis::table::select(std::vector<ibis::part
>>>>> const*, std::allocator<ibis::part const*> > const&, char const*, char
>>>>> const*) () from /usr/local/lib/libfastbit.so.0
>>>>> #11 0x00007ffff771513b in ibis::mensa::select(char const*, char
>>>>> const*) const () from /usr/local/lib/libfastbit.so.0
>>>>>
>>>>>
>>>>>----------------------------------------------------------------------
>>>>> *From:* [email protected]
>>>>> [[email protected]] on behalf of Sean McNamara
>>>>> [[email protected]]
>>>>> *Sent:* Friday, March 14, 2014 12:19 PM
>>>>> *To:* FastBit Users
>>>>> *Subject:* Re: [FastBit-users] fastbit query hangs on FUTEX_WAIT_PRIVATE
>>>>>
>>>>> John-
>>>>>
>>>>> Unfortunately I cannot share this dataset. I may try to make a
>>>>> dataset that I can share if I can repo the issue.
>>>>>
>>>>> In case it is helpful here is a stacktrace:
>>>>> http://pastebin.com/FT3qsLH6
>>>>>
>>>>> I tried pulling the data down to my local machine and it works fine
>>>>> there, no issues whatsoever. (I have a newer version of fastbit
>>>>> installed locally). So first I will try deploying the latest and
>>>>> greatest on our cluster. I will let you know how that goes.
>>>>>
>>>>> Thanks again!
>>>>>
>>>>> Sean
>>>>>
>>>>>
>>>>>----------------------------------------------------------------------
>>>>> *From:* [email protected]
>>>>> [[email protected]] on behalf of John
>>>>> [[email protected]]
>>>>> *Sent:* Friday, March 14, 2014 12:04 PM
>>>>> *To:* FastBit Users
>>>>> *Subject:* Re: [FastBit-users] fastbit query hangs on FUTEX_WAIT_PRIVATE
>>>>>
>>>>> Hi, Sean,
>>>>>
>>>>> Thanks for bring this issue up. It appears to be some sort of
>>>>> deadlock. I could look into further if you can share the sample data.
>>>>> Is the link you give the data or the log messages?
>>>>>
>>>>> -- John --
>>>>>
>>>>> On Mar 14, 2014, at 10:53 AM, Sean McNamara
>>>>> <[email protected] <mailto:[email protected]>> wrote:
>>>>>
>>>>>> Hi-
>>>>>>
>>>>>> I'm trying to troubleshoot an issue that I just started seeing.
>>>>>> Queries seem to hang, but only for certain columns and it's not
>>>>>> clear to me why. If it's any help, I am using fastbit a few commits
>>>>>> after 692.
>>>>>>
>>>>>> Here is the strace for the query:
>>>>>>
>>>>>> strace ibis -d /mnt/data/test -q "select daily_binned_datetime"
>>>>>>
>>>>>> http://pastebin.com/xczKJVWL
>>>>>>
>>>>>>
>>>>>> Here is the tail of what ibis is doing with verbosity:
>>>>>>
>>>>>> fileManager::storage(0x258e630, 0) cleared
>>>>>> array_t<i>::freeMemory this=0x24421a0 actual=0x24515f0 and m_begin=0
>>>>>> (active references: 0, past references: 1)
>>>>>> fileManager::storage(0x24515f0, 0) cleared
>>>>>> fileManager::flushFile will do nothing because
>>>>>> "/mnt/data/explore/keyidx/35000/rp13/2014/02/03/daily_binned_datetime.idx"
>>>>>> is not tracked by the file manager
>>>>>> fileManager::storage(0x24515f0, 0) initialization completed
>>>>>> array_t<i> constructed at 0x2451350 with actual=0x24515f0, m_begin=0
>>>>>> and m_end=0
>>>>>> fileManager::storage(0x258e630, 0) initialization completed
>>>>>> array_t<l> constructed at 0x2451368 with actual=0x258e630, m_begin=0
>>>>>> and m_end=0
>>>>>> fileManager::storage(0x2451170, 0) initialization completed
>>>>>> array_t<PN4ibis9bitvectorE> constructed at 0x2451380 with
>>>>>> actual=0x2451170, m_begin=0 and m_end=0
>>>>>> array_t<PN4ibis9bitvectorE>::freeMemory this=0x2451380
>>>>>> actual=0x2451170 and m_begin=0 (active references: 0, past
>>>>>> references: 1)
>>>>>> fileManager::storage(0x2451170, 0) cleared
>>>>>> fileManager::storage(0x2451170, 0x2451290) added 16 bytes to
>>>>>> increase totalBytes to 80192
>>>>>> fileManager::storage(0x2451170, 0x2451290) initialization completed
>>>>>> with 16 elements
>>>>>> fileManager::storage(0x24512e0, 0) initialization completed
>>>>>> array_t<j> constructed at 0x24512c0 with actual=0x24512e0, m_begin=0
>>>>>> and m_end=0
>>>>>> bitvector (0x24512b0) constructed with m_vec at 0x24512c0 <-- hangs
>>>>>> here
>>>>>>
>>>>>>
>>>>>> Does anyone have any insight?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Sean
>>>>>>
>>>>>> _______________________________________________
>>>>>> FastBit-users mailing list
>>>>>> [email protected] <mailto:[email protected]>
>>>>>> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
category.cpp.diff
Description: category.cpp.diff
