Attached is my proposed fix, which uses a recursive mutex and is
confirmed to resolve the deadlock.

The deadlock only seems to occur when the column contains a single
distinct value, so the dictionary has size 1 and the following
conditional runs in ibis::category::fillIndex:

if (dic.size() == 1) { // assume every entry has the given value
    rlc = new ibis::direkte(this, 1, thePart->nRows());
}
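
For anyone reading without the attachment, here is a minimal standalone
sketch of the idea. It is not the contents of column.diff and uses no
FastBit code; relockResult is a hypothetical helper written only for
illustration. It locks one pthread mutex twice from the same thread, the
way a caller that holds the column mutex ends up calling a callee that
locks it again, and returns the result of the second lock:

```cpp
#include <pthread.h>
#include <cerrno>

// Hypothetical helper (not part of FastBit): create a mutex of the given
// type, lock it twice from the calling thread, and return the result code
// of the second pthread_mutex_lock call.
//
// Do not call this with PTHREAD_MUTEX_NORMAL: the second lock may block
// forever, which is the kind of hang discussed in this thread.
int relockResult(int type) {
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, type);

    pthread_mutex_t m;
    pthread_mutex_init(&m, &attr);
    pthread_mutexattr_destroy(&attr);

    pthread_mutex_lock(&m);            // outer lock, taken by the caller
    int rc = pthread_mutex_lock(&m);   // nested lock, taken by the callee
    if (rc == 0) {
        pthread_mutex_unlock(&m);      // release the nested hold
    }
    pthread_mutex_unlock(&m);          // release the outer hold
    pthread_mutex_destroy(&m);
    return rc;
}
```

With PTHREAD_MUTEX_RECURSIVE the nested lock succeeds and returns 0. With
PTHREAD_MUTEX_ERRORCHECK it fails with EDEADLK instead of hanging, which
can be a handy way to find this kind of relock while debugging.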


On 11/1/14, 4:21 PM, "Enns, Steven" <[email protected]> wrote:

>I believe I have identified the cause of deadlock.
>ibis::category::prepare acquires ibis::column::mutex.  Then it calls
>ibis::category::fillIndex, which constructs ibis::direkte::direkte, which
>attempts to acquire the column mutex again.  Perhaps ibis::column::mutex
>should be initialized with PTHREAD_MUTEX_RECURSIVE?
>
>
>On 11/1/14, 3:53 PM, "Enns, Steven" <[email protected]> wrote:
>
>>Hey Sean,
>>
>>What specifically was wrong with your index data?  I am experiencing the
>>same issue.  
>>
>>Thanks,
>>Steve
>>
>>On 3/15/14, 4:08 PM, "Sean McNamara" <[email protected]> wrote:
>>
>>>John-
>>>
>>>There was an issue with the index data that we had generated. After
>>>rebuilding the indexes /w 1.3.9 everything works great!
>>>
>>>Sorry for the false alarm.
>>>
>>>Thanks,
>>>
>>>Sean
>>>
>>>
>>>________________________________________
>>>From: [email protected]
>>>[[email protected]] on behalf of K. John Wu
>>>[[email protected]]
>>>Sent: Saturday, March 15, 2014 1:52 AM
>>>To: FastBit Users
>>>Subject: Re: [FastBit-users] fastbit query hangs on FUTEX_WAIT_PRIVATE
>>>
>>>Hi, Sean,
>>>
>>>Please check out SVN revision 706 and give it a try.  Let us know if
>>>you continue to encounter problems.
>>>
>>>John
>>>
>>>PS: You can use the following command line to check out the latest
>>>code from SVN
>>>
>>>svn checkout https://codeforge.lbl.gov/anonscm/fastbit
>>>
>>>
>>>
>>>
>>>
>>>
>>>On 3/14/14, 2:26 PM, Sean McNamara wrote:
>>>> Hey John-
>>>>
>>>> I just tried the 1.3.9 release and I still see the same issue.  The
>>>> stacktrace is pasted below.  I believe it is getting stuck in
>>>> column.cpp line 737: ibis::util::mutexLock lock(&mutex,
>>>> "column::getNullMask");
>>>>
>>>> I can only reproduce the issue on ubuntu.  If I can get the issue
>>>> reproduced on my mac /w generated data I will send it your way in case
>>>> you wouldn't mind examining.  Btw- I am building with c++0x instead of
>>>> c++11 on ubuntu since it has an older gcc that doesn't support c++11.
>>>>
>>>> Thanks again,
>>>>
>>>> Sean
>>>>
>>>>
>>>> #0  0x00007ffff59d989c in __lll_lock_wait () from
>>>> /lib/x86_64-linux-gnu/libpthread.so.0
>>>> (gdb) bt
>>>> #0  0x00007ffff59d989c in __lll_lock_wait () from
>>>> /lib/x86_64-linux-gnu/libpthread.so.0
>>>> #1  0x00007ffff59d5065 in _L_lock_858 () from
>>>> /lib/x86_64-linux-gnu/libpthread.so.0
>>>> #2  0x00007ffff59d4eba in pthread_mutex_lock () from
>>>> /lib/x86_64-linux-gnu/libpthread.so.0
>>>> #3  0x00007ffff708e613 in ibis::column::getNullMask(ibis::bitvector&)
>>>> const () from /usr/local/lib/libfastbit.so.0
>>>> #4  0x00007ffff79573d0 in ibis::direkte::direkte(ibis::column const*,
>>>> unsigned int, unsigned int) ()
>>>>    from /usr/local/lib/libfastbit.so.0
>>>> #5  0x00007ffff77f2ec3 in ibis::category::fillIndex(char const*) const
>>>> () from /usr/local/lib/libfastbit.so.0
>>>> #6  0x00007ffff77f6c68 in ibis::category::prepareMembers() const ()
>>>> from /usr/local/lib/libfastbit.so.0
>>>> #7  0x00007ffff77fbd85 in ibis::category::getDictionary() const ()
>>>> from /usr/local/lib/libfastbit.so.0
>>>> #8  0x00007ffff6fe085d in ibis::bord::bord(char const*, char const*,
>>>> ibis::selectClause const&, std::vector<ibis::part const*,
>>>> std::allocator<ibis::part const*> > const&) () from
>>>> /usr/local/lib/libfastbit.so.0
>>>> #9  0x00007ffff78b0c69 in ibis::filter::sift2(ibis::selectClause
>>>> const&, std::vector<ibis::part const*, std::allocator<ibis::part
>>>> const*> > const&, ibis::whereClause const&) () from
>>>> /usr/local/lib/libfastbit.so.0
>>>> #10 0x00007ffff78b8c28 in ibis::table::select(std::vector<ibis::part
>>>> const*, std::allocator<ibis::part const*> > const&, char const*, char
>>>> const*) () from /usr/local/lib/libfastbit.so.0
>>>> #11 0x00007ffff771513b in ibis::mensa::select(char const*, char
>>>> const*) const () from /usr/local/lib/libfastbit.so.0
>>>>
>>>>
>>>>
>>>>
>>>> ----------------------------------------------------------------------
>>>> *From:* [email protected]
>>>> [[email protected]] on behalf of Sean McNamara
>>>> [[email protected]]
>>>> *Sent:* Friday, March 14, 2014 12:19 PM
>>>> *To:* FastBit Users
>>>> *Subject:* Re: [FastBit-users] fastbit query hangs on
>>>>FUTEX_WAIT_PRIVATE
>>>>
>>>> John-
>>>>
>>>> Unfortunately I cannot share this dataset.  I may try to make a
>>>> dataset that I can share if I can repro the issue.
>>>>
>>>> In case it is helpful here is a stacktrace:
>>>> http://pastebin.com/FT3qsLH6
>>>>
>>>> I tried pulling the data down to my local machine and it works fine
>>>> there, no issues whatsoever. (I have a newer version of fastbit
>>>> installed locally).  So first I will try deploying the latest and
>>>> greatest on our cluster. I will let you know how that goes.
>>>>
>>>> Thanks again!
>>>>
>>>> Sean
>>>>
>>>> ----------------------------------------------------------------------
>>>> *From:* [email protected]
>>>> [[email protected]] on behalf of John [[email protected]]
>>>> *Sent:* Friday, March 14, 2014 12:04 PM
>>>> *To:* FastBit Users
>>>> *Subject:* Re: [FastBit-users] fastbit query hangs on
>>>>FUTEX_WAIT_PRIVATE
>>>>
>>>> Hi, Sean,
>>>>
>>>> Thanks for bringing this issue up.  It appears to be some sort of
>>>> deadlock.  I could look into it further if you can share the sample
>>>> data.  Is the link you gave the data or the log messages?
>>>>
>>>> -- John --
>>>>
>>>> On Mar 14, 2014, at 10:53 AM, Sean McNamara
>>>> <[email protected] <mailto:[email protected]>>
>>>>wrote:
>>>>
>>>>> Hi-
>>>>>
>>>>> I'm trying to troubleshoot an issue that I just started seeing.
>>>>>  Queries seem to hang, but only for certain columns and it's not
>>>>> clear to me why.  If it's any help, I am using fastbit a few commits
>>>>> after 692.
>>>>>
>>>>> Here is the strace for the query:
>>>>>
>>>>> strace ibis -d /mnt/data/test -q "select daily_binned_datetime"
>>>>>
>>>>> http://pastebin.com/xczKJVWL
>>>>>
>>>>>
>>>>> Here is the tail of what ibis is doing with verbosity:
>>>>>
>>>>> fileManager::storage(0x258e630, 0) cleared
>>>>> array_t<i>::freeMemory this=0x24421a0 actual=0x24515f0 and m_begin=0
>>>>> (active references: 0, past references: 1)
>>>>> fileManager::storage(0x24515f0, 0) cleared
>>>>> fileManager::flushFile will do nothing because
>>>>> "/mnt/data/explore/keyidx/35000/rp13/2014/02/03/daily_binned_datetime.idx"
>>>>> is not tracked by the file manager
>>>>> fileManager::storage(0x24515f0, 0) initialization completed
>>>>> array_t<i> constructed at 0x2451350 with actual=0x24515f0, m_begin=0
>>>>> and m_end=0
>>>>> fileManager::storage(0x258e630, 0) initialization completed
>>>>> array_t<l> constructed at 0x2451368 with actual=0x258e630, m_begin=0
>>>>> and m_end=0
>>>>> fileManager::storage(0x2451170, 0) initialization completed
>>>>> array_t<PN4ibis9bitvectorE> constructed at 0x2451380 with
>>>>> actual=0x2451170, m_begin=0 and m_end=0
>>>>> array_t<PN4ibis9bitvectorE>::freeMemory this=0x2451380
>>>>> actual=0x2451170 and m_begin=0 (active references: 0, past
>>>>> references: 1)
>>>>> fileManager::storage(0x2451170, 0) cleared
>>>>> fileManager::storage(0x2451170, 0x2451290) added 16 bytes to
>>>>> increase totalBytes to 80192
>>>>> fileManager::storage(0x2451170, 0x2451290) initialization completed
>>>>> with 16 elements
>>>>> fileManager::storage(0x24512e0, 0) initialization completed
>>>>> array_t<j> constructed at 0x24512c0 with actual=0x24512e0, m_begin=0
>>>>> and m_end=0
>>>>> bitvector (0x24512b0) constructed with m_vec at 0x24512c0   <-- hangs
>>>>> here
>>>>>
>>>>>
>>>>> Does anyone have any insight?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Sean
>>>>>
>>>>> _______________________________________________
>>>>> FastBit-users mailing list
>>>>> [email protected] <mailto:[email protected]>
>>>>> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
>>>>
>>>>

Attachment: column.diff
Description: column.diff

