I've attached an alternative diff that disables the optimization in
ibis::category::fillIndex for the case where every entry has the same
value.  This also resolves my deadlock issues.


On 11/1/14, 8:36 PM, "Enns, Steven" <[email protected]> wrote:

>Attached is my proposed fix, which uses a recursive mutex and is confirmed
>to resolve the deadlock.
>
>The deadlock only seems to occur when the column contains a single
>distinct value, so the dictionary has size 1 and the following conditional
>runs in ibis::category::fillIndex:
>
>if (dic.size() == 1) { // assume every entry has the given value
>    rlc = new ibis::direkte(this, 1, thePart->nRows());
>}
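
For anyone who wants to see the re-entrant locking pattern without the
hang, here is a minimal sketch.  It is not FastBit code: Column,
build_index and prepare are hypothetical stand-ins for ibis::column, the
ibis::direkte constructor and ibis::category::prepare, and it assumes a
POSIX platform.  The mutex is created with PTHREAD_MUTEX_ERRORCHECK so that
a second lock attempt by the same thread returns EDEADLK instead of
blocking forever.

```cpp
#include <cassert>
#include <cerrno>
#include <pthread.h>

// Hypothetical stand-in for a column that owns a mutex (not ibis::column).
struct Column {
    pthread_mutex_t mutex;
    Column() {
        pthread_mutexattr_t attr;
        int rc = pthread_mutexattr_init(&attr);
        assert(rc == 0);
        // ERRORCHECK makes a relock by the owning thread fail with EDEADLK.
        rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
        assert(rc == 0);
        rc = pthread_mutex_init(&mutex, &attr);
        assert(rc == 0);
        (void)rc;  // silence unused-variable warnings under NDEBUG
        pthread_mutexattr_destroy(&attr);
    }
    ~Column() { pthread_mutex_destroy(&mutex); }
};

// Stand-in for the index constructor: it locks the column mutex itself.
int build_index(Column& col) {
    int rc = pthread_mutex_lock(&col.mutex);
    if (rc == 0) pthread_mutex_unlock(&col.mutex);
    return rc;  // 0 on success, EDEADLK if this thread already owns it
}

// Stand-in for the caller: it holds the mutex while building the index.
int prepare(Column& col) {
    int rc = pthread_mutex_lock(&col.mutex);
    if (rc != 0) return rc;
    int inner = build_index(col);  // second lock attempt, same thread
    pthread_mutex_unlock(&col.mutex);
    return inner;
}
```

Calling prepare() on a fresh Column returns EDEADLK, while build_index()
called on its own returns 0, which matches the observation that only the
nested call path is affected.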
>
>
>On 11/1/14, 4:21 PM, "Enns, Steven" <[email protected]> wrote:
>
>>I believe I have identified the cause of the deadlock.
>>ibis::category::prepare acquires ibis::column::mutex.  Then it calls
>>ibis::category::fillRows, which constructs ibis::direkte::direkte, which
>>attempts to acquire the column mutex again.  Perhaps ibis::column::mutex
>>should be initialized with PTHREAD_MUTEX_RECURSIVE?
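
As a rough illustration of the recursive-mutex idea, the sketch below
initializes a pthread mutex with PTHREAD_MUTEX_RECURSIVE and then locks it
twice from the same thread.  How ibis::column actually initializes its
mutex is not shown in this thread, so init_recursive_mutex and lock_twice
are hypothetical helper names, not FastBit functions.

```cpp
#include <cassert>
#include <pthread.h>

// Hypothetical helper: initialize *m as a recursive mutex, so the thread
// that owns it may lock it again without blocking.
int init_recursive_mutex(pthread_mutex_t* m) {
    assert(m != nullptr);
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0) return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    if (rc == 0) rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

// Hypothetical helper: lock the mutex twice from the calling thread, the
// way a caller and a nested constructor would, then release both levels.
// Returns 0 if every lock call succeeded.
int lock_twice(pthread_mutex_t* m) {
    int rc = pthread_mutex_lock(m);
    if (rc != 0) return rc;
    rc = pthread_mutex_lock(m);  // second acquisition by the owner
    if (rc != 0) {
        pthread_mutex_unlock(m);
        return rc;
    }
    // A recursive mutex must be unlocked once per successful lock.
    pthread_mutex_unlock(m);
    pthread_mutex_unlock(m);
    return 0;
}
```

With a recursive mutex the second lock succeeds and only bumps an internal
count; the trade-off is that it can hide places where a lock is held
longer than intended.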
>>
>>
>>On 11/1/14, 3:53 PM, "Enns, Steven" <[email protected]> wrote:
>>
>>>Hey Sean,
>>>
>>>What specifically was wrong with your index data?  I am experiencing the
>>>same issue.  
>>>
>>>Thanks,
>>>Steve
>>>
>>>On 3/15/14, 4:08 PM, "Sean McNamara" <[email protected]>
>>>wrote:
>>>
>>>>John-
>>>>
>>>>There was an issue with the index data that we had generated. After
>>>>rebuilding the indexes with 1.3.9 everything works great!
>>>>
>>>>Sorry for the false alarm.
>>>>
>>>>Thanks,
>>>>
>>>>Sean
>>>>
>>>>
>>>>________________________________________
>>>>From: [email protected]
>>>>[[email protected]] on behalf of K. John Wu
>>>>[[email protected]]
>>>>Sent: Saturday, March 15, 2014 1:52 AM
>>>>To: FastBit Users
>>>>Subject: Re: [FastBit-users] fastbit query hangs on FUTEX_WAIT_PRIVATE
>>>>
>>>>Hi, Sean,
>>>>
>>>>Please check out SVN revision 706 and give it a try.  Let us know if
>>>>you continue to encounter problems.
>>>>
>>>>John
>>>>
>>>>PS: You can use the following command line to check out the latest
>>>>code from SVN
>>>>
>>>>svn checkout https://codeforge.lbl.gov/anonscm/fastbit
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>On 3/14/14, 2:26 PM, Sean McNamara wrote:
>>>>> Hey John-
>>>>>
>>>>> I just tried the 1.3.9 release and I still see the same issue.  The
>>>>> stacktrace is pasted below.  I believe it is getting stuck in
>>>>> column.cpp line 737: ibis::util::mutexLock lock(&mutex,
>>>>> "column::getNullMask");
>>>>>
>>>>> I can only reproduce the issue on Ubuntu.  If I can reproduce the
>>>>> issue on my Mac with generated data I will send it your way in case
>>>>> you wouldn't mind examining it.  Btw, I am building with c++0x
>>>>> instead of c++11 on Ubuntu since it has an older gcc that doesn't
>>>>> support c++11.
>>>>>
>>>>> Thanks again,
>>>>>
>>>>> Sean
>>>>>
>>>>>
>>>>> #0  0x00007ffff59d989c in __lll_lock_wait () from
>>>>> /lib/x86_64-linux-gnu/libpthread.so.0
>>>>> (gdb) bt
>>>>> #0  0x00007ffff59d989c in __lll_lock_wait () from
>>>>> /lib/x86_64-linux-gnu/libpthread.so.0
>>>>> #1  0x00007ffff59d5065 in _L_lock_858 () from
>>>>> /lib/x86_64-linux-gnu/libpthread.so.0
>>>>> #2  0x00007ffff59d4eba in pthread_mutex_lock () from
>>>>> /lib/x86_64-linux-gnu/libpthread.so.0
>>>>> #3  0x00007ffff708e613 in ibis::column::getNullMask(ibis::bitvector&)
>>>>> const () from /usr/local/lib/libfastbit.so.0
>>>>> #4  0x00007ffff79573d0 in ibis::direkte::direkte(ibis::column const*,
>>>>> unsigned int, unsigned int) ()
>>>>>    from /usr/local/lib/libfastbit.so.0
>>>>> #5  0x00007ffff77f2ec3 in ibis::category::fillIndex(char const*) const
>>>>> () from /usr/local/lib/libfastbit.so.0
>>>>> #6  0x00007ffff77f6c68 in ibis::category::prepareMembers() const ()
>>>>> from /usr/local/lib/libfastbit.so.0
>>>>> #7  0x00007ffff77fbd85 in ibis::category::getDictionary() const ()
>>>>> from /usr/local/lib/libfastbit.so.0
>>>>> #8  0x00007ffff6fe085d in ibis::bord::bord(char const*, char const*,
>>>>> ibis::selectClause const&, std::vector<ibis::part const*,
>>>>> std::allocator<ibis::part const*> > const&) () from
>>>>> /usr/local/lib/libfastbit.so.0
>>>>> #9  0x00007ffff78b0c69 in ibis::filter::sift2(ibis::selectClause
>>>>> const&, std::vector<ibis::part const*, std::allocator<ibis::part
>>>>> const*> > const&, ibis::whereClause const&) () from
>>>>> /usr/local/lib/libfastbit.so.0
>>>>> #10 0x00007ffff78b8c28 in ibis::table::select(std::vector<ibis::part
>>>>> const*, std::allocator<ibis::part const*> > const&, char const*, char
>>>>> const*) () from /usr/local/lib/libfastbit.so.0
>>>>> #11 0x00007ffff771513b in ibis::mensa::select(char const*, char
>>>>> const*) const () from /usr/local/lib/libfastbit.so.0
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> 
>>>>>----------------------------------------------------------------------
>>>>> *From:* [email protected]
>>>>> [[email protected]] on behalf of Sean McNamara
>>>>> [[email protected]]
>>>>> *Sent:* Friday, March 14, 2014 12:19 PM
>>>>> *To:* FastBit Users
>>>>> *Subject:* Re: [FastBit-users] fastbit query hangs on
>>>>> FUTEX_WAIT_PRIVATE
>>>>>
>>>>> John-
>>>>>
>>>>> Unfortunately I cannot share this dataset.  I may try to make a
>>>>> dataset that I can share if I can reproduce the issue.
>>>>>
>>>>> In case it is helpful here is a stacktrace:
>>>>> http://pastebin.com/FT3qsLH6
>>>>>
>>>>> I tried pulling the data down to my local machine and it works fine
>>>>> there, no issues whatsoever. (I have a newer version of fastbit
>>>>> installed locally).  So first I will try deploying the latest and
>>>>> greatest on our cluster. I will let you know how that goes.
>>>>>
>>>>> Thanks again!
>>>>>
>>>>> Sean
>>>>>
>>>>> 
>>>>>----------------------------------------------------------------------
>>>>> *From:* [email protected]
>>>>> [[email protected]] on behalf of John
>>>>> [[email protected]]
>>>>> *Sent:* Friday, March 14, 2014 12:04 PM
>>>>> *To:* FastBit Users
>>>>> *Subject:* Re: [FastBit-users] fastbit query hangs on
>>>>> FUTEX_WAIT_PRIVATE
>>>>>
>>>>> Hi, Sean,
>>>>>
>>>>> Thanks for bringing this issue up.  It appears to be some sort of
>>>>> deadlock.  I could look into it further if you can share the sample
>>>>> data.  Is the link you gave the data or the log messages?
>>>>>
>>>>> -- John --
>>>>>
>>>>> On Mar 14, 2014, at 10:53 AM, Sean McNamara
>>>>> <[email protected] <mailto:[email protected]>> wrote:
>>>>>
>>>>>> Hi-
>>>>>>
>>>>>> I'm trying to troubleshoot an issue that I just started seeing.
>>>>>> Queries seem to hang, but only for certain columns and it's not
>>>>>> clear to me why.  If it's any help, I am using fastbit a few commits
>>>>>> after 692.
>>>>>>
>>>>>> Here is the strace for the query:
>>>>>>
>>>>>> strace ibis -d /mnt/data/test -q "select daily_binned_datetime"
>>>>>>
>>>>>> http://pastebin.com/xczKJVWL
>>>>>>
>>>>>>
>>>>>> Here is the tail of what ibis is doing with verbosity:
>>>>>>
>>>>>> fileManager::storage(0x258e630, 0) cleared
>>>>>> array_t<i>::freeMemory this=0x24421a0 actual=0x24515f0 and m_begin=0
>>>>>> (active references: 0, past references: 1)
>>>>>> fileManager::storage(0x24515f0, 0) cleared
>>>>>> fileManager::flushFile will do nothing because
>>>>>> "/mnt/data/explore/keyidx/35000/rp13/2014/02/03/daily_binned_datetime.idx"
>>>>>> is not tracked by the file manager
>>>>>> fileManager::storage(0x24515f0, 0) initialization completed
>>>>>> array_t<i> constructed at 0x2451350 with actual=0x24515f0, m_begin=0
>>>>>> and m_end=0
>>>>>> fileManager::storage(0x258e630, 0) initialization completed
>>>>>> array_t<l> constructed at 0x2451368 with actual=0x258e630, m_begin=0
>>>>>> and m_end=0
>>>>>> fileManager::storage(0x2451170, 0) initialization completed
>>>>>> array_t<PN4ibis9bitvectorE> constructed at 0x2451380 with
>>>>>> actual=0x2451170, m_begin=0 and m_end=0
>>>>>> array_t<PN4ibis9bitvectorE>::freeMemory this=0x2451380
>>>>>> actual=0x2451170 and m_begin=0 (active references: 0, past
>>>>>> references: 1)
>>>>>> fileManager::storage(0x2451170, 0) cleared
>>>>>> fileManager::storage(0x2451170, 0x2451290) added 16 bytes to
>>>>>> increase totalBytes to 80192
>>>>>> fileManager::storage(0x2451170, 0x2451290) initialization completed
>>>>>> with 16 elements
>>>>>> fileManager::storage(0x24512e0, 0) initialization completed
>>>>>> array_t<j> constructed at 0x24512c0 with actual=0x24512e0, m_begin=0
>>>>>> and m_end=0
>>>>>> bitvector (0x24512b0) constructed with m_vec at 0x24512c0   <-- hangs here
>>>>>>
>>>>>>
>>>>>> Does anyone have any insight?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Sean
>>>>>>
>>>>>> _______________________________________________
>>>>>> FastBit-users mailing list
>>>>>> [email protected] <mailto:[email protected]>
>>>>>> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
>>>>>
>>>>>
>>>>>
>>>
>>
>

Attachment: category.cpp.diff
Description: category.cpp.diff

