On 20/03/2015 23:24, Adrian M Negreanu wrote:
> Given that it's a POWER8 CPU (SMT), could this be triggered by instruction
> reordering somewhere in qwaitcondition_unix.cpp? Maybe -O0 can help?

I did a trial with -O0 for everything (not only qwaitcondition_unix.cpp) and it
still failed.

> 
> Also, valgrind not only slows down execution, it also modifies the process
> instructions.
> 
> 
> On Thu, Mar 19, 2015 at 11:04 PM, Dimitri van Heesch <doxy...@gmail.com 
> <mailto:doxy...@gmail.com>> wrote:
> 
>     Hi Normand,
> 
>     The issue seems to be in this piece of code:
> 
>     DotRunner *DotRunnerQueue::dequeue()
>     {
>       QMutexLocker locker(&m_mutex);
>       while (m_queue.isEmpty())
>       {
>         // wait until something is added to the queue
>         m_bufferNotEmpty.wait(&m_mutex);
>       }
>       DotRunner *result = m_queue.dequeue();
>       return result;
>     }
> 
>     It is one of the few areas that are executed by multiple threads,
>     but it is protected by a mutex (under the hood, QMutex and
>     QWaitCondition map to pthread calls).
>     Since m_bufferNotEmpty has its own mutex internally, it should only allow
>     one thread to be awakened. What you are seeing, it seems, is two threads
>     doing a dequeue() simultaneously.
>     It would be nice if you could help me with debugging this issue.

I do not know how to debug this; adding printf does not help me isolate the
problem.
Any suggestions?
===
     // wait until something is added to the queue
     m_bufferNotEmpty.wait(&m_mutex);
   }
+  pthread_t id = pthread_self();
+  printf("%08x: %p: DotRunnerQueue::dequeue, m_queue %p\n",id, this, m_queue);
   DotRunner *result = m_queue.dequeue();
   return result;
 }
===
...
ad0df190: 0x1002f8d0460: DotRunnerQueue::dequeue, m_queue 0x1002f8d0468
ac0df190: 0x1002f8d0460: DotRunnerQueue::dequeue, m_queue 0x1002f8d0468
Running dot for graph 309/1586
Running dot for graph 310/1586
ac8df190: 0x1002f8d0460: DotRunnerQueue::dequeue, m_queue 0x1002f8d0468
Running dot for graph 311/1586
ab0df190: 0x1002f8d0460: DotRunnerQueue::dequeue, m_queue 0x1002f8d0468
ab8df190: 0x1002f8d0460: DotRunnerQueue::dequeue, m_queue 0x1002f8d0468
ad0df190: 0x1002f8d0460: DotRunnerQueue::dequeue, m_queue 0x1002f8d0468
Running dot for graph 312/1586
Running dot for graph 313/1586
Running dot for graph 314/1586
ac0df190: 0x1002f8d0460: DotRunnerQueue::dequeue, m_queue 0x1002f8d0468
Running dot for graph 315/1586
ab0df190: 0x1002f8d0460: DotRunnerQueue::dequeue, m_queue 0x1002f8d0468
ac0df190: 0x1002f8d0460: DotRunnerQueue::dequeue, m_queue 0x1002f8d0468
ac8df190: 0x1002f8d0460: DotRunner
===
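
One way to take doxygen itself out of the picture might be a small standalone
stress test of the same "lock, wait while empty, take first" pattern. The
following is only a sketch under assumptions: it uses std::mutex and
std::condition_variable from the C++11 standard library in place of QMutex and
QWaitCondition, and JobQueue, StressResult and runStress are hypothetical names,
not doxygen code. It counts jobs that were taken twice or never, and how often
two threads were seen past the wait loop at the same time. It needs to be built
with -pthread.

```cpp
// Standalone sketch of the DotRunnerQueue::dequeue() pattern.
// Assumption: std::mutex / std::condition_variable stand in for
// QMutex / QWaitCondition; all names here are hypothetical.
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class JobQueue
{
  public:
    void enqueue(int job)
    {
      std::lock_guard<std::mutex> locker(m_mutex);
      m_queue.push(job);
      m_notEmpty.notify_all();
    }

    int dequeue()
    {
      std::unique_lock<std::mutex> locker(m_mutex);
      while (m_queue.empty())
      {
        // wait until something is added to the queue
        m_notEmpty.wait(locker);
      }
      // Count threads between the wait loop and the return.
      // With a working mutex this must never be more than one.
      if (m_inside.fetch_add(1) != 0) m_violations.fetch_add(1);
      int result = m_queue.front();
      m_queue.pop();
      m_inside.fetch_sub(1);
      return result;
    }

    int violations() const { return m_violations.load(); }

  private:
    std::mutex m_mutex;
    std::condition_variable m_notEmpty;
    std::queue<int> m_queue;
    std::atomic<int> m_inside{0};
    std::atomic<int> m_violations{0};
};

struct StressResult
{
  int violations; // two threads seen inside dequeue() at once
  int duplicates; // jobs handed out more than once
  int missing;    // jobs never handed out
};

StressResult runStress(int numWorkers, int numJobs)
{
  JobQueue queue;
  // one private list per worker, so recording needs no extra locking
  std::vector<std::vector<int>> taken(numWorkers);
  std::vector<std::thread> workers;
  for (int w = 0; w < numWorkers; w++)
  {
    workers.emplace_back([&queue, &taken, w]() {
      for (;;)
      {
        int job = queue.dequeue();
        if (job < 0) break; // negative value: no more work
        taken[w].push_back(job);
      }
    });
  }
  for (int j = 0; j < numJobs; j++) queue.enqueue(j);
  for (int w = 0; w < numWorkers; w++) queue.enqueue(-1); // one stop marker per worker
  for (auto &t : workers) t.join();

  // every job id should have been taken exactly once
  std::vector<int> count(numJobs, 0);
  for (const auto &list : taken)
    for (int job : list) count[job]++;

  StressResult r{queue.violations(), 0, 0};
  for (int c : count)
  {
    if (c > 1) r.duplicates++;
    if (c == 0) r.missing++;
  }
  return r;
}
```

Calling runStress(8, 100000) from a small main() should leave all three
counters at zero. A non-zero counter on the Power8 guest would suggest a problem
below doxygen (toolchain, libc or kernel). All zeros would not clear
qwaitcondition_unix.cpp, since this sketch does not use the qtools classes.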

> 
>     A workaround is to set DOT_NUM_THREADS to 1.

That workaround is what is currently implemented for openSUSE Tumbleweed on ppc64le.

> 
>     Regards,
>       Dimitri
> 
>     > On 18 Mar 2015, at 12:45 , Normand <norm...@linux.vnet.ibm.com 
> <mailto:norm...@linux.vnet.ibm.com>> wrote:
>     >
>     >
>     > On 11/03/2015 14:16, Normand wrote:
>     >> Hi there
>     >>
>     >> While building doxygen for openSUSE on a Power8 guest, I hit a failure
>     >> as detailed in (2).
>     >> The related backtrace extracted from the core file is appended below in (1).
>     >>
>     >>
>     >> === (1)
>     >> Core was generated by `./bin/doxygen '.
>     >> Program terminated with signal SIGABRT, Aborted.
>     >> #0  0x00003fffa5acd194 in __GI_raise (sig=<optimized out>) at 
> ../sysdeps/unix/sysv/linux/raise.c:55
>     >> 55      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
>     >> Missing separate debuginfos, use: zypper install 
> libgcc_s1-debuginfo-4.8.3+r218481-2.1.ppc64le 
> libstdc++6-debuginfo-4.8.3+r218481-2.1.ppc64le
>     >> (gdb) bt
>     >> #0  0x00003fffa5acd194 in __GI_raise (sig=<optimized out>) at 
> ../sysdeps/unix/sysv/linux/raise.c:55
>     >> #1  0x00003fffa5acf184 in __GI_abort () at abort.c:78
>     >> #2  0x00003fffa5b136c4 in __libc_message (do_abort=<optimized out>, 
> fmt=<optimized out>) at ../sysdeps/posix/libc_fatal.c:175
>     >> #3  0x00003fffa5b1ba84 in malloc_printerr (action=<optimized out>, 
> str=0x3fffa5c06b50 "double free or corruption (fasttop)", ptr=<optimized 
> out>) at malloc.c:4960
>     >> #4  0x00003fffa5b1cadc in _int_free (av=<optimized out>, p=<optimized 
> out>, have_lock=<optimized out>) at malloc.c:3831
>     >> #5  0x00003fffa5dece10 in operator delete(void*) () from 
> /usr/lib64/libstdc++.so.6
>     >> #6  0x00000000106620e4 in QGList::takeFirst (this=<optimized out>) at 
> qglist.cpp:628
>     >> #7  0x000000001053ba84 in dequeue (this=<optimized out>) at 
> ../qtools/qqueue.h:59
>     >> #8  DotRunnerQueue::dequeue (this=0x1001910fcc0) at dot.cpp:1170
>     >> #9  0x000000001053bb18 in DotWorkerThread::run (this=0x10019112a50) at 
> dot.cpp:1191
>     >> #10 0x00000000106a0a44 in QThreadPrivate::start (arg=0x10019112a50) at 
> qthread_unix.cpp:87
>     >> #11 0x00003fffa5ee9454 in start_thread (arg=0x3fffa38bf180) at 
> pthread_create.c:335
>     >> #12 0x00003fffa5b9e0c4 in clone () at 
> ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:96
>     >> ===
>     >>
>     >> (2) https://bugzilla.suse.com/show_bug.cgi?id=921577
>     >>
>     >
>     >
>     >
>     > I was able to recreate the problem with the latest doxygen git commit, 1c8bbb6:
>     > *   1c8bbb6 (HEAD, origin/master, origin/HEAD, master) Merge pull 
> request #314
>     >
>     > The associated backtrace (from the core file) differs from the one above
>     > only in some line numbers, but it still points to the same call sequence:
>     > from DotWorkerThread::run
>     > to delete in QCollection::Item QGList::takeFirst
>     > ===
>     > #0  0x00003fff8433d194 in raise () from /lib64/libc.so.6
>     > Missing separate debuginfos, use: zypper install 
> glibc-debuginfo-2.21-3.3.ppc64le 
> libgcc_s1-debuginfo-4.8.3+r218481-4.3.ppc64le 
> libstdc++6-debuginfo-4.8.3+r218481-4.3.ppc64le
>     > (gdb) bt
>     > #0  0x00003fff8433d194 in raise () from /lib64/libc.so.6
>     > #1  0x00003fff8433f184 in abort () from /lib64/libc.so.6
>     > #2  0x00003fff843836c4 in __libc_message () from /lib64/libc.so.6
>     > #3  0x00003fff8438ba84 in malloc_printerr () from /lib64/libc.so.6
>     > #4  0x00003fff8438cadc in _int_free () from /lib64/libc.so.6
>     > #5  0x00003fff8465ce10 in operator delete(void*) () from 
> /usr/lib64/libstdc++.so.6
>     > #6  0x000000001066c5e4 in QGList::takeFirst (this=<optimized out>) at 
> qglist.cpp:628
>     > #7  0x0000000010544e04 in dequeue (this=<optimized out>) at 
> ../qtools/qqueue.h:59
>     > #8  DotRunnerQueue::dequeue (this=0x1001707f7b0) at dot.cpp:1181
>     > #9  0x0000000010544e98 in DotWorkerThread::run (this=0x1001707efd0) at 
> dot.cpp:1202
>     > #10 0x00000000106aaf44 in QThreadPrivate::start (arg=0x1001707efd0) at 
> qthread_unix.cpp:87
>     > #11 0x00003fff84759454 in start_thread () from /lib64/libpthread.so.0
>     > #12 0x00003fff8440e0c4 in clone () from /lib64/libc.so.6
>     > ===
>     >
>     > The occurrence is timing dependent, and there is no failure when
>     > starting doxygen under gdb or valgrind,
>     > so I do not know how to continue the investigation.
>     >
>     > Any suggestions are welcome.
>     >
>     > ---
>     > Michel Normand
>     >


-- 
Michel Normand

