https://bugs.kde.org/show_bug.cgi?id=458915
Libor Peltan changed:
What|Removed |Added
Status|RESOLVED|VERIFIED
--- Comment #28 from Libor Peltan ---
https://bugs.kde.org/show_bug.cgi?id=458915
Philippe Waroquiers changed:
What|Removed |Added
Resolution|--- |FIXED
Status|REPORTED
https://bugs.kde.org/show_bug.cgi?id=458915
David Vasek changed:
What|Removed |Added
CC||david.va...@nic.cz
--- Comment #26 from David
https://bugs.kde.org/show_bug.cgi?id=458915
--- Comment #25 from Philippe Waroquiers ---
(In reply to Libor Peltan from comment #24)
> You will probably need to run the test several times until it reproduces. On
> some machines it may never reproduce. For me, it
> easily
https://bugs.kde.org/show_bug.cgi?id=458915
--- Comment #24 from Libor Peltan ---
Hi Philippe,
if you are in the mood to try it out with Knot DNS, these are the instructions
to reproduce:
1) download Knot DNS sources from https://gitlab.nic.cz/knot/knot-dns . It's OK
to stay at the `master` branch.
2)
https://bugs.kde.org/show_bug.cgi?id=458915
--- Comment #23 from Philippe Waroquiers ---
(In reply to Libor Peltan from comment #22)
> (In reply to Philippe Waroquiers from comment #21)
> > Valgrind should stop by itself when it finds an error (when using
> > --vgdb-error argument)
>
> The
https://bugs.kde.org/show_bug.cgi?id=458915
--- Comment #22 from Libor Peltan ---
(In reply to Philippe Waroquiers from comment #21)
> Valgrind should stop by itself when it finds an error (when using
> --vgdb-error argument)
The error mentioned by me is an error in application logic. Valgrind
https://bugs.kde.org/show_bug.cgi?id=458915
--- Comment #21 from Philippe Waroquiers ---
(In reply to Libor Peltan from comment #20)
> Thank you for your observations! Based on this, we actually found out that
> the issue happens exactly (sometimes!) when we attach vgdb to the running
> process,
https://bugs.kde.org/show_bug.cgi?id=458915
Libor Peltan changed:
What|Removed |Added
Summary|syscall sometimes returns |syscall sometimes returns
https://bugs.kde.org/show_bug.cgi?id=458915
--- Comment #20 from Libor Peltan ---
Thank you for your observations! Based on this, we actually found out that the
issue happens exactly (sometimes!) when we attach vgdb to the running process,
like this:
```
/usr/bin/gdb -ex "set confirm off" -ex
https://bugs.kde.org/show_bug.cgi?id=458915
--- Comment #19 from Philippe Waroquiers ---
I took a look at the attached logs.
A first observation:
* We have 2 groups of 3 threads that get the 0xe8 syscall return.
* For each of these 2 groups, we see, a little before these 0xe8 returns, that
https://bugs.kde.org/show_bug.cgi?id=458915
--- Comment #18 from Libor Peltan ---
Created attachment 152437
--> https://bugs.kde.org/attachment.cgi?id=152437&action=edit
Logs of wrong syscall retvals with no signals observed.
--
You are receiving this mail because:
You are watching all bug changes.
https://bugs.kde.org/show_bug.cgi?id=458915
--- Comment #17 from Libor Peltan ---
Thank you very much for continuing to look into this issue!
I confirm that with --tool=none, the issue reproduces as well. This is also
true for --tool=helgrind, as I said earlier.
Even with
https://bugs.kde.org/show_bug.cgi?id=458915
--- Comment #16 from Philippe Waroquiers ---
In one of the traces I see the lines below. It looks like a SIGALRM signal
is delivered to the thread that encounters the futex 202 result.
--24048-- async signal handler: signal=14, vgtid=24051, tid=4,
https://bugs.kde.org/show_bug.cgi?id=458915
--- Comment #15 from Paul Floyd ---
OK so it looks like a problem with
VG_(fixup_guest_state_after_syscall_interrupted) as described by Philippe
https://bugs.kde.org/show_bug.cgi?id=458915
--- Comment #14 from Libor Peltan ---
Created attachment 152228
--> https://bugs.kde.org/attachment.cgi?id=152228&action=edit
Strace failing syscalls from a reproducing scenario.
https://bugs.kde.org/show_bug.cgi?id=458915
--- Comment #13 from Libor Peltan ---
(In reply to Paul Floyd from comment #12)
> And do you see any ERESTARTNOINTR with strace?
No. I see only these errors returned from failed syscalls:
EAGAIN
ECONNREFUSED
EEXIST
EINPROGRESS
EINTR
EINVAL
ENOENT
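A check like the one Paul asks for can be done mechanically over an strace log. The following is only a sketch: it assumes the usual strace line shape (`call(args) = -1 ENAME (description)`, or `= ? ENAME (...)` for restarted calls), and the sample lines are made up for illustration, not taken from the attached log.

```python
import re
from collections import Counter

# Matches the result part of an strace line, e.g. "= -1 EINTR (" or "= ? ERESTARTSYS (".
ERR_RE = re.compile(r"=\s+(?:-1|\?)\s+([A-Z][A-Z0-9_]*)\b")

def count_errors(lines):
    """Count the symbolic error names reported by failed syscalls."""
    counts = Counter()
    for line in lines:
        match = ERR_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

def restart_codes(counts):
    """Return any kernel-internal ERESTART* codes that showed up."""
    return sorted(name for name in counts if name.startswith("ERESTART"))

# Hypothetical sample lines, for illustration only.
sample = [
    "futex(0x7f00, FUTEX_WAIT_PRIVATE, 0, NULL) = -1 EAGAIN (Resource temporarily unavailable)",
    "connect(5, {sa_family=AF_INET}, 16) = -1 ECONNREFUSED (Connection refused)",
    "epoll_wait(7, [], 1, 1000) = 0",
    "epoll_wait(7, 0x7f10, 1, -1) = -1 EINTR (Interrupted system call)",
]
counts = count_errors(sample)
print(dict(counts))
print(restart_codes(counts))
```

An empty list from `restart_codes` corresponds to the answer given here: no ERESTART* code is visible in the strace output.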
https://bugs.kde.org/show_bug.cgi?id=458915
--- Comment #12 from Paul Floyd ---
And do you see any ERESTARTNOINTR with strace?
https://bugs.kde.org/show_bug.cgi?id=458915
--- Comment #11 from Libor Peltan ---
Created attachment 152221
--> https://bugs.kde.org/attachment.cgi?id=152221&action=edit
The log of valgrind returning wrong syscall value five times.
https://bugs.kde.org/show_bug.cgi?id=458915
--- Comment #10 from Libor Peltan ---
@Philippe Thanks very much for your deep analysis! I wouldn't be able to see such
things in context. However, I think you are not entirely correct. I think the
observed bug is not caused by other thread aborting. The
https://bugs.kde.org/show_bug.cgi?id=458915
--- Comment #9 from Paul Floyd ---
(In reply to Philippe Waroquiers from comment #8)
> So, a hypothesis about what happens:
> * the application encounters an error condition (in tid 18 in the epoll
> case, in tid 11 in the futex case)
> * this
https://bugs.kde.org/show_bug.cgi?id=458915
--- Comment #8 from Philippe Waroquiers ---
I took a look at both logs.
First the epoll log.
(tid is the thread id number used internally in valgrind)
What we see is that the tid 14 is just getting the result of a previous epoll
syscall, and then
https://bugs.kde.org/show_bug.cgi?id=458915
--- Comment #7 from Libor Peltan ---
Created attachment 152095
--> https://bugs.kde.org/attachment.cgi?id=152095&action=edit
The log of valgrind crashing knotd after mishandled epoll_wait syscall.
https://bugs.kde.org/show_bug.cgi?id=458915
--- Comment #6 from Libor Peltan ---
I frankly don't understand the flags of the futex syscall. This is what glibc does.
To possibly bring more light to the issue, I'm going to upload a log from
completely different reproducer.
This time, the affected
https://bugs.kde.org/show_bug.cgi?id=458915
--- Comment #5 from Paul Floyd ---
The pair of entry/exit traces is
SYSCALL[260041,11](202) sys_futex ( 0x1ffefff9f4, 393, 0, 0x2633d890, 0x0 ) -->
[async] ...
SYSCALL[260041,11](202) ... [async] --> Success(0xca)
The futex op is 393, or 0x189.
That's
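The second argument in the trace above, 393 (0x189), is the futex operation word. As a rough sketch, it can be split into a command and its flag bits with the constants from `<linux/futex.h>`; the helper name `decode_futex_op` is made up for this illustration. The sketch also shows that the reported return value, 0xca, equals the syscall number 202 printed in the same trace.

```python
# Constants as defined in <linux/futex.h>.
FUTEX_CMD_NAMES = {
    0: "FUTEX_WAIT",
    1: "FUTEX_WAKE",
    9: "FUTEX_WAIT_BITSET",
    10: "FUTEX_WAKE_BITSET",
}
FUTEX_PRIVATE_FLAG = 128
FUTEX_CLOCK_REALTIME = 256

def decode_futex_op(op):
    """Split a futex op word into its command name and flag names (hypothetical helper)."""
    cmd = op & ~(FUTEX_PRIVATE_FLAG | FUTEX_CLOCK_REALTIME)
    parts = [FUTEX_CMD_NAMES.get(cmd, f"cmd {cmd}")]
    if op & FUTEX_PRIVATE_FLAG:
        parts.append("FUTEX_PRIVATE_FLAG")
    if op & FUTEX_CLOCK_REALTIME:
        parts.append("FUTEX_CLOCK_REALTIME")
    return parts

# Values copied from the trace: op 393, syscall number 202, result 0xca.
FUTEX_OP = 393
SYSCALL_NR = 202
RETVAL = 0xCA

print(hex(FUTEX_OP), decode_futex_op(FUTEX_OP))
print("return value equals syscall number:", RETVAL == SYSCALL_NR)
```

Read this way, the call is a FUTEX_WAIT_BITSET on a process-private futex with a CLOCK_REALTIME timeout, and the "Success(0xca)" result is numerically the futex syscall number itself.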
https://bugs.kde.org/show_bug.cgi?id=458915
--- Comment #4 from Libor Peltan ---
Created attachment 152081
--> https://bugs.kde.org/attachment.cgi?id=152081&action=edit
The log of valgrind crashing glibc by mishandled futex syscall.
https://bugs.kde.org/show_bug.cgi?id=458915
--- Comment #3 from Libor Peltan ---
@Paul
Helgrind reports oh so many errors, I can hardly find any clue in them. I guess
this tool is not intended for debugging (mostly) lockless multi-threaded
programs (where concurrency is controlled by libRCU or
https://bugs.kde.org/show_bug.cgi?id=458915
Philippe Waroquiers changed:
What|Removed |Added
CC||philippe.waroquiers@skynet.
https://bugs.kde.org/show_bug.cgi?id=458915
Paul Floyd changed:
What|Removed |Added
CC||pjfl...@wanadoo.fr
--- Comment #1 from Paul Floyd