Based on this analysis by Frédéric Bonnard (thanks a lot!) I infer
that the "sleep 5" does not fix the problem at all, so I'm going
to disable the test completely for buster, as it's clearly buggy.

----- Forwarded message from Frédéric Bonnard <fre...@debian.org> -----

Date: Wed, 27 Feb 2019 16:48:38 +0100
From: Frédéric Bonnard <fre...@debian.org>
To: Breno Leitao <lei...@debian.org>, Santiago Vila <sanv...@unex.es>, 
ppc6...@buildd.debian.org
Subject: Re: diffutils test suite

Hi,

February 18, 2019 7:04 PM, "Breno Leitao" <lei...@debian.org> wrote:

> hi Santiago,
> 
> On 2/18/19 2:46 PM, Santiago Vila wrote:
> 
>> Hello.
>> 
>> I have a problem with diffutils testsuite.
>> 
>> There is a test called "colors" which fails in ppc64el.
>> 
>> The first time this happened I added a "sleep 1" to the test
>> to avoid what it seemed to be a race condition. Now I have increased
>> the time to 5 seconds, but it still fails on powerpc and ppc64.
>> 
>> Are those architectures so slow that they need more than 5 seconds for
>> a fifo to work?
> 
> I do not think so. In fact, ppc64el seems to be the fastest builder at debian
> today.

To summarize the context, the failing test is :
--
mkfifo fifo
printf '%1000000s-a' > a
printf '%1000000s-b' > b
head -c 10 < fifo > /dev/null &
sleep 5
diff --color=always ---presume-output-tty a b > fifo
test $? = 141 || fail=1
--

The expected behaviour of diff is that is should catch the SIGPIPE when
head has read its 10bytes from the fifo given the big a and b file, and
return 141 (141 = 128 + SIGPIPE)

Quite often 141 is returned.
I straced a case where diff returns 2 (EXIT_TROUBLE) (that is our failing case) 
:
--
24846 write(1, "                                "..., 4096) = 4096
24846 write(1, "                                "..., 4096) = -1 EPIPE (Broken 
pipe)
24846 --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=24846, 
si_uid=1000} ---
24846 rt_sigreturn({mask=[]})           = -1 EPIPE (Broken pipe)
24846 munmap(0x7fffa0cf0000, 262144)    = 0
24846 munmap(0x7fffa0fa0000, 262144)    = 0
24846 munmap(0x7fffa0cb0000, 262144)    = 0
24846 munmap(0x7fffa0f60000, 262144)    = 0
24846 munmap(0x7fffa10e0000, 1048576)   = 0
24846 munmap(0x7fffa0fe0000, 1048576)   = 0
24846 close(3)                          = 0
24846 close(4)                          = 0
24846 write(1, "\33[0m\n\\ No newline at end of file"..., 33) = -1 EPIPE 
(Broken pipe)
24846 --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=24846, 
si_uid=1000} ---
24846 rt_sigreturn({mask=[]})           = -1 EPIPE (Broken pipe)
24846 write(2, "/build/diffutils-nPN2r4/diffutil"..., 48) = 48
24846 write(2, "standard output", 15)   = 15
24846 write(2, ": Broken pipe", 13)     = 13
24846 write(2, "\n", 1)                 = 1
24846 exit_group(2)                     = ?
24846 +++ exited with 2 +++
24845 <... wait4 resumed> [{WIFEXITED(s) && WEXITSTATUS(s) == 2}], 0, NULL) = 
24846
24845 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=24846, 
si_uid=1000, si_status=2, si_utime=0, si_stime=0} ---
24845 rt_sigreturn({mask=[]})           = 24846
24845 dup2(10, 1)                       = 1
24845 close(10)                         = 0
24845 exit_group(2)                     = ?
24845 +++ exited with 2 +++
--

We can see that diff reports "Broken pipe" with the standard output.
and here is the working case :

---
24987 write(1, "                                "..., 4096) = 4096
24987 write(1, "                                "..., 4096) = -1 EPIPE (Broken 
pipe)
24987 --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=24987, 
si_uid=1000} ---
24987 rt_sigreturn({mask=[]})           = -1 EPIPE (Broken pipe)
24987 write(1, "\33[0m\n\\ No newline at end of file"..., 56) = -1 EPIPE 
(Broken pipe)
24987 --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=24987, 
si_uid=1000} ---
24987 rt_sigreturn({mask=[]})           = -1 EPIPE (Broken pipe)
24987 rt_sigprocmask(SIG_BLOCK, [HUP INT QUIT PIPE ALRM TERM TSTP XCPU XFSZ 
VTALRM PROF IO], [], 8) = 0
24987 rt_sigaction(SIGPIPE, {sa_handler=SIG_DFL, sa_mask=[PIPE], 
sa_flags=SA_RESTART}, {sa_handler=0x12ff80780, sa_mask=[HUP INT QUIT PIPE ALRM 
TERM TSTP XCPU XFSZ VTALRM PROF IO], sa_flags=SA_RESTART}, 8) = 0
24987 rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [HUP INT QUIT PIPE ALRM TERM 
TSTP XCPU XFSZ VTALRM PROF IO], 8) = 0
24987 getpid()                          = 24987
24987 gettid()                          = 24987
24987 tgkill(24987, 24987, SIGPIPE)     = 0
24987 rt_sigprocmask(SIG_SETMASK, [HUP INT QUIT PIPE ALRM TERM TSTP XCPU XFSZ 
VTALRM PROF IO], NULL, 8) = 0
24987 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
24987 --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_TKILL, si_pid=24987, 
si_uid=1000} ---
24987 +++ killed by SIGPIPE +++
24986 <... wait4 resumed> [{WIFSIGNALED(s) && WTERMSIG(s) == SIGPIPE}], 0, 
NULL) = 24987
24986 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=24987, 
si_uid=1000, si_status=SIGPIPE, si_utime=0, si_stime=0} ---
24986 rt_sigreturn({mask=[]})           = 24987
24986 dup2(10, 1)                       = 1
24986 close(10)                         = 0
24986 exit_group(141)                   = ?
24986 +++ exited with 141 +++
---

So, when the test fails, we actually got some SIGPIPE but we returned 2.
We also see that diff already finished (it closes the a and b file)
(src/diff.c end of compare_files() )
Reading the code, I see that the signals are catched and processed later
by calling process_signals() periodically and that function may raise
the caught signal and we then exit with 141.
This way we can exit the process 141 in process_signals() or in other
functions such as pfatal_with_name() with 2 and "Broken pipe" message
(when a system call returns non-zero status and that is fatal).

In our case, the test expects diff to exit in process_signals() but not
in pfatal_with_name() which happens sometimes because we terminate the 
processing
*after* the last check of pending signals i.e. process_signals() has
occured, that is with exit code EXIT_TROUBLE and errno 13 and
pfatal_with_name() exits and displays the error message.

In both case, we trap the SIGPIPE problem, but it's reported differently as 
we exit in different functions. There is a window large enough in which
a signal can occur and won't be processed by process_signals.
Maybe the code should check before exiting that there are no pending
signals to reduce that window. Or it should process the signals in a
whole different manner.

I wrote a small patch but I'm not sure this is the good way to handle
the situation, but that works.

F.

>> The current version of diffutils (1:3.7-2) now seems to build on every
>> release arch, but I believe that more because of pure chance than
>> because the patch I added does anything useful at all.
>> 
>> I'm not going to touch the package until it reaches testing (ten days),
>> but after that I will welcome any help to fix this issue.
> 
> Fred, would you please help Santiago with this problem please?

F.

--- a/src/diff.h
+++ b/src/diff.h
@@ -390,6 +390,7 @@
 extern void begin_output (void);
 extern void debug_script (struct change *);
 extern void fatal (char const *) __attribute__((noreturn));
+extern void final_process_signals (void);
 extern void finish_output (void);
 extern void message (char const *, char const *, char const *);
 extern void message5 (char const *, char const *, char const *,
--- a/src/util.c
+++ b/src/util.c
@@ -1574,3 +1574,14 @@
 
   fflush (stderr);
 }
+
+/* Process remaining signals once before exit  */
+void
+final_process_signals (void)
+{
+  static int last = 1;
+  if (last) {
+    process_signals ();
+    last = 0;
+  }
+}
--- a/src/diff.c
+++ b/src/diff.c
@@ -1451,6 +1451,8 @@
        }
     }
 
+  final_process_signals ();
+
   /* Now the comparison has been done, if no error prevented it,
      and STATUS is the value this function will return.  */
 




----- End forwarded message -----

Reply via email to