Re: [xz-devel] [PATCH] xzdiff: Trap SIGPIPE

2021-01-11 Thread Lasse Collin
On 2021-01-11 Étienne Mollier wrote:
> Lasse Collin, on 2021-01-11 19:19:09 +0200:
> > I understand from your message that you got a different result. I
> > wonder what would explain the difference. Your results are close to
> > what I would expect with the "trap '' PIPE" patch. Are you sure you
> > used the correctly patched xzcmp for testing? Otherwise I'm
> > clueless what could explain the difference in our results.  
> 
> Oops, I forgot to pop the initial patch off the stack.  The
> `trap '' PIPE` was still present on top of the script.  >_<"
> When I rerun these tests without this on top, I see the same
> good results as you.  This explains that.

Good. :-)

I noticed that in 2012 a similar issue was fixed in xzgrep. There
SIGPIPE is detected more precisely with "kill -l $exit_status", so I
adapted the xzdiff patch to use that correctly and committed it.

The xzgrep method had a minor bug (it didn't check if the status was >=
128 indicating a signal). I also noticed another bug in xzgrep that it
used gzip -q (which converts SIGPIPE to exit status 2) and ignored exit
status 2 as SIGPIPE, which is bad because with bzip2 exit status 2
means corrupt input. These are now hopefully fixed too.

Hopefully the removal of -q won't introduce a new bug in some special
situation. One difference I noticed is that without -q bzip2 will
display a fairly long error message if input is corrupt. With -q it is
silent. I suspect the non-silent behavior is better with these scripts;
just having exit status 2 from xzdiff/xzcmp isn't enough. With gzip and
xz, -q or lack of -q makes no difference in this situation. With zstd
the -q option is required though because otherwise zstd is noisy in
normal situations.

I also added zstd support to xzdiff/xzcmp. xzgrep already had it in
xz.git.

The scripts now have a different set of bugs than before. Hopefully the
total number of bugs has not increased. :-)

Thanks!

-- 
Lasse Collin  |  IRC: Larhzu @ IRCnet & Freenode



Re: [xz-devel] [PATCH] xzdiff: Trap SIGPIPE

2021-01-11 Thread Étienne Mollier
Hello,

Lasse Collin, on 2021-01-11 19:19:09 +0200:
> Since xz-devel is subscribers only, I quote your message in full and
> also include your test scripts as an attachment for others to see.

Thanks for the notice, I subscribed for convenience.

> I understand from your message that you got a different result. I wonder
> what would explain the difference. Your results are close to what I
> would expect with the "trap '' PIPE" patch. Are you sure you used the
> correctly patched xzcmp for testing? Otherwise I'm clueless what could
> explain the difference in our results.

Oops, I forgot to pop the initial patch off the stack.  The
`trap '' PIPE` was still present on top of the script.  >_<"
When I rerun these tests without this on top, I see the same
good results as you.  This explains that.

> Note that these lines don't do what one might think:
> 
> > cat reproducer2.gz | xzcmp - reproducer1.gz
> > cat reproducer2.bz2 | xzcmp - reproducer1.bz2
> 
> When reading from stdin, xzcmp/xzdiff assume that the input is either
> in a format that xz understands (.xz or .lzma) or that it is
> uncompressed. So in the above cases the compressed binary
> reproducer2.{gz,bz2} is compared to uncompressed reproducer1 which
> likely isn't what one intended to do.

Good point, actually I had this in the test merely because it
went out of the cartesian product of the possible combinations.
I understand they might not be of very high relevancy.

Have a nice day,  :)
-- 
Étienne Mollier 
Fingerprint:  8f91 b227 c7d6 f2b1 948c  8236 793c f67e 8f0d 11da
Sent from /dev/pts/2, please excuse my verbosity.


signature.asc
Description: PGP signature


Re: [xz-devel] [PATCH] xzdiff: Trap SIGPIPE

2021-01-11 Thread Lasse Collin
Hello!

Since xz-devel is subscribers only, I quote your message in full and
also include your test scripts as an attachment for others to see.

On 2021-01-09 Étienne Mollier wrote:
> Lasse Collin, on 2021-01-09 17:38:20 +0200:
> > The following patch replace -cdfq with -cdf and ignores decompressor
> > exit statuses >= 128. Could someone test if it fixes the original
> > bug without introducing any obvious new bugs with some of the
> > supported compression tools?  
> 
> Many thanks for your work on this!  I unraveled a stub of test
> suite from my archives and made adaptations to try out gzip and
> bzip2 in addition of xz.  These tests are in attachment for your
> convenience, but you may need to grab a random text file long
> enough to trigger the SIGPIPE condition; the test fetches a
> changelog file at a Debian specific location, which you can
> otherwise download here[1].
> 
> [1]
> https://tracker.debian.org/media/packages/x/xz-utils/changelog-5.2.5-1.0
> 
> I applied your patch and regenerated an xz-utils package I
> installed on my system.  I can see that the behavior using XZ
> compressed files works as expected with regards or exit codes.
> It looks good on that side to me.  :)
> 
> Comparing XZ with GZip compressed files, I see the discrepancy
> in the few following cases, assuming reproducer2 is shorter than
> reproducer1:
> 
>   xzcmp reproducer1.gz reproducer2.gz
>   xzcmp reproducer1.gz reproducer2.xz
>   xzcmp reproducer2.gz reproducer1.gz
>   xzcmp reproducer2.xz reproducer1.gz
>   cat reproducer2.gz | xzcmp - reproducer1.gz
>   cat reproducer2.xz | xzcmp - reproducer1.gz
> 
> The pattern is very similar with BZip2 compression, as expected
> I guess:
> 
>   xzcmp reproducer1.bz2 reproducer2.bz2
>   xzcmp reproducer1.bz2 reproducer2.xz
>   xzcmp reproducer2.bz2 reproducer1.bz2
>   xzcmp reproducer2.xz  reproducer1.bz2
>   cat reproducer2.bz2 | xzcmp - reproducer1.bz2
>   cat reproducer2.xz  | xzcmp - reproducer1.bz2
> 
> When XZ is not involved, but some foreign compression is at
> play, I see a similar pattern again:
> 
>   xzcmp reproducer1.gz reproducer2.gz
>   xzcmp reproducer1.gz reproducer2
>   xzcmp reproducer2.gz reproducer1.gz
>   xzcmp reproducer2reproducer1.gz
>   cat reproducer2.gz | xzcmp - reproducer1.gz
>   cat reproducer2| xzcmp - reproducer1.gz
> 
> All other combinations of comparing XZ against GZip, or against
> BZip2 work as expected.  I'm afraid I have not tested all
> combinations of non-compressed against GZip against BZip2, but I
> believe the same pattern might appear similarly somehow.

Thanks a lot! This is quite a thorough test already.

I tried the scripts with the linked changelog and also with a bigger
text file. With the patch included in my previous email the expected
values match the observed values on all output lines from your test
scripts, and so it would seem that the patch worked in all cases.

I understand from your message that you got a different result. I wonder
what would explain the difference. Your results are close to what I
would expect with the "trap '' PIPE" patch. Are you sure you used the
correctly patched xzcmp for testing? Otherwise I'm clueless what could
explain the difference in our results.

Note that these lines don't do what one might think:

>   cat reproducer2.gz | xzcmp - reproducer1.gz
>   cat reproducer2.bz2 | xzcmp - reproducer1.bz2

When reading from stdin, xzcmp/xzdiff assume that the input is either
in a format that xz understands (.xz or .lzma) or that it is
uncompressed. So in the above cases the compressed binary
reproducer2.{gz,bz2} is compared to uncompressed reproducer1 which
likely isn't what one intended to do.

-- 
Lasse Collin  |  IRC: Larhzu @ IRCnet & Freenode


testwithbz2.sh
Description: application/shellscript


testwithgz.sh
Description: application/shellscript


testwithoutxz.sh
Description: application/shellscript


xzcmp_bug_844770.sh
Description: application/shellscript


Re: [xz-devel] [PATCH] xzdiff: Trap SIGPIPE

2021-01-09 Thread Lasse Collin
On 2021-01-08 Lasse Collin wrote:
> It's tempting to ignore exit statuses >= 128 at the end of the script
> where it current checks for "$xz_status" -eq 0 but that doesn't work
> because in the middle of the script there is also this:
> 
> case $xz_status in
>   *[1-9]*) xz_status=1;;
>   *) xz_status=0;;
> esac
> 
> There xz_status contains two numbers and the "case" finds out if they
> are both zero. Perhaps this "case" should be replaced with something
> more sophisticated that checks both numbers separately and ignores
> values >= 128.

I found it easy to change the above "case" and ignore >= 128 at the
bottom of the script. But that still isn't enough.

gzip -q turns SIGPIPE into exit status of 2. Ignoring 2 would work for
gzip but that is incompatible with bzip2 which uses exit status 2 to
indicate corrupt input. Trying to handle exit statuses separately for
each compressor is complicated because two different tools may be in
use at the same time. Getting rid of gzip -q sounds more reasonable.

The script uses -cdfq to decompress and I'm not sure why the -q is
there. There is an exception when only one argument is given, then -cd
is used. The -f option is needed with xz (and it's fine with gzip too)
to handle uncompressed files: xz -cdf or gzip -cdf with unknown file
format behaves like cat and just copies the input to the output. The
use of -f (and the non-use in the single-file case) looks fine, but I
don't know what to think about -q. Perhaps it hides warnings in some
rare cases where they aren't wanted.

The following patch replace -cdfq with -cdf and ignores decompressor
exit statuses >= 128. Could someone test if it fixes the original bug
without introducing any obvious new bugs with some of the supported
compression tools?

diff --git a/src/scripts/xzdiff.in b/src/scripts/xzdiff.in
index eb7825c..b285572 100644
--- a/src/scripts/xzdiff.in
+++ b/src/scripts/xzdiff.in
@@ -116,23 +116,17 @@ elif test $# -eq 2; then
   if test "$1$2" = --; then
 xz_status=$(
   exec 4>&1
-  ($xz1 -cdfq - 4>&-; echo $? >&4) 3>&- |
+  ($xz1 -cdf - 4>&-; echo $? >&4) 3>&- |
 eval "$cmp" - - >&3
 )
   elif # Reject Solaris 8's buggy /bin/bash 2.03.
   echo X | (echo X | eval "$cmp" /dev/fd/5 - >/dev/null 2>&1) 
5<&0; then
 xz_status=$(
   exec 4>&1
-  ($xz1 -cdfq -- "$1" 4>&-; echo $? >&4) 3>&- |
-( ($xz2 -cdfq -- "$2" 4>&-; echo $? >&4) 3>&- 5<&- &-; echo $? >&4) 3>&- |
+( ($xz2 -cdf -- "$2" 4>&-; echo $? >&4) 3>&- 5<&- &3) 5<&0
 )
-cmp_status=$?
-case $xz_status in
-  *[1-9]*) xz_status=1;;
-  *) xz_status=0;;
-esac
-(exit $cmp_status)
   else
 F=`expr "/$2" : '.*/\(.*\)[-.][ablmotxz2]*$'` || F=$prog
 tmp=
@@ -161,10 +155,10 @@ elif test $# -eq 2; then
   mkdir -- "${TMPDIR-/tmp}/$prog.$$" || exit 2
   tmp="${TMPDIR-/tmp}/$prog.$$"
 fi
-$xz2 -cdfq -- "$2" > "$tmp/$F" || exit 2
+$xz2 -cdf -- "$2" > "$tmp/$F" || exit 2
 xz_status=$(
   exec 4>&1
-  ($xz1 -cdfq -- "$1" 4>&-; echo $? >&4) 3>&- |
+  ($xz1 -cdf -- "$1" 4>&-; echo $? >&4) 3>&- |
 eval "$cmp" - '"$tmp/$F"' >&3
 )
 cmp_status=$?
@@ -175,7 +169,7 @@ elif test $# -eq 2; then
   *)
 xz_status=$(
   exec 4>&1
-  ($xz1 -cdfq -- "$1" 4>&-; echo $? >&4) 3>&- |
+  ($xz1 -cdf -- "$1" 4>&-; echo $? >&4) 3>&- |
 eval "$cmp" - '"$2"' >&3
 );;
 esac;;
@@ -184,7 +178,7 @@ elif test $# -eq 2; then
   *[-.][zZ] | *_z | *[-.][gx]z | *[-.]bz2 | *[-.]lzma | *.t[abglx]z | 
*.tbz2 | *[-.]lzo | *.tzo | -)
 xz_status=$(
   exec 4>&1
-  ($xz2 -cdfq -- "$2" 4>&-; echo $? >&4) 3>&- |
+  ($xz2 -cdf -- "$2" 4>&-; echo $? >&4) 3>&- |
 eval "$cmp" '"$1"' - >&3
  );;
   *)
@@ -197,5 +191,9 @@ else
 fi
 
 cmp_status=$?
-test "$xz_status" -eq 0 || exit 2
+for num in $xz_status ; do
+  test "$num" -eq 0 && continue
+  test "$num" -ge 128 && continue
+  exit 2
+done
 exit $cmp_status

-- 
Lasse Collin  |  IRC: Larhzu @ IRCnet & Freenode



Re: [xz-devel] [PATCH] xzdiff: Trap SIGPIPE

2021-01-08 Thread Lasse Collin
On 2020-12-24 Sebastian Andrzej Siewior wrote:
> The `cmp' command will return early if a difference is found while the
> shell script is still invoking the decompressor which writes into the
> closed FD. This results in SIGPIPE / exit code 141.
> By ignoring SIGPIPE the real return code from `cmp' is observed which
> is `1' and xzdiff exits with `1'. Without ignoring SIGPIPE the
> exitcode 141 is observed and xzdiff returns with `2'.

Thanks! Seems that the exit statuses of xzcmp and xzdiff have been
broken maybe as long as they have been included in XZ Utils.

The problem seems to be more hairy than it appears at first. Adding
just one "trap" isn't a complete fix.

xzdiff supports formats other than .xz too. gzip and lzop change their
behavior if SIGPIPE is ignored. They print an error message about
broken pipe and exit with status 1. That is, then it's not possible to
distinguish between actual SIGPIPE and corrupt input.

It's tempting to ignore exit statuses >= 128 at the end of the script
where it current checks for "$xz_status" -eq 0 but that doesn't work
because in the middle of the script there is also this:

case $xz_status in
  *[1-9]*) xz_status=1;;
  *) xz_status=0;;
esac

There xz_status contains two numbers and the "case" finds out if they
are both zero. Perhaps this "case" should be replaced with something
more sophisticated that checks both numbers separately and ignores
values >= 128.

Obviously it's not problem free to ignore >= 128 since that could hide
bad signals like SIGSEGV. At this point I feel it's not too bad but
perhaps someone else has better ideas.

As a bonus, the script already uses "trap" to delete temporary files. I
think that code path is never used if the shell is fancy enough to do
complex redirections. However, it traps also SIGPIPE and I think it can
in such situations run rm -rf twice on the same path name. That feels
suspicious but perhaps it's not too bad, especially since the code
likely is never used on most systems.

-- 
Lasse Collin  |  IRC: Larhzu @ IRCnet & Freenode