Re: [xz-devel] [PATCH] xzdiff: Trap SIGPIPE
On 2021-01-11 Étienne Mollier wrote: > Lasse Collin, on 2021-01-11 19:19:09 +0200: > > I understand from your message that you got a different result. I > > wonder what would explain the difference. Your results are close to > > what I would expect with the "trap '' PIPE" patch. Are you sure you > > used the correctly patched xzcmp for testing? Otherwise I'm > > clueless what could explain the difference in our results. > > Oops, I forgot to pop the initial patch off the stack. The > `trap '' PIPE` was still present on top of the script. >_<" > When I rerun these tests without this on top, I see the same > good results as you. This explains that. Good. :-) I noticed that in 2012 a similar issue was fixed in xzgrep. There SIGPIPE is detected more precisely with "kill -l $exit_status", so I adapted the xzdiff patch to use that correctly and committed it. The xzgrep method had a minor bug (it didn't check if the status was >= 128 indicating a signal). I also noticed another bug in xzgrep that it used gzip -q (which converts SIGPIPE to exit status 2) and ignored exit status 2 as SIGPIPE, which is bad because with bzip2 exit status 2 means corrupt input. These are now hopefully fixed too. Hopefully the removal of -q won't introduce a new bug in some special situation. One difference I noticed is that without -q bzip2 will display a fairly long error message if input is corrupt. With -q it is silent. I suspect the non-silent behavior is better with these scripts; just having exit status 2 from xzdiff/xzcmp isn't enough. With gzip and xz, -q or lack of -q makes no difference in this situation. With zstd the -q option is required though because otherwise zstd is noisy in normal situations. I also added zstd support to xzdiff/xzcmp. xzgrep already had it in xz.git. The scripts now have a different set of bugs than before. Hopefully the total number of bugs has not increased. :-) Thanks! -- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Re: [xz-devel] [PATCH] xzdiff: Trap SIGPIPE
Hello, Lasse Collin, on 2021-01-11 19:19:09 +0200: > Since xz-devel is subscribers only, I quote your message in full and > also include your test scripts as an attachment for others to see. Thanks for the notice, I subscribed for convenience. > I understand from your message that you got a different result. I wonder > what would explain the difference. Your results are close to what I > would expect with the "trap '' PIPE" patch. Are you sure you used the > correctly patched xzcmp for testing? Otherwise I'm clueless what could > explain the difference in our results. Oops, I forgot to pop the initial patch off the stack. The `trap '' PIPE` was still present on top of the script. >_<" When I rerun these tests without this on top, I see the same good results as you. This explains that. > Note that these lines don't do what one might think: > > > cat reproducer2.gz | xzcmp - reproducer1.gz > > cat reproducer2.bz2 | xzcmp - reproducer1.bz2 > > When reading from stdin, xzcmp/xzdiff assume that the input is either > in a format that xz understands (.xz or .lzma) or that it is > uncompressed. So in the above cases the compressed binary > reproducer2.{gz,bz2} is compared to uncompressed reproducer1 which > likely isn't what one intended to do. Good point, actually I had this in the test merely because it went out of the cartesian product of the possible combinations. I understand they might not be of very high relevancy. Have a nice day, :) -- Étienne Mollier Fingerprint: 8f91 b227 c7d6 f2b1 948c 8236 793c f67e 8f0d 11da Sent from /dev/pts/2, please excuse my verbosity. signature.asc Description: PGP signature
Re: [xz-devel] [PATCH] xzdiff: Trap SIGPIPE
Hello! Since xz-devel is subscribers only, I quote your message in full and also include your test scripts as an attachment for others to see. On 2021-01-09 Étienne Mollier wrote: > Lasse Collin, on 2021-01-09 17:38:20 +0200: > > The following patch replace -cdfq with -cdf and ignores decompressor > > exit statuses >= 128. Could someone test if it fixes the original > > bug without introducing any obvious new bugs with some of the > > supported compression tools? > > Many thanks for your work on this! I unraveled a stub of test > suite from my archives and made adaptations to try out gzip and > bzip2 in addition of xz. These tests are in attachment for your > convenience, but you may need to grab a random text file long > enough to trigger the SIGPIPE condition; the test fetches a > changelog file at a Debian specific location, which you can > otherwise download here[1]. > > [1] > https://tracker.debian.org/media/packages/x/xz-utils/changelog-5.2.5-1.0 > > I applied your patch and regenerated an xz-utils package I > installed on my system. I can see that the behavior using XZ > compressed files works as expected with regards or exit codes. > It looks good on that side to me. :) > > Comparing XZ with GZip compressed files, I see the discrepancy > in the few following cases, assuming reproducer2 is shorter than > reproducer1: > > xzcmp reproducer1.gz reproducer2.gz > xzcmp reproducer1.gz reproducer2.xz > xzcmp reproducer2.gz reproducer1.gz > xzcmp reproducer2.xz reproducer1.gz > cat reproducer2.gz | xzcmp - reproducer1.gz > cat reproducer2.xz | xzcmp - reproducer1.gz > > The pattern is very similar with BZip2 compression, as expected > I guess: > > xzcmp reproducer1.bz2 reproducer2.bz2 > xzcmp reproducer1.bz2 reproducer2.xz > xzcmp reproducer2.bz2 reproducer1.bz2 > xzcmp reproducer2.xz reproducer1.bz2 > cat reproducer2.bz2 | xzcmp - reproducer1.bz2 > cat reproducer2.xz | xzcmp - reproducer1.bz2 > > When XZ is not involved, but some foreign compression is at > play, I see a similar pattern again: > > xzcmp reproducer1.gz reproducer2.gz > xzcmp reproducer1.gz reproducer2 > xzcmp reproducer2.gz reproducer1.gz > xzcmp reproducer2reproducer1.gz > cat reproducer2.gz | xzcmp - reproducer1.gz > cat reproducer2| xzcmp - reproducer1.gz > > All other combinations of comparing XZ against GZip, or against > BZip2 work as expected. I'm afraid I have not tested all > combinations of non-compressed against GZip against BZip2, but I > believe the same pattern might appear similarly somehow. Thanks a lot! This is quite a thorough test already. I tried the scripts with the linked changelog and also with a bigger text file. With the patch included in my previous email the expected values match the observed values on all output lines from your test scripts, and so it would seem that the patch worked in all cases. I understand from your message that you got a different result. I wonder what would explain the difference. Your results are close to what I would expect with the "trap '' PIPE" patch. Are you sure you used the correctly patched xzcmp for testing? Otherwise I'm clueless what could explain the difference in our results. Note that these lines don't do what one might think: > cat reproducer2.gz | xzcmp - reproducer1.gz > cat reproducer2.bz2 | xzcmp - reproducer1.bz2 When reading from stdin, xzcmp/xzdiff assume that the input is either in a format that xz understands (.xz or .lzma) or that it is uncompressed. So in the above cases the compressed binary reproducer2.{gz,bz2} is compared to uncompressed reproducer1 which likely isn't what one intended to do. -- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode testwithbz2.sh Description: application/shellscript testwithgz.sh Description: application/shellscript testwithoutxz.sh Description: application/shellscript xzcmp_bug_844770.sh Description: application/shellscript
Re: [xz-devel] [PATCH] xzdiff: Trap SIGPIPE
On 2021-01-08 Lasse Collin wrote: > It's tempting to ignore exit statuses >= 128 at the end of the script > where it current checks for "$xz_status" -eq 0 but that doesn't work > because in the middle of the script there is also this: > > case $xz_status in > *[1-9]*) xz_status=1;; > *) xz_status=0;; > esac > > There xz_status contains two numbers and the "case" finds out if they > are both zero. Perhaps this "case" should be replaced with something > more sophisticated that checks both numbers separately and ignores > values >= 128. I found it easy to change the above "case" and ignore >= 128 at the bottom of the script. But that still isn't enough. gzip -q turns SIGPIPE into exit status of 2. Ignoring 2 would work for gzip but that is incompatible with bzip2 which uses exit status 2 to indicate corrupt input. Trying to handle exit statuses separately for each compressor is complicated because two different tools may be in use at the same time. Getting rid of gzip -q sounds more reasonable. The script uses -cdfq to decompress and I'm not sure why the -q is there. There is an exception when only one argument is given, then -cd is used. The -f option is needed with xz (and it's fine with gzip too) to handle uncompressed files: xz -cdf or gzip -cdf with unknown file format behaves like cat and just copies the input to the output. The use of -f (and the non-use in the single-file case) looks fine, but I don't know what to think about -q. Perhaps it hides warnings in some rare cases where they aren't wanted. The following patch replace -cdfq with -cdf and ignores decompressor exit statuses >= 128. Could someone test if it fixes the original bug without introducing any obvious new bugs with some of the supported compression tools? diff --git a/src/scripts/xzdiff.in b/src/scripts/xzdiff.in index eb7825c..b285572 100644 --- a/src/scripts/xzdiff.in +++ b/src/scripts/xzdiff.in @@ -116,23 +116,17 @@ elif test $# -eq 2; then if test "$1$2" = --; then xz_status=$( exec 4>&1 - ($xz1 -cdfq - 4>&-; echo $? >&4) 3>&- | + ($xz1 -cdf - 4>&-; echo $? >&4) 3>&- | eval "$cmp" - - >&3 ) elif # Reject Solaris 8's buggy /bin/bash 2.03. echo X | (echo X | eval "$cmp" /dev/fd/5 - >/dev/null 2>&1) 5<&0; then xz_status=$( exec 4>&1 - ($xz1 -cdfq -- "$1" 4>&-; echo $? >&4) 3>&- | -( ($xz2 -cdfq -- "$2" 4>&-; echo $? >&4) 3>&- 5<&- &-; echo $? >&4) 3>&- | +( ($xz2 -cdf -- "$2" 4>&-; echo $? >&4) 3>&- 5<&- &3) 5<&0 ) -cmp_status=$? -case $xz_status in - *[1-9]*) xz_status=1;; - *) xz_status=0;; -esac -(exit $cmp_status) else F=`expr "/$2" : '.*/\(.*\)[-.][ablmotxz2]*$'` || F=$prog tmp= @@ -161,10 +155,10 @@ elif test $# -eq 2; then mkdir -- "${TMPDIR-/tmp}/$prog.$$" || exit 2 tmp="${TMPDIR-/tmp}/$prog.$$" fi -$xz2 -cdfq -- "$2" > "$tmp/$F" || exit 2 +$xz2 -cdf -- "$2" > "$tmp/$F" || exit 2 xz_status=$( exec 4>&1 - ($xz1 -cdfq -- "$1" 4>&-; echo $? >&4) 3>&- | + ($xz1 -cdf -- "$1" 4>&-; echo $? >&4) 3>&- | eval "$cmp" - '"$tmp/$F"' >&3 ) cmp_status=$? @@ -175,7 +169,7 @@ elif test $# -eq 2; then *) xz_status=$( exec 4>&1 - ($xz1 -cdfq -- "$1" 4>&-; echo $? >&4) 3>&- | + ($xz1 -cdf -- "$1" 4>&-; echo $? >&4) 3>&- | eval "$cmp" - '"$2"' >&3 );; esac;; @@ -184,7 +178,7 @@ elif test $# -eq 2; then *[-.][zZ] | *_z | *[-.][gx]z | *[-.]bz2 | *[-.]lzma | *.t[abglx]z | *.tbz2 | *[-.]lzo | *.tzo | -) xz_status=$( exec 4>&1 - ($xz2 -cdfq -- "$2" 4>&-; echo $? >&4) 3>&- | + ($xz2 -cdf -- "$2" 4>&-; echo $? >&4) 3>&- | eval "$cmp" '"$1"' - >&3 );; *) @@ -197,5 +191,9 @@ else fi cmp_status=$? -test "$xz_status" -eq 0 || exit 2 +for num in $xz_status ; do + test "$num" -eq 0 && continue + test "$num" -ge 128 && continue + exit 2 +done exit $cmp_status -- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Re: [xz-devel] [PATCH] xzdiff: Trap SIGPIPE
On 2020-12-24 Sebastian Andrzej Siewior wrote: > The `cmp' command will return early if a difference is found while the > shell script is still invoking the decompressor which writes into the > closed FD. This results in SIGPIPE / exit code 141. > By ignoring SIGPIPE the real return code from `cmp' is observed which > is `1' and xzdiff exits with `1'. Without ignoring SIGPIPE the > exitcode 141 is observed and xzdiff returns with `2'. Thanks! Seems that the exit statuses of xzcmp and xzdiff have been broken maybe as long as they have been included in XZ Utils. The problem seems to be more hairy than it appears at first. Adding just one "trap" isn't a complete fix. xzdiff supports formats other than .xz too. gzip and lzop change their behavior if SIGPIPE is ignored. They print an error message about broken pipe and exit with status 1. That is, then it's not possible to distinguish between actual SIGPIPE and corrupt input. It's tempting to ignore exit statuses >= 128 at the end of the script where it current checks for "$xz_status" -eq 0 but that doesn't work because in the middle of the script there is also this: case $xz_status in *[1-9]*) xz_status=1;; *) xz_status=0;; esac There xz_status contains two numbers and the "case" finds out if they are both zero. Perhaps this "case" should be replaced with something more sophisticated that checks both numbers separately and ignores values >= 128. Obviously it's not problem free to ignore >= 128 since that could hide bad signals like SIGSEGV. At this point I feel it's not too bad but perhaps someone else has better ideas. As a bonus, the script already uses "trap" to delete temporary files. I think that code path is never used if the shell is fancy enough to do complex redirections. However, it traps also SIGPIPE and I think it can in such situations run rm -rf twice on the same path name. That feels suspicious but perhaps it's not too bad, especially since the code likely is never used on most systems. -- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode