[ https://issues.apache.org/jira/browse/COMPRESS-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17390666#comment-17390666 ]
Chanseok Oh commented on COMPRESS-583:
--------------------------------------
Thanks for the update, [~peterlee]. We updated our code to always scrub
UID/GID/user name/group name, which I think is the right thing to do on our
side, so this change in 1.21 won't block us. I wouldn't argue that this should
be reverted. Actually, I believe 1.20 was already picking up the timestamp, so
one could view UID, GID, etc. as just new additions to it.
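For readers who want to see what "scrubbing" amounts to, here is a minimal sketch using Python's standard `tarfile` module. It is not Jib's code and not the Commons Compress API; the helper names `scrub` and `build_tar` are made up for illustration, and the fixed timestamp of 1 second is an assumption chosen to match the 1970-01-01 00:00:01 values in the quoted diffoscope output.

```python
import io
import tarfile


def scrub(info: tarfile.TarInfo) -> tarfile.TarInfo:
    """Reset owner fields and timestamp so the header does not depend on the host."""
    info.uid = 0
    info.gid = 0
    info.uname = ""
    info.gname = ""
    info.mtime = 1  # assumed fixed timestamp: 1970-01-01 00:00:01 UTC
    return info


def build_tar(files: dict, uid: int, uname: str) -> bytes:
    """Build an uncompressed tar in memory, pretending the host user is (uid, uname)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name in sorted(files):  # fixed entry order
            info = tarfile.TarInfo(name)
            info.size = len(files[name])
            info.uid, info.uname = uid, uname  # simulate host-dependent metadata
            tar.addfile(scrub(info), io.BytesIO(files[name]))
    return buf.getvalue()
```

With the owner fields and timestamp reset on every entry, two calls that differ only in the simulated host user should return identical bytes.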

I am not saying that Compress must always generate exactly the same output
across versions and OSes. Of course, you can freely introduce any kind of
breaking changes (or "behavioral changes" if you don't consider them
"breaking"). It's just that initially, I had good reason to believe that this
could be an unintended regression (which mattered a lot to us), as I didn't
find any changelog entry for this behavioral change.

BTW, reproducibility is very important in Docker (and thus to the broader cloud
ecosystem), because a container image is really just a collection of tarballs
(plus some small metadata JSON files). Docker images are content-addressable: a
difference of one byte means a different SHA digest (i.e., the result is
considered a totally different image). Jib is a Docker image build tool, which
is why we immediately noticed this change in Compress. And it's not just
Docker: reproducibility plays an important role in security and verification.
So basically, I just hope Compress doesn't neglect this reproducibility aspect
going forward.
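
To illustrate the content-addressability point, the sketch below hashes a one-entry tar twice with the same owner fields and once with different ones. It uses only Python's standard library; the function name `layer_digest` is hypothetical, and hashing the raw uncompressed tar bytes is a simplification of how a real image layer digest is computed.

```python
import hashlib
import io
import tarfile


def layer_digest(uname: str, uid: int) -> str:
    """Return the SHA-256 of a one-entry tar whose owner fields are (uname, uid)."""
    info = tarfile.TarInfo("app/")
    info.type = tarfile.DIRTYPE
    info.mode = 0o755
    info.mtime = 1  # fixed timestamp so only the owner fields vary
    info.uname, info.uid = uname, uid
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        tar.addfile(info)
    return "sha256:" + hashlib.sha256(buf.getvalue()).hexdigest()


# Same metadata gives the same digest; a changed owner gives a different one.
same = layer_digest("", 0) == layer_digest("", 0)
different = layer_digest("", 0) != layer_digest("chanseok", 252384)
```

Both `same` and `different` should be true: the file contents never change, yet the owner recorded in the header alone is enough to produce a different digest.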
> 1.21 generates different output binaries compared to older versions as well
> as on different OSes
> ------------------------------------------------------------------------------------------------
>
> Key: COMPRESS-583
> URL: https://issues.apache.org/jira/browse/COMPRESS-583
> Project: Commons Compress
> Issue Type: Bug
> Affects Versions: 1.21
> Reporter: Chanseok Oh
> Priority: Major
>
> Until now, upgrading {{commons-compress}} never changed the compressed
> output: it was byte-for-byte identical for the same input (i.e., the SHA
> checksum didn't change between versions). However, starting with 1.21, we
> noticed that it generates different output from previous versions.
> We also noticed that the same code generates different binaries on different
> OSes. For example, the output of 1.21 on Linux differs from that of 1.21 on
> Mac. However, at least on the same OS, 1.21 seems to reproducibly generate
> the same output.
> See the context at [https://github.com/GoogleContainerTools/jib/pull/3342]
> ----
> *UPDATE*: running diffoscope reveals that 1.21 is picking up the user and
> group of the local environment.
> (output below manually reformatted slightly for readability)
> $ diffoscope 6d2763b0f3940d324ea6b55386429e5b173899608abf7d1bff62e25dd2e4dcea.tar.gz 32258c626498c13412679442e3417811bc7ab801c6928da2c2a97e0bbc380a88.tar.gz
> --- 6d2763b0f3940d324ea6b55386429e5b173899608abf7d1bff62e25dd2e4dcea.tar.gz
> +++ 32258c626498c13412679442e3417811bc7ab801c6928da2c2a97e0bbc380a88.tar.gz
> │ --- 6d2763b0f3940d324ea6b55386429e5b173899608abf7d1bff62e25dd2e4dcea.tar
> ├── +++ 32258c626498c13412679442e3417811bc7ab801c6928da2c2a97e0bbc380a88.tar
> │ ├── file list
> │ │ @@ -1,3 +1,3 @@
> │ │ -drwxr-xr-x 0 0 0 0 1970-01-01 00:00:01.000000 app/
> │ │ +drwxr-xr-x 0 chanseok (252384) eng (5000) 0 1970-01-01 00:00:01.000000 app/
> │ │  -rw-r--r-- 0 0 0 0 1970-01-01 00:00:01.000000 app/fileB.txt
> │ │  -rw-r--r-- 0 0 0 0 1970-01-01 00:00:01.000000 app/fileC.txt