Re: Binary blobs in source trees

2024-04-02 Thread Gary Gregory
Not really. How would you generate a corrupted zip file? Or a file that was
generated by a fuzzer?

Gary

On Tue, Apr 2, 2024, 3:57 PM Nick Wellnhofer  wrote:

> Binary test data can also be generated with a script or a more
> sophisticated test suite which might even be more maintainable in the long
> run.
>
> On the other hand, tests are the prime target to hide malicious code and
> there are many ways to hide data even in innocuous-looking text files. I'd
> still argue that binary files should be avoided in open-source repositories
> just for the sake of maintainability. If you really need a larger set of
> binary test files, you can also move them to another repo.
>
> Nick
>
>
> > On Apr 2, 2024, at 21:28, Gary Gregory  wrote:
> >
> > Binary files are appropriate in a repository, for example, Apache
> > Commons Compress contains various normal and broken compressed files
> > in its test fixtures.
> >
> > Gary
>
>
> -
> To unsubscribe, e-mail: security-discuss-unsubscr...@community.apache.org
> For additional commands, e-mail:
> security-discuss-h...@community.apache.org
>
>


XZ, covert actions, Industry limits - drugs-smuggling

2024-04-02 Thread Dirk-Willem van Gulik
It may be useful for us to ‘scope’ this XZ issue: split it into things that 
are ‘in’ our domain and things that are (well) outside the software industry 
domain.

E.g. - a typical state actor that is able to fund/control a person for several 
years surreptitiously is in itself hard to avoid - in the commercial world, in 
the open source world or even deep within the intelligence community. These 
things happen. 
And you sort of accept that - and focus on things such as the four-eyes 
principle in the normal world, while you let the spooks do their thing in 
their field of expertise. 

But you generally do not expect industry to tackle this head on. I.e. we expect 
the shipping industry around Rotterdam harbour to make smuggling drugs fairly 
hard & expensive, but there is a certain range of activities that we expect the 
police and the governments to do. Likewise, you do not expect a software company 
(or an open source project), whose primary process is to build the right 
software, to become deep experts in such a tertiary field.

On the other hand - if, instead of a state actor inducing/coercing/participating, 
one looks at, say, a volunteer who is being ‘stupid’ - then it is in our 
realm. (Stupid here can be like those professors a few years ago who were 
experimenting with introducing vulnerabilities to see how fast they would be 
spotted - but may also be humans who feel a need for some protection, things 
they can barter with (e.g. with their local powers) or things they can cash in 
in times of need (e.g. in a bug bounty programme for hard US dollars).) Then 
it is much more in our realm to see if we can solve it.

But somehow I think it would be good to determine where we can go and where 
not. I.e. the analogy of drugs smuggling and the role of, say, a transport 
company or a truck driver company.

With kind regards,

Dw










Re: Binary blobs in source trees

2024-04-02 Thread Dirk-Willem van Gulik
On 2 Apr 2024, at 22:01, Dominik Psenner  wrote:

> On Tue, 2 Apr 2024, 21:57 Nick Wellnhofer,  wrote:
> 
>>> On Apr 2, 2024, at 21:28, Gary Gregory  wrote:
>>> 
>>> Binary files are appropriate in a repository, for example, Apache
>>> Commons Compress contains various normal and broken compressed files
>>> in its test fixtures.
>> 
>> Binary test data can also be generated with a script or a more
>> sophisticated test suite which might even be more maintainable in the long
>> run.
...
> Binary files are fine to me if provenance and purpose are documented and
> auditable. The same applies to code.
> 
> It is troublesome if nobody checks either provenance or purpose. But
> that applies equally to code. Code can contain hidden malicious algorithms.
> 
> That said, code that generates binary content can be equally dangerous.
> 
> Readability is gold. I love code that is readable like a story or a book.
> That makes code auditable and maintainable in the long run.

Though it is a long time ago - when I worked on software related to nuclear 
safeguards, one of the rules was that (binary/opaque) test data had to be 
generated or constructed by a reproducible, documented process or script that a 
human could peer review and understand, and that such data itself was not 
allowed to be persisted (e.g. in a distribution package or in CVS, the git of 
that era).
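
To make that rule concrete, here is a minimal Python sketch of generating 
broken archives from a short, reviewable script instead of checking them in. 
It is only an illustration under stated assumptions: the function names are 
made up for this example, and the two kinds of damage (truncation and a 
flipped payload byte) are just two of many a real test suite might need.

```python
import io
import zipfile


def make_valid_zip() -> bytes:
    """Build a small zip archive in memory with fixed metadata."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        # Fixed timestamp and host system keep the bytes identical across runs.
        info = zipfile.ZipInfo("hello.txt", date_time=(1980, 1, 1, 0, 0, 0))
        info.create_system = 3
        zf.writestr(info, b"hello, world\n")
    return buf.getvalue()


def make_truncated_zip(keep: int = 20) -> bytes:
    """Cut the archive short so the end-of-central-directory record is lost."""
    return make_valid_zip()[:keep]


def make_bad_crc_zip() -> bytes:
    """Flip one payload byte so the stored CRC-32 no longer matches."""
    data = bytearray(make_valid_zip())
    offset = data.find(b"hello, world")  # start of the stored file content
    data[offset] ^= 0xFF
    return bytes(data)
```

A test can then call `make_truncated_zip()` or `make_bad_crc_zip()` directly, 
so the only thing under review is the script. This does not cover fixtures 
that cannot be regenerated, such as fuzzer output or files written by an old 
release.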

That actually worked rather well.

With kind regards,

Dw





Re: Binary blobs in source trees

2024-04-02 Thread Dominik Psenner
Binary files are fine to me if provenance and purpose are documented and
auditable. The same applies to code.
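
One way to make "documented and auditable" checkable is a manifest that 
records a hash, an origin and a purpose for every binary fixture. The Python 
sketch below is only an illustration: the manifest layout, the field names 
and `audit_fixtures` are assumptions made for this example, not an existing 
ASF tool or convention.

```python
import hashlib
from pathlib import Path

# Hypothetical manifest fields; a project would choose its own.
REQUIRED_FIELDS = ("sha256", "origin", "purpose")


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def audit_fixtures(fixture_dir: Path, manifest: dict) -> list:
    """Return a list of problems; an empty list means every file is accounted for."""
    problems = []
    on_disk = {
        p.relative_to(fixture_dir).as_posix()
        for p in fixture_dir.rglob("*")
        if p.is_file()
    }
    # Files present in the tree that nobody documented.
    for name in sorted(on_disk - set(manifest)):
        problems.append(f"undocumented file: {name}")
    for name in sorted(manifest):
        entry = manifest[name]
        missing = [f for f in REQUIRED_FIELDS if not entry.get(f)]
        if missing:
            problems.append(f"incomplete entry for {name}: missing {', '.join(missing)}")
        if name not in on_disk:
            problems.append(f"listed but not present: {name}")
        elif entry.get("sha256") and sha256_of(fixture_dir / name) != entry["sha256"]:
            problems.append(f"hash mismatch: {name}")
    return problems
```

Run in CI, a check like this fails as soon as a binary appears or changes 
without a matching manifest update, which at least puts every such change in 
front of a reviewer. It says nothing about whether the recorded purpose is 
true.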

It is troublesome if nobody checks either provenance or purpose. But
that applies equally to code. Code can contain hidden malicious algorithms.

That said, code that generates binary content can be equally dangerous.

Readability is gold. I love code that is readable like a story or a book.
That makes code auditable and maintainable in the long run.

On Tue, 2 Apr 2024, 21:57 Nick Wellnhofer,  wrote:

> Binary test data can also be generated with a script or a more
> sophisticated test suite which might even be more maintainable in the long
> run.
>
> On the other hand, tests are the prime target to hide malicious code and
> there are many ways to hide data even in innocuous-looking text files. I'd
> still argue that binary files should be avoided in open-source repositories
> just for the sake of maintainability. If you really need a larger set of
> binary test files, you can also move them to another repo.
>
> Nick
>
>
> > On Apr 2, 2024, at 21:28, Gary Gregory  wrote:
> >
> > Binary files are appropriate in a repository, for example, Apache
> > Commons Compress contains various normal and broken compressed files
> > in its test fixtures.
> >
> > Gary
>
>
>
>


Re: Binary blobs in source trees

2024-04-02 Thread Nick Wellnhofer
Binary test data can also be generated with a script or a more sophisticated 
test suite which might even be more maintainable in the long run.

On the other hand, tests are the prime target to hide malicious code and there 
are many ways to hide data even in innocuous-looking text files. I'd still 
argue that binary files should be avoided in open-source repositories just for 
the sake of maintainability. If you really need a larger set of binary test 
files, you can also move them to another repo.

Nick


> On Apr 2, 2024, at 21:28, Gary Gregory  wrote:
> 
> Binary files are appropriate in a repository, for example, Apache
> Commons Compress contains various normal and broken compressed files
> in its test fixtures.
> 
> Gary





Re: Binary blobs in source trees

2024-04-02 Thread Gary Gregory
Binary files are appropriate in a repository, for example, Apache
Commons Compress contains various normal and broken compressed files
in its test fixtures.

Gary

On Tue, Apr 2, 2024 at 3:04 PM Mike Drob  wrote:
>
> Security,
>
> One of the interesting things coming out of the xz backdoor investigation is 
> the apparent use of binary data in "test files" to precipitate the backdoor. 
> I know that we have a "no compiled code" policy for our releases, but I have 
> also seen in practice that projects let binary junk in to test folders 
> (myself having checked in binaries to tests at least once). Is there an 
> opportunity here to shore up the repo contents?
>
> Can we do this in a way that doesn't involve inspecting each file manually 
> and then concluding that it needs to stay because we're testing backwards 
> compat for data produced by an older version of the code and we can't 
> actually generate anymore, so there's no action that we can take?
>
> This is a parallel discussion to Jarek's thread on this list regarding 
> provenance checks. Instead of at release time, maybe we shift some of the 
> checking or building to earlier in the development cycle?
>
> Mike
>



Binary blobs in source trees

2024-04-02 Thread Mike Drob
Security,

One of the interesting things coming out of the xz backdoor investigation is 
the apparent use of binary data in "test files" to precipitate the backdoor. I 
know that we have a "no compiled code" policy for our releases, but I have also 
seen in practice that projects let binary junk in to test folders (myself 
having checked in binaries to tests at least once). Is there an opportunity 
here to shore up the repo contents?

Can we do this in a way that doesn't involve inspecting each file manually and 
then concluding that it needs to stay because we're testing backwards compat 
for data produced by an older version of the code and we can't actually 
generate anymore, so there's no action that we can take?
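
A first step that avoids opening every file by hand is to let a script 
produce the list of candidates. The Python sketch below is a rough heuristic, 
not a policy: it flags a file as binary if its first bytes contain a NUL or 
are not valid UTF-8. The names `looks_binary` and `find_binary_files` and the 
sample size are assumptions for this example. It will miss data hidden in 
text files, so it can only narrow down what needs a closer look.

```python
import codecs
from pathlib import Path

SAMPLE_SIZE = 8192  # bytes inspected per file; an arbitrary choice


def looks_binary(sample: bytes) -> bool:
    """Heuristic: a NUL byte or invalid UTF-8 marks the sample as binary."""
    if b"\x00" in sample:
        return True
    # An incremental decoder tolerates a multi-byte character cut off
    # at the end of the sample, which a plain decode() would reject.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(sample, final=False)
    except UnicodeDecodeError:
        return True
    return False


def find_binary_files(root: Path, skip_dirs=frozenset({".git"})) -> list:
    """List files under root (as POSIX-style relative paths) that look binary."""
    found = []
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        if not path.is_file() or skip_dirs.intersection(rel.parts):
            continue
        with path.open("rb") as fh:
            if looks_binary(fh.read(SAMPLE_SIZE)):
                found.append(rel.as_posix())
    return found
```

The output of `find_binary_files(Path("."))` could be compared against a 
reviewed allow-list, so that only newly added binaries need a human decision.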

This is a parallel discussion to Jarek's thread on this list regarding 
provenance checks. Instead of at release time, maybe we shift some of the 
checking or building to earlier in the development cycle?

Mike




Re: [DISCUSS] Should we update our policies to include source provenance check

2024-04-02 Thread Jarek Potiuk
Following up what Mark/ Piotr mentioned and echoing what Sebb wrote:

> Yes, there may be some exceptions where files are needed in the tarball
> that are not in the repo.
> However these must be directly derivable from the source repo by
> anyone with the appropriate tools.

This is also what I like about the "reproducibility" movement, and it does
not even have to be bit-for-bit reproducibility. If we are able to have a
replicable release environment and have the Release Manager + min 2 other PMC
members actually build the package from sources + tools - then we do not have
to track whether some files had been derived from some other files properly.

We could simply compare the three packages generated by 3 different people
using 3 different machines - each of them maintained separately. With
bit-for-bit reproducibility it's easier, but even without it - comparing the
results and giving a reasonable explanation of where the differences come from
(timestamps and the like) is "enough" to announce "Yep, provenance of
these packages has been confirmed as coming from this particular TAG /
commit hash in the Git repo - because 3 people built it".
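
As a rough illustration of such a comparison, the Python sketch below
fingerprints each tarball by member name, type, mode, link target and content
hash, and deliberately ignores timestamps and ownership. Which differences
count as "explainable" is the PMC's call; ignoring mtime and owner is just
the assumption made here, and `tar_fingerprint` / `diff_builds` are names
invented for this example.

```python
import hashlib
import io
import tarfile


def tar_fingerprint(tar_bytes: bytes) -> dict:
    """Map member name -> (type, mode, link target, sha256 of content)."""
    result = {}
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:*") as tar:
        for member in tar:
            digest = ""
            if member.isfile():
                digest = hashlib.sha256(tar.extractfile(member).read()).hexdigest()
            # mtime, uid/gid and user names are left out on purpose.
            result[member.name] = (member.type, member.mode, member.linkname, digest)
    return result


def diff_builds(first: bytes, second: bytes) -> dict:
    """Report members that exist in only one build or differ in content."""
    fa, fb = tar_fingerprint(first), tar_fingerprint(second)
    return {
        "only_in_first": sorted(fa.keys() - fb.keys()),
        "only_in_second": sorted(fb.keys() - fa.keys()),
        "different": sorted(n for n in fa.keys() & fb.keys() if fa[n] != fb[n]),
    }
```

If all three lists are empty for every pair of builds, the remaining
byte-level differences come from the metadata that was ignored. Anything
listed needs an explanation before a +1.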

Is it wasteful? Probably - depending on how difficult it is to build those
packages and how complicated it is to set up a build environment and
have another person reproduce it. Those people might not have the appropriate
tools and environment, and it might take time/effort to set them up, of
course. And the release process might not be "rigid" enough to give similar
results when 3 people follow the "source package build" procedure.

But is it useful? IMHO - immensely. Eventually having the process well
described, reproducible by another person who just follows the instructions
and does not have other undocumented prerequisites is precisely what makes
the release process good, and where "bus factor" is > 1.

I would think about this kind of policy: "A PMC member, when voting +1, is
REQUIRED to verify the provenance of the package to make sure it comes from
the sources in the version control system, in whatever way the PMC feels
appropriate: comparing sources, or reproducible builds, or just preparing
the same package and checking that any differences can be explained". I
would never ever want to mandate HOW to do it, just make sure that
"provenance" is something that is needed for +1 (by default).

That **might** be achievable by virtually all our projects, and might also
eventually drive them to improve and automate more parts of it - eventually
making it possible for the new platform that infra develops - because
basically having a reproducible, fully automated process of building the
package will be a prerequisite for using such a platform. And maybe some
exceptions might be documented, explaining why it can't be achieved yet, with
a goal to achieve it eventually. Even if there are a few projects that won't
be able to achieve it, I'd rather add this expectation by default, and allow
for exceptions.

And then (long term) - we could even do "get the package prepared by the
ASF distribution platform, and have 3 PMC members build it locally to
compare what they built with the one generated by the distribution
platform" - which could allow some of our builds to be
built/signed/published mostly in the ASF, while PMC members just "Verify"
that the packages have the right provenance.

J.

On Tue, Apr 2, 2024 at 1:16 PM sebb  wrote:

> On Tue, 2 Apr 2024 at 09:51, Mark Thomas  wrote:
> >
> > On 02/04/2024 09:12, sebb wrote:
> > > On Tue, 2 Apr 2024 at 08:47, Christofer Dutz <christofer.d...@c-ware.de> wrote:
> > >>
> > >> Hi all,
> > >>
> > >>
> > >>
> > >> I fully agree on this … and adding to sebb’s statement that
> > >> additional files can happen even without malicious intent.
> > >>
> > >> I have seen this several times, if for example maven releases are
> > >> built directly from the main checkout and not from the release-plugin
> > >> checking out the release commit hash and building in a clean directory.
> > >>
> > >
> > > A clean checkout helps, but is no guarantee.
> > >
> > > Spurious files can end up in a source tarball even when it is created
> > > from a clean checkout.
> > >
> > > I saw this in a Maven build where faulty test code left behind some
> > > test artifacts.
> > > Since Maven creates the source archive after the test phase, such
> > > files can end up being included.
> >
> > Nice idea in principle, but it is going to create issues for C based
> > projects in practice.
> >
> > End users expect to be able to build C based projects with configure,
> > make, make install. That only works because the release manager runs a
> > script (typically called buildconf that uses autoconf) to create the
> > support scripts required in the src tarball.
>
> Yes, there may be some exceptions where files are needed in the
> tarball that are not in the repo.
> However these must be directly derivable from the source repo by
> anyone with the appropriate tools.
>
> > There are various possible solutions but I strongly suggest engagement
> > 

Re: [DISCUSS] Should we update our policies to include source provenance check

2024-04-02 Thread sebb
On Tue, 2 Apr 2024 at 09:51, Mark Thomas  wrote:
>
> On 02/04/2024 09:12, sebb wrote:
> > On Tue, 2 Apr 2024 at 08:47, Christofer Dutz  
> > wrote:
> >>
> >> Hi all,
> >>
> >>
> >>
> >> I fully agree on this … and adding to sebb’s statement that additional 
> >> files can happen even without malicious intent.
> >>
> >> I have seen this several times, if for example maven releases are built 
> >> directly from the main checkout and not from the release-plugin checking 
> >> out the release commit hash and building in a clean directory.
> >>
> >
> > A clean checkout helps, but is no guarantee.
> >
> > Spurious files can end up in a source tarball even when it is created
> > from a clean checkout.
> >
> > I saw this in a Maven build where faulty test code left behind some
> > test artifacts.
> > Since Maven creates the source archive after the test phase, such
> > files can end up being included.
>
> Nice idea in principle, but it is going to create issues for C based
> projects in practice.
>
> End users expect to be able to build C based projects with configure,
> make, make install. That only works because the release manager runs a
> script (typically called buildconf that uses autoconf) to create the
> support scripts required in the src tarball.

Yes, there may be some exceptions where files are needed in the
tarball that are not in the repo.
However these must be directly derivable from the source repo by
anyone with the appropriate tools.
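
One possible way to encode such exceptions is an explicit allow-list of
generated files, so that anything else found only in the tarball is flagged.
The Python sketch below is illustrative only: the patterns and the function
name are hypothetical, and a real project would list whatever its documented
release script produces.

```python
from fnmatch import fnmatchcase

# Hypothetical allow-list of files that the documented release script generates.
GENERATED_PATTERNS = ["configure", "build/*.in"]


def unexplained_extras(tarball_files, repo_files, generated_patterns=GENERATED_PATTERNS):
    """Return tarball-only paths that no allow-list pattern accounts for."""
    extras = set(tarball_files) - set(repo_files)
    # Note: "*" in fnmatch patterns also matches "/" characters.
    return sorted(
        path
        for path in extras
        if not any(fnmatchcase(path, pattern) for pattern in generated_patterns)
    )
```

For example, with `tarball_files = {"configure", "src/a.c", "m4/extra.m4"}`
and `repo_files = {"src/a.c", "configure.ac"}`, only `m4/extra.m4` is
reported. An allow-list only says that a generated file is expected. Checking
that its content is right still means regenerating it with the documented
tools and comparing.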

> There are various possible solutions but I strongly suggest engagement
> with projects such as httpd before trying to change the current policy.
>
> Mark
>



Re: [DISCUSS] Should we update our policies to include source provenance check

2024-04-02 Thread Piotr P. Karwasz
Hi Jarek,

On Tue, 2 Apr 2024 at 08:52, Jarek Potiuk  wrote:
> From earlier discussions - many of us think that verifying whether the
> sources in the "source" package contain the same sources as ones stored in
> our source repositories is the most important part of such verification,
> but - somewhat to my surprise - it has not been explicitly stated in our
> policies. And I think it should be an important gate to have PMC members to
> be REQUIRED to verify that. That could be done in whatever way is
> appropriate for the project - it could be just comparing sources with git,
> or having reproducible packages that PMC members can build and compare for
> binary identity if the project supports it.

Verifying the source archive is not such a trivial task, simply
because the contents of the Git repo and source archive **do** differ.

Most Apache projects that use Maven use assembly descriptors from the
15-year-old `apache-jar-resource-bundle`[1]. These descriptors both:
* remove CI-specific resources that are not useful in the source bundle[2],
* add resources generated at build time[3].

Therefore our source bundles are open to the same attack vector as `liblzma`.

In the Logging Services project we chose to:

1. Create our source archives as an **exact** copy of the Git repo,
2. Verify in an indirect way that the source archive is correct:
the published artifacts are generated from the Git repo, while we
verify their reproducibility using the source archive.
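
When the archive is meant to be an exact copy, the direct check is simple
too. The Python sketch below compares an unpacked source archive with a clean
checkout of the release tag, file by file. It is not what Logging Services
actually runs; `tree_hashes` and `compare_trees` are names invented for this
example, and skipping `.git` is an assumption about the checkout layout.

```python
import hashlib
from pathlib import Path


def tree_hashes(root: Path, skip_dirs=frozenset({".git"})) -> dict:
    """Map each file's POSIX-style relative path to the SHA-256 of its content."""
    hashes = {}
    for path in root.rglob("*"):
        rel = path.relative_to(root)
        if path.is_file() and not skip_dirs.intersection(rel.parts):
            hashes[rel.as_posix()] = hashlib.sha256(path.read_bytes()).hexdigest()
    return hashes


def compare_trees(checkout: Path, unpacked_archive: Path) -> dict:
    """List files that are extra, missing or modified in the archive."""
    repo = tree_hashes(checkout)
    archive = tree_hashes(unpacked_archive)
    return {
        "only_in_archive": sorted(archive.keys() - repo.keys()),
        "only_in_repo": sorted(repo.keys() - archive.keys()),
        "modified": sorted(n for n in repo.keys() & archive.keys() if repo[n] != archive[n]),
    }
```

For an exact-copy archive all three lists should be empty. For archives built
with assembly descriptors like the ones above, the removed and generated
files show up under `only_in_repo` and `only_in_archive` and need to be
explained one by one.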

Piotr

[1] https://github.com/apache/maven-apache-resources
[2] 
https://github.com/apache/maven-apache-resources/blob/f609acfd574277be6382ce381f65cab2db895d8d/source-release/src/main/resources/assemblies/source-shared.xml#L58-L63
[3] 
https://github.com/apache/maven-apache-resources/blob/f609acfd574277be6382ce381f65cab2db895d8d/source-release/src/main/resources/assemblies/source-shared.xml#L76-L80




Re: [DISCUSS] Should we update our policies to include source provenance check

2024-04-02 Thread Mark Thomas

On 02/04/2024 09:12, sebb wrote:
> On Tue, 2 Apr 2024 at 08:47, Christofer Dutz  wrote:
>>
>> Hi all,
>>
>> I fully agree on this … and adding to sebb’s statement that additional files 
>> can happen even without malicious intent.
>>
>> I have seen this several times, if for example maven releases are built 
>> directly from the main checkout and not from the release-plugin checking out 
>> the release commit hash and building in a clean directory.
>
> A clean checkout helps, but is no guarantee.
>
> Spurious files can end up in a source tarball even when it is created
> from a clean checkout.
>
> I saw this in a Maven build where faulty test code left behind some
> test artifacts.
> Since Maven creates the source archive after the test phase, such
> files can end up being included.


Nice idea in principle, but it is going to create issues for C based 
projects in practice.


End users expect to be able to build C based projects with configure, 
make, make install. That only works because the release manager runs a 
script (typically called buildconf that uses autoconf) to create the 
support scripts required in the src tarball.


There are various possible solutions but I strongly suggest engagement 
with projects such as httpd before trying to change the current policy.


Mark




Re: [DISCUSS] Should we update our policies to include source provenance check

2024-04-02 Thread Philippe Ombredanne
Hi Jarek:

On Tue, Apr 2, 2024 at 8:53 AM Jarek Potiuk  wrote:
[...]
> TL;DR; I think that we currently do not explicitly state the requirement of
> verifying if the release manager has not tampered with the sources when
> preparing the source package - and I believe we should be more explicit
> about it and require from PMC members to do such verification.

FWIW, back2source [1] may help in the near future. This project is a
work in progress with the goal to validate that the code in a VCS tag,
the source archives and the binary archives of a release all match
correctly. It lives in ScanCode.io [2].

[1] https://nlnet.nl/project/Back2source/
[2] https://github.com/nexB/scancode.io
-- 
Cordially
Philippe Ombredanne




Re: [DISCUSS] Should we update our policies to include source provenance check

2024-04-02 Thread sebb
On Tue, 2 Apr 2024 at 08:47, Christofer Dutz  wrote:
>
> Hi all,
>
>
>
> I fully agree on this … and adding to sebb’s statement that additional files 
> can happen even without malicious intent.
>
> I have seen this several times, if for example maven releases are built 
> directly from the main checkout and not from the release-plugin checking out 
> the release commit hash and building in a clean directory.
>

A clean checkout helps, but is no guarantee.

Spurious files can end up in a source tarball even when it is created
from a clean checkout.

I saw this in a Maven build where faulty test code left behind some
test artifacts.
Since Maven creates the source archive after the test phase, such
files can end up being included.

>
> But indeed, this has also been something worrying me, and I think it could 
> possibly become important with all the CRA and PLD stuff coming our way.
>
>
>
> Chris
>
>
>
>
>
> From: sebb 
> Date: Tuesday, 2 April 2024 at 09:14
> To: Jarek Potiuk 
> Cc: security-discuss@community.apache.org, Users 
> Subject: Re: [DISCUSS] Should we update our policies to include source 
> provenance check
>
> WARNING: this post mixes public and private lists.
>
> In Commons reviewers are supposed to check that the source tarball
> contents all match files from the tag in the vote.
> The reason is mainly for provenance, and licensing, but it is not
> unknown for spurious files to be accidentally added to the source
> tarball.
> Things can go wrong even without malicious intent.
>
> So I agree that this is a vital part of the process, and should be
> made explicit.
>
> On Tue, 2 Apr 2024 at 07:52, Jarek Potiuk  wrote:
> >
> > Following some of the learnings from CVE-2024-3094 (the xz backdoor) and a 
> > few resulting discussions, I would like to start a discussion on that very 
> > specific topic:
> >
> > TL;DR; I think that we currently do not explicitly state the requirement of 
> > verifying if the release manager has not tampered with the sources when 
> > preparing the source package - and I believe we should be more explicit 
> > about it and require from PMC members to do such verification.
> >
> > As explained in [1] - there were three important triggers for the CVE to 
> > happen:
> >
> > a) the attacker was able to gain trust, become a maintainer and release 
> > manager
> > b) they submitted test binaries to the repository of xz that contained 
> > malicious code
> > c) acting as release manager - they modified the official source tarball 
> > packages of xz to contain a modified Makefile that causes anyone using the 
> > official source tarball packages to produce a malicious version of the xz 
> > library (that malicious Makefile had never been part of the source 
> > repository, and it had not been reviewed or approved by anyone).
> >
> > When I look at requirements explained in our release policy, I think this  
> > kind of scenario (especially point c) is not something our release policies 
> > protect us against, because we have no requirement to check provenance of 
> > the source code in the released package.
> >
> > Or at least I cannot find it in either the release policy [2] or the 
> > distribution policy [3].
> >
> > From [2]:
> >
> > > Before casting +1 binding votes, individuals are REQUIRED to download all 
> > > signed source code packages onto their own hardware, verify that they 
> > > meet all requirements of ASF policy on releases as described below, 
> > > validate all cryptographic signatures, compile as provided, and test the 
> > > result on their own platform.
> >
> > Even if we assume such a check is part of "meet all requirements of ASF 
> > policy on releases" - there is no "check if the sources in the package have 
> > not been modified vs. source repository" anywhere in the policies as far as 
> > I can see.
> >
> > From earlier discussions - many of us think that verifying whether the 
> > sources in the "source" package contain the same sources as ones stored in 
> > our source repositories is the most important part of such verification, 
> > but - somewhat to my surprise - it has not been explicitly stated in our 
> > policies. And I think it should be an important gate to have PMC members to 
> > be REQUIRED to verify that. That could be done in whatever way is 
> > appropriate for the project - it could be just comparing sources with git, 
> > or having reproducible packages that PMC members can build and compare for 
> > binary identity if the project supports it.
> >
> > But similarly to comparing cryptographic signatures, possibly we should 
> > explicitly state that this should be a mandatory check. And maybe we have 
> > the chance to use the CVE-2024-3094 as an opportunity to remind/advocate it 
> > and explain what could happen if this step is missing when our source 
> > packages are released?
> >
> > I think the way it's stated, a malicious release manager could do a similar 
> > package modification as xz release manager did and we could have 

Re: [DISCUSS] Should we update our policies to include source provenance check

2024-04-02 Thread Christofer Dutz
Hi all,

I fully agree on this … and adding to sebb’s statement that additional files 
can happen even without malicious intent:
I have seen this several times, for example when Maven releases are built 
directly from the main checkout rather than by the release plugin checking out 
the release commit hash and building in a clean directory.

But indeed, this has also been something worrying me, and I think it could 
possibly become important with all the CRA and PLD stuff coming our way.

Chris


From: sebb 
Date: Tuesday, 2 April 2024 at 09:14
To: Jarek Potiuk 
Cc: security-discuss@community.apache.org, Users 
Subject: Re: [DISCUSS] Should we update our policies to include source 
provenance check
WARNING: this post mixes public and private lists.

In Commons reviewers are supposed to check that the source tarball
contents all match files from the tag in the vote.
The reason is mainly for provenance, and licensing, but it is not
unknown for spurious files to be accidentally added to the source
tarball.
Things can go wrong even without malicious intent.

So I agree that this is a vital part of the process, and should be
made explicit.

On Tue, 2 Apr 2024 at 07:52, Jarek Potiuk  wrote:
>
> Following some of the learnings from CVE-2024-3094 (the xz backdoor) and a 
> few resulting discussions, I would like to start a discussion on that very 
> specific topic:
>
> TL;DR; I think that we currently do not explicitly state the requirement of 
> verifying if the release manager has not tampered with the sources when 
> preparing the source package - and I believe we should be more explicit about 
> it and require from PMC members to do such verification.
>
> As explained in [1] - there were three important triggers for the CVE to happen:
>
> a) the attacker was able to gain trust, become a maintainer and release 
> manager
> b) they submitted test binaries to the repository of xz that contained 
> malicious code
> c) acting as release manager - they modified the official source tarball 
> packages of xz to contain a modified Makefile that causes anyone using the 
> official source tarball packages to produce a malicious version of the xz 
> library (that malicious Makefile had never been part of the source 
> repository, and it had not been reviewed or approved by anyone).
>
> When I look at requirements explained in our release policy, I think this  
> kind of scenario (especially point c) is not something our release policies 
> protect us against, because we have no requirement to check provenance of the 
> source code in the released package.
>
> Or at least I cannot find it in either the release policy [2] or the 
> distribution policy [3].
>
> From [2]:
>
> > Before casting +1 binding votes, individuals are REQUIRED to download all 
> > signed source code packages onto their own hardware, verify that they meet 
> > all requirements of ASF policy on releases as described below, validate all 
> > cryptographic signatures, compile as provided, and test the result on their 
> > own platform.
>
> Even if we assume such a check is part of "meet all requirements of ASF 
> policy on releases" - there is no "check if the sources in the package have 
> not been modified vs. source repository" anywhere in the policies as far as I 
> can see.
>
> From earlier discussions - many of us think that verifying whether the 
> sources in the "source" package contain the same sources as ones stored in 
> our source repositories is the most important part of such verification, but 
> - somewhat to my surprise - it has not been explicitly stated in our 
> policies. And I think it should be an important gate to have PMC members to 
> be REQUIRED to verify that. That could be done in whatever way is appropriate 
> for the project - it could be just comparing sources with git, or having 
> reproducible packages that PMC members can build and compare for binary 
> identity if the project supports it.
>
> But similarly to comparing cryptographic signatures, possibly we should 
> explicitly state that this should be a mandatory check. And maybe we have the 
> chance to use the CVE-2024-3094 as an opportunity to remind/advocate it and 
> explain what could happen if this step is missing when our source packages 
> are released?
>
> I think the way it's stated, a malicious release manager could do a similar 
> package modification as the xz release manager did and we could have missed 
> it. Our policies on release do not have explicit gates protecting against 
> this - so PMC members who explicitly give +1 - following the release policy 
> pretty rigorously - could fail to realise the malicious release manager did such
>
> Just to give an example from the past Airflow releases - when I came to the 
> project, I've learned how it works, and the release process was very 
> rigorously followed, including licences, signatures, etc. and we even pulled 
> a few releases when those were not met. But it's only a few years later when 
> I became a release manager and 

Re: [DISCUSS] Should we update our policies to include source provenance check

2024-04-02 Thread Jarek Potiuk
> WARNING: this post mixes public and private lists.

Yes. Apologies. I keep on forgetting that builds@ is public and users@ is
private. My intention is to make it a public discussion - there is nothing
"private" in this discussion - all the docs/references are public. I moved
"users@" to bcc: - and I guess anyone who would like to comment on that can
come here to security-discuss@

J.


On Tue, Apr 2, 2024 at 9:12 AM sebb  wrote:

> WARNING: this post mixes public and private lists.
>
> In Commons reviewers are supposed to check that the source tarball
> contents all match files from the tag in the vote.
> The reason is mainly for provenance, and licensing, but it is not
> unknown for spurious files to be accidentally added to the source
> tarball.
> Things can go wrong even without malicious intent.
>
> So I agree that this is a vital part of the process, and should be
> made explicit.
>
> On Tue, 2 Apr 2024 at 07:52, Jarek Potiuk  wrote:
> >
> > Following some of the learnings from the CVE-2024-3094 (xz backdoor) and
> a few resulting discussions. I would like to start a discussion on that
> very specific topic:
> >
> > TL;DR; I think that we currently do not explicitly state the requirement
> of verifying if the release manager has not tampered with the sources when
> preparing the source package - and I believe we should be more explicit
> about it and require from PMC members to do such verification.
> >
> > As explained in [1] - there were two important triggers for the CVE to
> happen:
> >
> > a) the attacker was able to gain trust, become a maintainer and release
> manager
> > b) they submitted test binaries to the repository of xv that contained
> malicious code
> > c) acting as release manager - they modified the official source
> tar-ball packages of xz to contain a modified Makefile that turn anyone
> using official source-tar-ball packages to produce a malicious version of
> the xz library (that malicious Makefile had never been part of the source
> repository, it's not been reviewed nor approved by anyone).
> >
> > When I look at requirements explained in our release policy, I think
> this  kind of scenario (especially point c) is not something our release
> policies protect us against, because we have no requirement to check
> provenance of the source code in the released package.
> >
> > Or at least I cannot find it in neither release policy [2] nor
> distribution policy [3].
> >
> > From [2]:
> >
> > > Before casting +1 binding votes, individuals are REQUIRED to download
> all signed source code packages onto their own hardware, verify that they
> meet all requirements of ASF policy on releases as described below,
> validate all cryptographic signatures, compile as provided, and test the
> result on their own platform.
> >
> > Even if we assume such a check is part of "meet all requirements of ASF
> policy on releases" - there is no "check if the sources in the package have
> not been modified vs. source repository" anywhere in the policies as far as
> I can see.
> >
> > From earlier discussions - many of us think that verifying whether the
> sources in the "source" package contain the same sources as ones stored in
> our source repositories is the most important part of such verification,
> but - somewhat to my surprise - it has not been explicitly stated in our
> policies. And I think it should be an important gate to have PMC members to
> be REQUIRED to verify that. That could be done in whatever way is
> appropriate for the project - it could be just comparing sources with git,
> or having reproducible packages that PMC members can build and compare for
> binary identity if the project supports it.
> >
> > But similarly to comparing cryptographic signatures, possibly we should
> explicitly state that this should be a mandatory check. And maybe we have
> the chance to use the CVE-2024-3094 as an opportunity to remind/advocate it
> and explain what could happen if this step is missing when our source
> packages are released?
> >
> > I think the way it's stated, a malicious release manager could do a
> similar package modification as xz release manager did and we could have
> missed it. Our policies on release do not have explicit gates protecting
> against this - so PMC members explicitly give +1 - following the release
> policy pretty rigorously, could have not realise the malicious release
> manager did such
> >
> > Just to give an example from the past Airflow releases - when I came to
> the project, I've learned how it works, and the release process was very
> rigorously followed, including licences, signatures, etc. and we even
> pulled a few releases when those were not met. But it's only a few years
> later when I became a release manager and realised that I could potentially
> act maliciously and modify the packages and  no-one would notice - we
> introduced source provenance check first and reproducible packages after
> that realisation.
> >
> > Or maybe I am 

Re: [DISCUSS] Should we update our policies to include source provenance check

2024-04-02 Thread sebb
WARNING: this post mixes public and private lists.

In Commons reviewers are supposed to check that the source tarball
contents all match files from the tag in the vote.
The reason is mainly for provenance, and licensing, but it is not
unknown for spurious files to be accidentally added to the source
tarball.
Things can go wrong even without malicious intent.

So I agree that this is a vital part of the process, and should be
made explicit.
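
A minimal sketch of the "no spurious files" part of such a check, in Python.
It assumes the reviewer already has the list of paths in the voted tag (for
example from `git ls-tree -r --name-only <tag>`) and that the tarball wraps
everything in a single top-level directory. The function name and the
`strip_prefix` parameter are made up for illustration and are not part of
any Commons tooling. It compares names only, not file contents.

```python
import tarfile


def spurious_members(tarball_path, tagged_paths, strip_prefix=""):
    """Return regular files in the tarball that are absent from the tagged file list."""
    tagged = set(tagged_paths)
    extra = []
    with tarfile.open(tarball_path, "r:*") as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue  # skip directories, links and other special entries
            name = member.name
            # Drop the (assumed) top-level directory, e.g. "demo-1.0/"
            if strip_prefix and name.startswith(strip_prefix):
                name = name[len(strip_prefix):]
            if name not in tagged:
                extra.append(name)
    return sorted(extra)
```

An empty result means every regular file in the tarball also exists in the
tag. Anything listed needs an explanation before a +1.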

On Tue, 2 Apr 2024 at 07:52, Jarek Potiuk  wrote:
>
> Following some of the learnings from the CVE-2024-3094 (xz backdoor) and a 
> few resulting discussions. I would like to start a discussion on that very 
> specific topic:
>
> TL;DR; I think that we currently do not explicitly state the requirement of 
> verifying if the release manager has not tampered with the sources when 
> preparing the source package - and I believe we should be more explicit about 
> it and require from PMC members to do such verification.
>
> As explained in [1] - there were two important triggers for the CVE to happen:
>
> a) the attacker was able to gain trust, become a maintainer and release 
> manager
> b) they submitted test binaries to the repository of xv that contained 
> malicious code
> c) acting as release manager - they modified the official source tar-ball 
> packages of xz to contain a modified Makefile that turn anyone using official 
> source-tar-ball packages to produce a malicious version of the xz library 
> (that malicious Makefile had never been part of the source repository, it's 
> not been reviewed nor approved by anyone).
>
> When I look at requirements explained in our release policy, I think this  
> kind of scenario (especially point c) is not something our release policies 
> protect us against, because we have no requirement to check provenance of the 
> source code in the released package.
>
> Or at least I cannot find it in neither release policy [2] nor distribution 
> policy [3].
>
> From [2]:
>
> > Before casting +1 binding votes, individuals are REQUIRED to download all 
> > signed source code packages onto their own hardware, verify that they meet 
> > all requirements of ASF policy on releases as described below, validate all 
> > cryptographic signatures, compile as provided, and test the result on their 
> > own platform.
>
> Even if we assume such a check is part of "meet all requirements of ASF 
> policy on releases" - there is no "check if the sources in the package have 
> not been modified vs. source repository" anywhere in the policies as far as I 
> can see.
>
> From earlier discussions - many of us think that verifying whether the 
> sources in the "source" package contain the same sources as ones stored in 
> our source repositories is the most important part of such verification, but 
> - somewhat to my surprise - it has not been explicitly stated in our 
> policies. And I think it should be an important gate to have PMC members to 
> be REQUIRED to verify that. That could be done in whatever way is appropriate 
> for the project - it could be just comparing sources with git, or having 
> reproducible packages that PMC members can build and compare for binary 
> identity if the project supports it.
>
> But similarly to comparing cryptographic signatures, possibly we should 
> explicitly state that this should be a mandatory check. And maybe we have the 
> chance to use the CVE-2024-3094 as an opportunity to remind/advocate it and 
> explain what could happen if this step is missing when our source packages 
> are released?
>
> I think the way it's stated, a malicious release manager could do a similar 
> package modification as xz release manager did and we could have missed it. 
> Our policies on release do not have explicit gates protecting against this - 
> so PMC members explicitly give +1 - following the release policy pretty 
> rigorously, could have not realise the malicious release manager did such
>
> Just to give an example from the past Airflow releases - when I came to the 
> project, I've learned how it works, and the release process was very 
> rigorously followed, including licences, signatures, etc. and we even pulled 
> a few releases when those were not met. But it's only a few years later when 
> I became a release manager and realised that I could potentially act 
> maliciously and modify the packages and  no-one would notice - we 
> introduced source provenance check first and reproducible packages after that 
> realisation.
>
> Or maybe I am exaggerating, and it's "obvious enough" that we do not have to 
> state it?
>
> I'd love to hear what others think here. And I am happy to provide a concrete 
> proposal to the policy and do some advocacy for it, if others think it is 
> needed as well.
>
> J.
>
>
> [1] https://boehs.org/node/everything-i-know-about-the-xz-backdoor
> [2] https://www.apache.org/legal/release-policy.html
> [3] https://infra.apache.org/release-distribution.html#sigs-and-sums
>


[DISCUSS] Should we update our policies to include source provenance check

2024-04-02 Thread Jarek Potiuk
Following some of the learnings from CVE-2024-3094 (the xz backdoor) and a
few resulting discussions, I would like to start a discussion on one very
specific topic:

TL;DR: I think that we currently do not explicitly require verifying that
the release manager has not tampered with the sources when preparing the
source package - and I believe we should be more explicit about it and
require PMC members to do such verification.

As explained in [1] - there were three important triggers for the CVE to
happen:

a) the attacker was able to gain trust, become a maintainer and release
manager
b) they submitted test binaries that contained malicious code to the xz
repository
c) acting as release manager - they modified the official source tarball
packages of xz to contain a modified Makefile that causes anyone building
from the official source tarball packages to produce a malicious version
of the xz library (that malicious Makefile had never been part of the
source repository, and was never reviewed or approved by anyone).

When I look at the requirements explained in our release policy, I think
this kind of scenario (especially point c) is not something our release
policies protect us against, because we have no requirement to check the
provenance of the source code in the released package.

Or at least I cannot find it in either the release policy [2] or the
distribution policy [3].

From [2]:

> Before casting +1 binding votes, individuals are REQUIRED to download all
> signed source code packages onto their own hardware, verify that they meet
> all requirements of ASF policy on releases as described below, validate all
> cryptographic signatures, compile as provided, and test the result on their
> own platform.

Even if we assume such a check is part of "meet all requirements of ASF
policy on releases" - there is no "check if the sources in the package have
not been modified vs. source repository" anywhere in the policies as far as
I can see.

From earlier discussions - many of us think that verifying whether the
sources in the "source" package are the same as the ones stored in our
source repositories is the most important part of such verification, but -
somewhat to my surprise - it has not been explicitly stated in our
policies. And I think it should be an important gate: PMC members should be
REQUIRED to verify that. That could be done in whatever way is appropriate
for the project - it could be just comparing sources with git, or having
reproducible packages that PMC members can build and compare for binary
identity if the project supports it.
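
To make "comparing sources with git" concrete, here is a rough Python
sketch. It assumes the source package has been extracted into one directory
and the release tag has been checked out into another. The function names
are invented for this example. It is only one possible way to do the
comparison, not an existing ASF tool, and it ignores file modes and
symlink targets.

```python
import hashlib
from pathlib import Path


def tree_digests(root):
    """Map each file path (relative to root, POSIX style) to its SHA-256 digest."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        relative = path.relative_to(root)
        # Skip git metadata so a plain checkout can be compared directly
        if path.is_file() and ".git" not in relative.parts:
            digests[relative.as_posix()] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def compare_trees(package_dir, checkout_dir):
    """Report files that are extra, missing, or changed in the package vs. the checkout."""
    pkg = tree_digests(package_dir)
    repo = tree_digests(checkout_dir)
    return {
        "only_in_package": sorted(set(pkg) - set(repo)),
        "only_in_checkout": sorted(set(repo) - set(pkg)),
        "modified": sorted(p for p in set(pkg) & set(repo) if pkg[p] != repo[p]),
    }
```

A file that shows up under "only_in_package" or "modified" is exactly the
kind of thing described in point c) above. Projects that legitimately
generate files at packaging time would need an explicit, reviewed allow-list
for them.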

But similarly to validating cryptographic signatures, possibly we should
explicitly state that this should be a mandatory check. And maybe we have
the chance to use CVE-2024-3094 as an opportunity to remind people about
it, advocate for it, and explain what could happen if this step is missing
when our source packages are released?

I think that, the way the policy is stated, a malicious release manager
could modify a package much as the xz release manager did, and we could
miss it. Our policies on release do not have explicit gates protecting
against this - so PMC members who explicitly give +1, following the release
policy pretty rigorously, could fail to realise that the malicious release
manager had made such a modification.

Just to give an example from the past Airflow releases - when I came to the
project, I learned how it works, and the release process was very
rigorously followed, including licences, signatures, etc. and we even
pulled a few releases when those were not met. But it was only a few years
later, when I became a release manager, that I realised I could potentially
act maliciously and modify the packages and no-one would notice. We
introduced a source provenance check first, and reproducible packages after
that realisation.
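
For the reproducible-packages variant, the final comparison can be as
simple as the Python sketch below. It assumes the project's build really is
reproducible and that the reviewer has rebuilt the package locally from the
tag. The function names are invented for this example and this is not
Airflow's actual tooling.

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-512 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_reproduced(released_path, rebuilt_path):
    """True when the released artifact and the local rebuild are byte-identical."""
    return file_digest(released_path) == file_digest(rebuilt_path)
```

A mismatch does not prove tampering, since it can also mean the build is
not yet fully reproducible, but it is a reason to withhold a +1 until the
difference is understood.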

Or maybe I am exaggerating, and it's "obvious enough" that we do not have
to state it?

I'd love to hear what others think here. And I am happy to provide a
concrete proposal to the policy and do some advocacy for it, if others
think it is needed as well.

J.


[1] https://boehs.org/node/everything-i-know-about-the-xz-backdoor
[2] https://www.apache.org/legal/release-policy.html
[3] https://infra.apache.org/release-distribution.html#sigs-and-sums