Re: musings on rust packaging [was Re: F38 proposal: RPM Sequoia (System-Wide Change proposal)]

Fabio Valentini Wed, 19 Oct 2022 04:05:12 -0700

On Wed, Oct 19, 2022 at 11:25 AM Matthew Miller
<mat...@fedoraproject.org> wrote:
>
> I _very much_ appreciate all the work you and the other Rust SIG folks
> (Igor and Zbyszek in particular but I'm sure others as well!) have put into
> packaging rust apps and crates and all of the systems around that.


I'll respond inline.

> I fundamentally disagree with Kevin on a deep level about "entirely
> useless", but ... find myself kind of agreeing about the "unpackagable"
> part. I mean: clearly we've found a way, but I'm really not sure we're
> providing a lot of _value_ in this approach, and I'm also not sure it's as
> successful as it could be.

We *do* provide value to both users *and* developers by doing things
the way we do, but the benefits might not be obvious to people who
don't know how (Rust) packaging works, and what we as package
maintainers do.

> There are three ways having things packaged in Fedora repos _can_ be
> helpful:
>
> 1. End-user applications and tools
> 2. Useful development environment
> 3. As convenience for ourselves for building packages for #1 or #2
>
> I am not discounting the value of #3 -- making a shared thing that we all
> work on together is kind of the whole point, and the nicer we can make that
> the better we can bring in more people, and those of us already here have a
> lighter load and can work on the things we're most interested in. But
> ultimately, we're doing it so we make a useful system for users. That means
> the first two.

This I can agree with :)

> I'll start with the second: our system for Rust doesn't really do that.
> Developers are going to use cargo and crates.io and we're not going to
> convince them that they should do otherwise. (I don't expect anyone
> disagrees with this.)

This is true, and probably also not "fixable". We need to apply a
number of non-upstreamable patches to some crates (most notably,
removing Windows- or macOS-specific dependencies, because we don't
want to package those), but in some cases, these are "incompatible"
changes, and Rust *developers* should not be targeting our downstream
sources that have these differences with actual upstream sources.

This is due to a limitation of how cargo handles target-specific
dependencies - all dependencies that are *mentioned in any way* need
to be *present* for it to resolve dependencies / enable optional
features / update its lockfile etc. But since we don't want to package
bindings for Windows and macOS system APIs, we need to actually patch
them out, otherwise builds will fail.

> We're doing okay with #1, but... I think #3 _even_ with all of the work in
> Rust-to-RPM packaging isn't sufficient. I've played with the Bevy game
> engine and will probably have a few things it would be nice to package to
> make available in Fedora Linux. I might not even mind maintaining Bevy
> itself.

Somebody actually already started packaging Bevy components - some
packages are already approved and some are still pending review. Not
sure what the progress has been there, but it's not *impossible*.

> But running `cargo fetch` with a clean cache pulls down *390* crates. Of
> these, it looks like 199 (!) are already packaged as rust-[crate]-devel,
> which is *amazing*. But... that still is hundreds that I'd have to add. And
> mostly they are things I don't know _anything_ about.

You must realize that this is an extreme case. For many Rust
applications that people want to package for Fedora, the number of
dependencies that are missing is rather small, *because* most popular
libraries are already packaged.

Bevy is a bit special, because it (presumably) pulls in lots of GPU /
OpenGL / Vulkan related libraries, which we didn't need to package for
anything else yet, and it's also split into dozens of small libraries
itself, which can be painful to package, that is true.

We might need to reconsider how to package projects like this. I'm
pretty sure we could find a way to package them in a way that's
compatible with how we're currently doing things but would be much
less busywork.

> *This is what open source winning looks like.*
>
> I remember a Byte magazine article from 1990 (I just checked!) with the
> title "There Is a Silver Bullet: The birth of interchangeable, reusable
> software components will bring software into the information age". [1]
> This was about the newly-hot idea of Object Oriented Programming. It was
> very exciting. But, of course, that vision of the world did not happen. It
> turns out proprietary software *can't* do this.
>
> But now we have it! I don't have to reinvent every basic wheel — but even
> more than that, I do not have to be an expert in the intricacies of safe
> concurrency to write an app that uses that under the hood. That's amazing! I
> can do such powerful things from high-level interfaces and trust the
> expertise of those who really understand the deep computer science some of
> this requires.
>
> I am competent enough to write a silly toy game using Bevy. It might be good
> enough that others will enjoy it. *I am not competent to maintain many of
> these dependencies.* I don't even know what most of them DO. "anyhow"?
> "bytemuck"?

Sure, but isn't that the case for most projects that a newcomer wants
to package, regardless of programming language? Say, somebody wants to
package some cool new Python project for machine learning, then
there's probably also some linear algebra package or SIMD math library
in the dependency tree that's missing from Fedora. How is that
different?

> Worse, many of the Bevy deps are specified with exact versions. Maybe I
> could make the package work with the packaged versions, but ... that
> requires deep expertise and even then might lead to unexpected behavior and
> has a high chance of putting me at odds with both the engine upstream and
> any other games which use it.

For intra-project dependencies (i.e. bevy components depending on
exact versions of bevy components), this is kind of expected, and we
have tools to deal with this kind of situation (though bevy is on a
different scale). For dependencies on third-party libraries, this is
kind of unexpected, and I wonder why they do things like that? Locking
some dependencies to exact versions is usually handled by relying on
the lockfile, instead.
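
To illustrate why exact pins hurt: a plain requirement like "1.0.4" in
Cargo.toml is a caret requirement and accepts any semver-compatible
version, while "=1.0.4" accepts only that one version. A simplified
sketch in Python (hypothetical helper names, full x.y.z versions only,
no pre-releases or other operators):

```python
def parse(version: str) -> tuple[int, int, int]:
    """Split a full "x.y.z" version string into a tuple of integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def satisfies(requirement: str, version: str) -> bool:
    """Check a version against an exact ("=x.y.z") or caret ("x.y.z") requirement."""
    candidate = parse(version)
    if requirement.startswith("="):
        # Exact pin: only this one version is acceptable.
        return candidate == parse(requirement[1:])
    # Caret requirement: anything up to the next incompatible release.
    lower = parse(requirement.lstrip("^"))
    major, minor, patch = lower
    if major > 0:
        upper = (major + 1, 0, 0)
    elif minor > 0:
        upper = (0, minor + 1, 0)
    else:
        upper = (0, 0, patch + 1)
    return lower <= candidate < upper


# Suppose the packaged version is 1.0.9 (a made-up example):
print(satisfies("1.0.4", "1.0.9"))   # True
print(satisfies("=1.0.4", "1.0.9"))  # False
```

In the second case we would have to either patch the requirement or ship
a compat package, even though 1.0.9 is nominally compatible.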

Still, for large projects that are split into many crates like bevy, I
think we could find a way to make packaging them a lot easier (i.e.
not use one source package per crate, but use one source package for
building a project that contains many crates).

> The packaging guidelines say that I SHOULD
> create patches to update to latest versions of dependencies, and that I
> should further convince the upstream to take them. Candidly, that seems like
> a waste of everyone's time.

This is *not* a waste of time.
If we don't invest time to do that, many projects' dependencies grow
stale, and actually *increase* the need for us to maintain compat
packages.

> The guidelines provide for creating compat packages, but that means 1) the
> existing shared work is less useful, 2) requires even more extra steps, and
> 3) even without reviews for compat has extra administrative overhead.

We only maintain compat packages where porting to the new version (and
submitting the changes upstream) is not feasible. Again, isn't that
how Fedora is supposed to work?

> So, going back to Kevin's point: it _does_ feel like this is unpackagable.
> But that's because the barrier to participation seems too high. It's not
> because statically-linked binaries [2] can't or shouldn't exist in
> Fedora — that's not one of our core principles! In fact and quite to the
> contrary, we need to adapt to handle this amazing open source success story
> better.

The barrier for participation is too high in some cases, I agree.
However, in my experience, that's for a different reason:

The "shiny new things that happen to be written in Rust" that new
contributors want to have in Fedora are often very complicated
projects that even experienced Rust packagers would need to spend a
lot of time on.

Examples of that might be:
- wasmtime: I ultimately abandoned the attempt to package it "because
Fedora Legal", but the packages themselves worked fine
- deno: requires dozens of new packages, some of which also have
unclear / questionable licenses as well, but the packages themselves
worked

On the other hand, many "nice" CLI tools that people want to package
often require minimal knowledge of Rust packaging (our tools are
pretty nice for "standard" projects), and often only need very few new
dependencies to be packaged.

Just as an example, I started reviewing a "simple" Rust application
today:
https://bugzilla.redhat.com/show_bug.cgi?id=1990713

The spec file is very simple and almost entirely automatically
generated (with the exception of the missing License breakdown for the
statically linked binary), and no dependencies were missing from Fedora.
Even Rust newbies would not have trouble packaging this, and that
would be a way better entry point than packaging stuff like Bevy.

> And, I led with: I appreciate all the work you've all done to make this
> work. That's definitely true — I think it was super-valuable to pilot this
> approach. But I think that the Rust ecosystem would be a great place to
> pilot a different way. Something lightweight where we cache crates and use
> them _directly_ in the build process for _application_ RPMs.

We have talked about this multiple times, but it won't work.
I think this was tried with first-class maven artifact support in
koji, but we all know how the Java packaging fiasco ended.

Or even if making Rust crates first-class deliverables *did work*, it
wouldn't give us the benefits of the current approach:
- we ensure that all crates in Fedora *build* on all architectures
- we ensure that most crates in Fedora pass their test suites on all
architectures
- we check all crates for objectionable content, licensing problems, etc.
- we change build flags to default to dynamically linking to system
libraries instead of statically linking against vendored copies

It would also mean that we basically stop contributing to the
upstream Rust ecosystem. Right now:
- we diagnose / report / fix architecture support issues
- we port projects to new versions of dependencies
- etc.

I see this work in the upstream ecosystem as an important part of the
work we do in packaging Rust crates for Fedora,
and I would not want to endorse an approach that meant we no longer do
these things.

> Rust packages include a lot of machine-readable metadata. We should be able
> to watch for CVEs, RustSec, and other security notices even without encoding
> the metadata in RPMs. License review could also be automated — the field in
> Cargo.toml is supposed to be SPDX, so that's convenient. [3]

I already monitor RustSec advisories and check *all of them* against
Fedora packages. This takes up a minuscule amount of the time I spend
on Rust packaging (because there are so few Rust security advisories).
If I remember correctly, there were only 2-3 CVE issues in the Rust
stack that actually affected our packages, and dealing with those was
very simple:
1) Push the patched version of the library, 2) rebuild dependent
applications, 3) submit to bodhi.
There's some amount of automation that *could* be done (mostly in
figuring out which applications need to be rebuilt for a given library
change), but that's also pretty easily done with a "dnf repoquery" or
two.
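
The "which applications need a rebuild" part boils down to walking
reverse build dependencies. A minimal sketch in Python, with entirely
made-up package names and a hand-written table standing in for whatever
the repoquery calls would return:

```python
from collections import deque

# Hypothetical data: package -> packages it builds against.
BUILD_REQUIRES = {
    "rust-foo": {"rust-bar"},
    "rust-bar": {"rust-baz"},
    "app-one": {"rust-foo"},
    "app-two": {"rust-qux"},
}

# Hypothetical set of packages that ship (statically linked) binaries.
APPLICATIONS = {"app-one", "app-two"}


def needs_rebuild(patched: str, build_requires: dict, applications: set) -> set:
    """Return the applications that transitively build against `patched`."""
    # Invert the table: dependency -> packages that build against it.
    reverse = {}
    for package, deps in build_requires.items():
        for dep in deps:
            reverse.setdefault(dep, set()).add(package)

    # Breadth-first walk over everything downstream of the patched crate.
    seen = set()
    queue = deque([patched])
    while queue:
        current = queue.popleft()
        for dependent in reverse.get(current, ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen & applications


print(sorted(needs_rebuild("rust-baz", BUILD_REQUIRES, APPLICATIONS)))  # ['app-one']
```

Only the applications are returned because those are what gets rebuilt
in step 2 above; the intermediate library packages are just traversed.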

On the other hand, license review is still important, even if it's
already available in SPDX format in the upstream metadata.
Just because sometimes, that metadata is either wrong or incomplete.
And even more often, package review flags other problems (like missing
LICENSE files for licenses that *require* redistributed sources to
contain a copy of the license text). Relying on SPDX metadata alone is
*not* safe.

> We could also attach other metadata to the packages in the cache. Maybe some
> popularity, update frequency from crates.io, but also package review flags:
> checked license against source, and whatever other auditing we think should
> be done. This moves the focus from specfile-correctness to the package
> itself, and the effort from packaging to reviewing. (I'd suggest that for
> the experiment, we not make any deep auditing mandatory, but instead
> encouraged.) And these flags should be able to be added by anyone in the
> Rust SIG, not necessarily just at import.

This is already the case, though?
Writing a spec file for a new crate is already automated to the point
where "standard" crates can be 100% automatically generated and need
zero manual edits.
If manual changes *are* required, then these changes would also be
required in the "first-class crate artifact" scenario, so you don't
gain anything.
And if there are other problems that are caught during package review,
the distribution mechanism doesn't matter, either.

In my experience, changing the distribution mechanism or packaging
paradigm will often make things *worse* instead of better. For
example, the implosion of the NodeJS package ecosystem in Fedora was
caused not only by the horrid state of NPM, but also by the new
packaging guidelines, which prefer bundling and essentially made it
impossible for packagers to verify that no objectionable content is
present in vendored dependencies. For Java, Modularity was seen as a
"solution", but the result was that basically everybody - except for
the Red Hat maintainers who maintained the modules - just stopped
doing Java packaging because of the hostile environment.

> Maybe we could get involved in Cargo Vet [4] — we could be both a consumer
> _and_ a data source.
>
> Fedora _needs_ to adapt to stay relevant in the world where every language
> stack has developed a packaging ecosystem which effectively ignores us. Some
> of them are missing lessons they could have learned, ah well — but they also
> have a lot of nice new ideas we're missing. And, no matter what we think,
> we're clearly not going to stop them.
>
> Rust packaging seems like a great place to lead the way — and then we can
> maybe expand to Go, which has similar issues, and then Java (where, you
> know, things have already collapsed despite heroic effort.)

Oh, actually, I don't think Rust packaging is a good place to start
here at all. :)

The way cargo works already maps very neatly onto how RPM packages
work, which is definitely *not* the case for other language
ecosystems. I also think we could even massively improve handling of
"large" projects with many sub-components (like bevy, zola, wasmtime,
deno, etc.) - which are currently the only projects that are "painful"
to package - *without* completely changing the underlying packaging
paradigm or distribution mechanism. (I've been wanting to actually
write better tooling for this use case, but alas, my Bachelor thesis is
more important for now.)

Given that, I think we're actually in kind of a *good* situation with
Rust packaging, especially compared to other language ecosystems - not
only right now, but also looking at the future. And looking at the
alternatives, all attempts at trying different approaches (maven
artifacts in koji, vendoring NodeJS dependencies, Java Modules, etc.)
have *failed* and ultimately made things worse instead of improving
the situation - the only thing that has proven to be sustainable (for
now) is ... maybe surprisingly, plain RPM packages.

Fabio
_______________________________________________
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org
Do not reply to spam, report it: 
https://pagure.io/fedora-infrastructure/new_issue
