Bastian Blank <wa...@debian.org> writes:
> On Mon, Oct 21, 2019 at 09:29:05PM -0700, Russ Allbery wrote:
>> If we're going to go to the trouble of defining a new source format,
>> I'd prefer we embrace a VCS-based one rather than once again rolling
>> our own idiosyncratic representation of a tree of files

> I'm not completely sure what you mean with "VCS-based". You want to add
> a complete repository (dump) to the source? Do we need to define a
> subformat for each VCS then? CVS, SVN, GIT, just to name some used
> ones. In any case we would be defining our own representation anyway,
> because each VCS behaves different.

I think it's safe at this point to just use Git. It's the dominant VCS
by far and seems likely to remain so for the foreseeable future. We'll
need to have some mechanism to generate a simple Git tree from source
packages in some other format, but that's not really a problem. Right
now, we convert 100% of non-native packages to a different format than
their original. If we used Git as a VCS format, we would be closer to
most of our upstreams and the difference for non-Git upstreams doesn't
seem too significant.

If at some point something else takes over from Git, we could always
switch to a 5.0 format.

> Also this would negate all the things we've accomplished on
> reproducibility of source packages.

That seems excessively pessimistic. What about Git makes you think it's
impossible to create a reproducible source package?

> We never shipped history as part of our source. Was this asked for?

We've always shipped one version of history as part of our source.
That's a large part of the point of separate upstream and Debian
tarballs. With the addition of quilt, we ship even more history in the
form of the patch sequence.

And yes, this has been repeatedly requested and wanted by the project
going all the way back to Joey Hess's original proposal for the 3.0
(git) source package format. I think that was at least ten years ago?
Those of us who wanted it then haven't stopped wanting it.

> dpkg currently supports "3.0 (git)" as format, however it was never
> accepted by the archive.

To be clear, many of us would be happily using it right now. It wasn't
accepted by the archive because the archive team vetoed it.

> There is a reason for that, as this would force license reviews not only
> on the current state, but on the history as well. We would also just
> distribute arbitrary information we don't actually need to ship to
> hundred of unrelated mirror people and would bring them into jeopardy.
> If something really problematic slips in, we also would be forced to
> remove all intermediate versions, because they ship the history.

I understand the concerns with shipping *all* of the history, and I
think we'll need to get somewhat creative about what history to include
and what history to elide if we still have concerns about non-free
elements sneaking into the archive via history (which I'm dubious about,
but see below). But Git has mechanisms to handle this (shallow clones,
for instance) that still preserve some of the utility of having a native
Git package.

> I think we are talking about different things. I'm talking about the
> source we _must_ provide to fulfil several licenses and our own policy.
> If we save them in the form of snapshots.d.o for example, we have a
> complete history of the releases.
>
> You are talking about the detailed history, a history that might not
> even be accurate, as it can be changed retrospectively.

To be clear, I think including the history is just one of many
advantages to basing the source format on Git. The overall advantage is
that for many packages the Debian source package becomes a familiar
construct, rather than some idiosyncratic invention of Debian, that can
be manipulated with standard tools and that is far closer to something
that one can immediately start hacking on. This has huge benefits even
if we ship only a shallow clone with only one revision of history.
It has more benefits if we can include history, of course. It also more
clearly unblocks releasing via pushing signed tags, which is the way
that many, if not most by total number (if not by significance), free
software packages do releases these days, thus lowering the barrier to
entry for people packaging for Debian and again standardizing on common
tools.

> Just think about what would happen if a contributor adds code he must
> not distribute for whatever reason. Another contributor finds this and
> removes it before any release happens. This shows up some time later
> and we get angry mails or letters stating we ship stuff we must not. So
> we now need to purge this information everywhere, even if it was never
> inside a release.

How do you plan to deal with this problem with Salsa right now? Can't
the archive use the same mechanism that Salsa would?

There are also plenty of packages where the risk of this happening seems
low and where the Debian package maintainer might want to accept the
risk of possibly having to throw out history in the future (most native
packages, for instance, or packages where the Debian package maintainer
is also upstream).

> I think I understand what you mean and we have different goals.
>
> I want to modify how we ship the source we _must_ ship, where we don't
> have the option not to. Just make the handling of it less painful,
> without sacrifice too many things we currently have.
>
> You want to ship more info in immutable form. Info that have the
> ability to bite us, the whole project, and many other people, just by
> distributing it.
>
> I hope this makes the reasons clear, why I proposed what I did and not
> further.

I'm disappointed that the archive team seems to be refusing to engage
with goals that many of us have been asking for over the past ten years.
Revising the source format without supporting tag2upload and native VCS
representations of packages strikes me as a waste of everyone's time and
will mean that we'll need a 5.0 format in the near future that does
support those goals.

-- 
Russ Allbery (r...@debian.org)              <https://www.eyrie.org/~eagle/>