On 8 April 2017 at 19:29, Paul Moore <[email protected]> wrote:
> On 8 April 2017 at 03:17, Nick Coghlan <[email protected]> wrote:
>> The "at least one relevant tag is set" pre-requisite would be to avoid
>> emitting false positives for projects that don't provide any platform
>> compatibility guidance at all.
>
> I agree that there's little incentive at the moment to get classifiers
> right.
I'll also explicitly note that I think this idea counts as a "nice to
have" - in cases where there are real compatibility problems, those are
going to show up at runtime anyway, so what this idea really provides is
a debugging hint that says "Hey, you know that weird behaviour you're
seeing in <environment>? How sure are you that all of your dependencies
actually support that configuration?"

That said, if folks agree that this idea at least seems plausible, one
outcome is that I would abandon the draft "Supported Environments"
section for the "python.constraints" extension in PEP 459:
https://www.python.org/dev/peps/pep-0459/#supported-environments

While programmatic expressions like that are handy for publishers, they
don't convey the difference between "We expect future Python versions
to work" and "We have tested this particular Python version, and it
does appear to work", and they're also fairly hostile to automated data
analysis, since you need to evaluate expressions in a mini-language
rather than just filtering on an appropriately defined set of metadata
tags.

When it comes to the "Programming Language :: Python" classifiers
though, we already give folks quite a bit of flexibility there:

- no tag, or the generic unversioned tag, to say "No guidance provided"
- the "PL :: Python :: X" tags to say "definitely supports Python X"
  without saying which X.Y versions
- the "PL :: Python :: X.Y" tags to say "definitely supports Python X.Y"

And that flexibility provides an opportunity to let publishers make a
trade-off between the precision of the information provided (just the
major version, or both major and minor version) and the level of
maintenance effort (the more precise approach means always having to
make a new release to update the compatibility metadata for new Python
feature releases, even when the existing code works without any
changes, but it also means you get a way to affirmatively say "Yes, we
tested this with the new version, and it still works").

We also have the "PL :: Python :: X :: Only" tags, but I think that may
be a misguided approach, and we'd be better off with a general notion
of tag negation: "Not :: PL :: Python :: X" (so you'd add a
"Not :: Programming Language :: Python :: 2" tag instead of adding a
"Programming Language :: Python :: 3 :: Only" tag). There's a sketch of
what this might look like in practice below.

> So my concern with this proposal would be that it issues the
> warnings to end users, who don't have any direct means of resolving
> the issue (they can of course raise bugs on the projects they find
> with incorrect classifiers).

We need to be clear about the kinds of end users we're considering
here, though: folks using pip (or similar) tools to do their own
install-time software integration, *not* folks consuming pre-built and
pre-integrated components through conda/apt/dnf/msi/etc.

In the latter cases, the redistributor is taking on the task of making
sure their particular combinations work well together, but when we use
pip (et al) directly, that task falls directly on us as users, and it's
useful when debugging to know whether what we're doing is a combination
that upstream has already thought about (and is hopefully covering in
their CI setup if they have one), or whether we may be doing something
unusual that most other people haven't tried yet.

While this is also useful info for redistributors to know, I was
thinking in PyPI publisher & pip user terms when the idea occurred to
me.
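Coming back to the classifier flexibility point above, here's a minimal
sketch of a setup.py using all three levels of precision, plus the
hypothetical "Not ::" negation tag (which, to be clear, PyPI wouldn't
accept today - it's just the proposal from this mail):

    from setuptools import setup

    setup(
        name="example-project",
        version="1.0",
        classifiers=[
            # Broad assertion: "definitely supports Python 3",
            # without committing to particular X.Y versions
            "Programming Language :: Python :: 3",
            # Precise assertions: "we tested these feature releases",
            # at the cost of a new release per new X.Y version
            "Programming Language :: Python :: 3.5",
            "Programming Language :: Python :: 3.6",
            # Hypothetical negation tag, replacing ":: 3 :: Only"
            "Not :: Programming Language :: Python :: 2",
        ],
    )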
The concept is based at least in part on my experience as a World of
Warcraft player, where there are two main pieces to their compatibility
handling model for UI add-ons:

1. Add-on authors tag the add-on itself with the most recent version of
   the client API that they've tested it against
2. To avoid having your UI break completely every time the client API
   changes, the main game client has a simple "Load Out of Date Addons"
   check box to let you opt in to continuing to use add-ons that may
   not have been updated for the latest changes to the game's runtime
   API (while also clearly saying "Don't complain to Blizzard about any
   UI bugs you encounter in this unsupported configuration")

Assuming we do pursue this idea (which is still a big assumption at
this point, since the "potentially nice to have for debugging in some
situations" motivation is a fairly weak one for volunteer efforts), I
think a sensible way to go would be to have the classifier checking be
opt-in initially (e.g. through a "--check-classifiers" option), and
only consider making it the default behaviour if having it available as
a debugging option seems insufficient.

> Furthermore, there's a potential risk
> that projects might see classifiers as implying a level of support
> they are not happy with, and so are reluctant to add classifiers
> "just" to suppress the warning.

From a client UX perspective, something like the approach used for the
`--no-binary` option would seem reasonable:
https://pip.pypa.io/en/stable/reference/pip_install/#cmdoption-no-binary

That is:

* `--check-classifiers :none:` to disable checks entirely
* `--check-classifiers :all:` to check everything
* `--check-classifiers a,b,c,d` to check key packages you really care
  about, but ignore others

> But without data, the above is just FUD, so I'd suggest we do some
> analysis. I did some spot checks, and it seems that projects might
> typically not set the OS classifier, which alleviates my biggest
> concern (projects stating "POSIX" because that's what they develop on,
> when they actually work fine on Windows) - but proper data would be
> better. Two things I'd like to see:
>
> 1. A breakdown of how many projects actually use the various OS and
> Language classifiers.
> 2. Where projects ship wheels, do the wheels they ship match the
> classifiers they declare?
>
> That should give a good idea of the immediate impact of this proposal.

I think the other thing that research would provide is guidance on
whether it makes more sense to create *new* tags specifically for
compatibility testing reports, rather than attempting to define new
semantics for existing tags. The inference from existing tags would
then solely be a migration step where clients and services could
synthesise the new tags based on old metadata (including things like
`Requires-Python:`).

If we went down that path, it might look like this:

1. Two new classifier namespaces specifically for compatibility
   assertions: "Compatible" and "Incompatible"
2. Within each, start by defining two subnamespaces based on existing
   classifiers:

   Compatible :: Python :: [as for `Programming Language :: Python ::`]
   Compatible :: OS :: [as for `Operating System :: `]
   Incompatible :: Python :: [as for `Programming Language :: Python ::`]
   Incompatible :: OS :: [as for `Operating System :: `]

Within the "Compatible" namespace, the ` :: Only` suffix would be a
modifier to strengthen the "Compatible with this" assertion into an
"almost certainly not compatible with any of the other options in this
category" assertion.
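As a rough illustration of the migration step mentioned above,
synthesising the proposed tags from existing classifiers could be quite
mechanical. The function name and the "Compatible ::" namespace here
are both hypothetical - this is just a sketch of the idea:

    def synthesise_compat_tags(classifiers):
        # Derive the proposed "Compatible :: ..." tags from a project's
        # existing informational classifiers (migration step only - the
        # "Compatible"/"Incompatible" namespaces don't exist on PyPI)
        prefixes = {
            "Programming Language :: Python :: ": "Compatible :: Python :: ",
            "Operating System :: ": "Compatible :: OS :: ",
        }
        new_tags = []
        for tag in classifiers:
            for old, new in prefixes.items():
                if tag.startswith(old):
                    new_tags.append(new + tag[len(old):])
        return new_tags

    print(synthesise_compat_tags([
        "Programming Language :: Python :: 3.6",
        "Operating System :: POSIX",
    ]))
    # -> ['Compatible :: Python :: 3.6', 'Compatible :: OS :: POSIX']

Note that the generic unversioned "Programming Language :: Python" tag
deliberately wouldn't match here, which lines up with treating it as
"No guidance provided".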
One nice aspect of that model is that it would be readily extensible to
other dimensions of compatibility, like "Implementation" (so projects
that know they're tightly coupled to the C API, for example, can add
"Compatible :: Implementation :: CPython"). The downside is that it
would leave the older "for information only" classifiers semantically
ambiguous, and we'd be stuck permanently with two very similar sets of
classifiers.

> (There's not much we can say about source-only distributions, but
> that's OK). The data needed to answer those questions should be
> available - the only way I have of getting it is via the JSON
> interface to PyPI, so I can write a script to collect the information,
> but it might be some time before I can collate it. Is this something
> the BigQuery data we have (which I haven't even looked at myself)
> could answer?

Back when Donald and I were working on PEP 440 and ensuring the
normalization scheme covered the vast majority of existing projects, we
had to retrieve all the version info over XML-RPC:
https://github.com/pypa/packaging/blob/master/tasks/check.py

I'm not aware of any subsequent changes on that front, so I don't
believe we currently push the PKG-INFO registration metadata into
BigQuery. However, I do believe we *could* (if Google are amenable),
and if we did, it would make these kinds of research questions much
easier to answer.

Donald, any feedback on how hard it would be to get the current PyPI
project metadata into a queryable format in BQ?

Cheers,
Nick.

--
Nick Coghlan | [email protected] | Brisbane, Australia
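P.S. For anyone who wants to poke at Paul's questions before any
BigQuery option materialises, here's a rough sketch of the kind of
collection script he describes, using PyPI's JSON interface. The
endpoint and field names below are as I understand the current JSON
API, and "requests" is just an arbitrary example project:

    import json
    from urllib.request import urlopen

    def classifiers_and_wheels(name):
        # Pull a project's declared classifiers and the filenames of
        # any wheels in its latest release, so the OS/language
        # classifiers can be compared against the wheel platform tags
        url = "https://pypi.org/pypi/%s/json" % name
        with urlopen(url) as resp:
            data = json.loads(resp.read().decode("utf-8"))
        classifiers = data["info"]["classifiers"]
        wheels = [f["filename"] for f in data["urls"]
                  if f["packagetype"] == "bdist_wheel"]
        return classifiers, wheels

    classifiers, wheels = classifiers_and_wheels("requests")
    print(classifiers)
    print(wheels)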
