On Thu, May 14, 2026 at 4:24 PM Marek Olšák <[email protected]> wrote:
>
> Here's a more detailed description of the problem and a possible solution.
>
> First, the worst case scenario: A small one-line commit that’s correct
> and trivial causes a test failure in the CI. The maintainer of the
> affected driver is asked for help and concludes that it’s likely a HW
> bug, so the issue is forwarded to the HW team of the corresponding GPU
> company. Now the management of the GPU company has to allocate staff
> to investigate the failure. 3 months later, we may have a workaround.
> Or not.
>
> Second, the scale: The CI has lots of undocumented devices with
> undocumented errata, and drivers with hacks and incomplete
> implementations (that’s normal for any project). Any of those devices
> can fail at any time for reasons that might not make sense, and any of
> the drivers can fail for random reasons too. It’s not fair to ask the
> contributor to keep everything conformant at every MR. Even if the
> devices were documented with open-source implementations (e.g. uarch
> specs, HDL, RTL) and had well-documented drivers, it’s not reasonable
> to ask the contributor to study them all.
>
> Thus, we can’t expect the contributor to be solely responsible for
> conformance of all devices at every MR in main.
>
> It’s useful to keep drivers that have regular contributors conformant
> at most commits in main, but why do we need to keep drivers without
> contributors conformant? If somebody cares about those drivers but not
> enough to contribute in main, they can contribute fixes during the RC
> window or on their own schedule.
>
> We need a two-tier system:
>
> Tier 1:
> - Devices are tested by the CI pre-merge.
> - A contact person is required for CI failure assessment and closure
> within a reasonable time.
> (if the person is on leave, a backup person
> must be available, or else the device is moved to Tier 2)
> - Highly recommended: A fully functional drm-shim for each CI job, with
> a user guide covering how to print compiled shaders, etc.
> - Links to HW documentation, if available.
> - If maintainers end up xfailing a significant number of failures
> regularly, the device is moved to Tier 2 (because the CI is not being
> used to maintain conformance).
I generally agree, but will add that even if, for whatever reason, I
don't have time to immediately and completely debug some random failure
that shows up in CI on someone else's MR, it is still useful to have the
xfail added in the same MR. This way I don't have to bisect later to
rediscover where the failure started happening and spend time
remembering the context. In this case, I'd a-b adding the xfail and
follow up when I can. But I'd be ok with a general rule that if there
is no $driver_maintainer feedback in N days, just add the xfail and move
on (for some value of N... 3 to 5?).

BR,
-R

>
> Tier 2:
> - Pre-merge CI can’t run on the target devices / implementations. main
> doesn’t have to work. The quality of release branches is up to
> maintainers. The RC window can be extended.
> - Only unit tests can run pre-merge, as well as any deviceless driver
> tests like the ones described next.
> - Optionally, develop deviceless driver validation tests that verify
> driver output (shader instructions, command buffers). LLVM LIT tests
> are the perfect example: they validate all LLVM backends and prevent
> regressions without any physical devices.
>
>
> Marek
>
> On Fri, May 1, 2026 at 5:21 AM Daniel Stone <[email protected]> wrote:
> >
> > Hi,
> >
> > On Thu, 30 Apr 2026 at 23:34, Timur Kristóf <[email protected]> wrote:
> > > On Thursday, April 30, 2026 23:07:12 Central European Summer Time, Marek Olšák wrote:
> > > > First of all, no contributor to shared code is required to fix issues
> > > > in all drivers that their commit breaks. The goal is to stop using the
> > > > pre-merge CI as a justification to force unrelated contributors to
> > > > work on all drivers just because they are contributors. It would be a
> > > > bit exploitative to assume that every contributor must debug all
> > > > drivers that turn red due to a change.
> > > > I think I understand that well
> > > > because I have debugged 5+ drivers by myself in the past that are not
> > > > my responsibility to maintain, and it does feel exploitative.
> >
> > There's a bit more nuance in this though. If one set of people is
> > breaking 17 drivers every day because they can't be bothered to do the
> > basics to keep things working and just want to yolo whatever they just
> > thought of into the tree, it's 'unethical' and unfair on the rest of
> > the people who then spend their entire time bisecting and fixing up
> > what the others broke. (Those people then probably get accused of
> > being freeloaders and exploiting the labour of the people breaking
> > everything, because they don't get to spend any time on fun new stuff,
> > given all their time is spent fixing what the others broke.)
> >
> > I think we've all taken it as axiomatic that there's a balance to be
> > struck there: don't make others miserable because you can't be
> > bothered spending five minutes thinking about why your new code breaks
> > existing users, but on the other hand you absolutely should expect
> > support from the relevant people to help work it out and resolve it.
> >
> > I'm pretty sure no-one is suggesting ripping up that social contract,
> > but we should be clear about what we mean.
> >
> > > > Therefore, we could establish that each driver/HW combo in pre-merge
> > > > CI has the following options:
> > > > 1) a contact person for prompt CI issue resolution
> > > > 2) unconditional xfail by the author (or removal from pre-merge CI if
> > > > logs lack the information necessary to add xfail)
> > >
> > > I think we should establish both of those, in that order.
> > > That is, if the contact person does not reply promptly, let's just add
> > > the expected failure.
> >
> > Yeah, that's a pretty obvious baseline.
> > So far it seems to have worked
> > out in the usual way (people know who works on what so it's easy to
> > ping them however), but if that's not working out, maybe someone could
> > suggest a more formal document along the lines of MAINTAINERS or
> > CODEOWNERS or whatever?
> >
> > Cheers,
> > Daniel
