I'd say we'll deal with any CI issues discretely and individually once we have the list of driver maintainers and CI job maintainers. There may be a sense of urgency, but I don't think we need a specific time limit.

Marek

On Wed, May 20, 2026, 01:36 Iago Toral <[email protected]> wrote:

> On Thu, 14-05-2026 at 19:13 -0400, Marek Olšák wrote:
> > Here's a more detailed description of the problem and a possible
> > solution.
> >
> > First, the worst case scenario: A small one-line commit that’s
> > correct and trivial causes a test failure in the CI. The maintainer
> > of the affected driver is asked for help, who concludes that it’s
> > likely a HW bug and is forwarded to the HW team of the corresponding
> > GPU company. Now the management of the GPU company has to allocate
> > staff to investigate the failure. 3 months later, we may have a
> > workaround. Or not.
> >
> > Second, the scale: The CI has lots of undocumented devices with
> > undocumented errata and drivers with hacks and incomplete
> > implementations. (that’s normal for any project) Any of those
> > devices can fail at any time for reasons that might not make sense,
> > and any of the drivers can fail for random reasons too. It’s not
> > fair to ask the contributor to keep everything conformant at every
> > MR. Even if the devices were documented with open source
> > implementations (e.g. uarch specs, HDL, RTL) and well documented
> > drivers, it’s not reasonable to ask the contributor to study them
> > all.
> >
> > Thus, we can’t expect the contributor to be solely responsible for
> > conformance of all devices at every MR in main.
> >
> > It’s useful to keep drivers that have regular contributors
> > conformant at most commits in main, but why do we need to keep
> > drivers without contributors conformant? If somebody cares about
> > those drivers but not enough to contribute in main, they can
> > contribute fixes during the RC window or on their own schedule.
> >
> > We need a two-tier system:
> >
> > Tier 1:
> > - Devices are tested by the CI pre-merge.
> > - A contact person is required for CI failure assessment and closure
> >   within a reasonable time. (if the person is on leave, a backup
> >   person must be available, or else the device is moved to Tier 2)
>
> I think this makes sense, but we need to agree on what "reasonable
> time" means to make sure everyone is on the same page.
>
> Iago
>
> > - Highly recommended: A fully functional drm-shim for each CI job
> >   with a user guide, how to print compiled shaders, etc.
> > - Links to HW documentation if available.
> > - If maintainers end up xfailing a significant number of failures
> >   regularly, the device is moved to Tier 2. (due to not using the CI
> >   to maintain conformance)
> >
> > Tier 2:
> > - Pre-merge CI can’t run on the target devices / implementations.
> >   main doesn’t have to work. The quality of release branches is up
> >   to maintainers. The RC window can be extended.
> > - Only unit tests can run pre-merge, as well as any deviceless
> >   driver tests, like the following.
> > - Optionally develop deviceless driver validation tests that verify
> >   driver output (shader instructions, command buffers). LLVM LIT
> >   tests are the perfect example - they validate all LLVM backends
> >   and prevent regressions without any physical devices.
> >
> >
> > Marek
> >
> > On Fri, May 1, 2026 at 5:21 AM Daniel Stone <[email protected]>
> > wrote:
> > >
> > > Hi,
> > >
> > > On Thu, 30 Apr 2026 at 23:34, Timur Kristóf
> > > <[email protected]> wrote:
> > > > On Thursday, 30 April 2026, 23:07:12 Central European Summer
> > > > Time, Marek Olšák wrote:
> > > > > First of all, no contributor to shared code is required to fix
> > > > > issues in all drivers that their commit breaks. The goal is to
> > > > > stop using the pre-merge CI as a justification to force
> > > > > unrelated contributors to work on all drivers just because
> > > > > they are contributors. It would be a bit exploitative to
> > > > > assume that every contributor must debug all drivers that turn
> > > > > red due to a change. I think I understand that well because I
> > > > > have debugged 5+ drivers by myself in the past that are not my
> > > > > responsibility to maintain, and it does feel exploitative.
> > >
> > > There's a bit more nuance in this though. If one set of people is
> > > breaking 17 drivers every day because they can't be bothered to do
> > > the basics to keep things working and just want to yolo whatever
> > > they just thought of into the tree, it's 'unethical' and unfair on
> > > the rest of the people who then spend their entire time bisecting
> > > and fixing up what the others broke. (Those people then probably
> > > get accused of being freeloaders and exploiting the labour of the
> > > people breaking everything, because they don't get to spend any
> > > time on fun new stuff, given all their time is spent fixing what
> > > the others broke.)
> > >
> > > I think we've all taken it as axiomatic that there's a balance to
> > > be struck there: don't make others miserable because you can't be
> > > bothered spending five minutes thinking about why your new code
> > > breaks existing users, but on the other hand you absolutely should
> > > expect support from the relevant people to help work it out and
> > > resolve it.
> > >
> > > I'm pretty sure no-one is suggesting ripping up that social
> > > contract, but we should be clear about what we mean.
> > >
> > > > > Therefore, we could establish that each driver/HW combo in
> > > > > pre-merge CI has the following options:
> > > > > 1) a contact person for prompt CI issue resolution
> > > > > 2) unconditional xfail by the author (or removal from
> > > > >    pre-merge CI if logs lack the information necessary to add
> > > > >    xfail)
> > > >
> > > > I think we should establish both of those, in that order.
> > > > That is, if the contact person does not reply promptly, let's
> > > > just add the expected failure.
> > >
> > > Yeah, that's a pretty obvious baseline. So far it seems to have
> > > worked out in the usual way (people know who works on what so it's
> > > easy to ping them however), but if that's not working out, maybe
> > > someone could suggest a more formal document along the lines of
> > > MAINTAINERS or CODEOWNERS or whatever?
> > >
> > > Cheers,
> > > Daniel
> >
>
