On Fri, Jun 9, 2023 at 3:07 PM Elliott Sprehn <[email protected]> wrote:
> Woah 2015 super necro thread! > > There were a couple meta angles here: > > - Explicitly testing non-web exposed behavior is probably better done with > a SimTest or other unit test. In the past folks kept adding more methods to > Internals, or special behavior to the compositor, to make it easier to > write web_tests. So not having the threaded compositor discourages some > classes of that. I see folks are still adding new internals methods > unfortunately, but I still think SimTests (or something else) is a > better option. > Agreed: SimTests are *always* a better option, if you can get the right coverage. The problem we've had is that SimTests test blink right up to the blink/compositor boundary; and cc::LayerTreeTest test right up to the same boundary, from the other side. But we don't have a unit test framework for the renderer process that tests across that boundary. It's been proposed in the past, and it's still probably a good idea, to connect these two test frameworks so we can get better unit test coverage across the boundary. But that only helps for new tests; we still miss out on the coverage provided by the huge corpus of existing web_tests. > - The more "complete" the browser run over web_tests, the more bugs are > caught, but (at least at the time) browser_tests were very flaky and there > was a strong desire to avoid that fate within blink. For example you could > run the full chrome binary for web_tests and catch more bugs too. Why stop > at just the threaded compositor? The thinking was making web_tests very > stable was better than catching a small number of real bugs at the cost of > a flake tax across the hundreds of contributors. > Yep, very true: destabilizing the whole system is not worth it. Our first experiments with enabling threaded compositing by default were pretty broken, but we discovered a few fixes that greatly reduced the number of new flakes and failures, and we discovered that a relatively small number of tests were responsible for just about all of the new flakiness. So we placed those tests in quarantine: they will continue to run with the single-threaded compositor, while the large majority of tests (>95%), and all new tests, will use the threaded compositor. The best time to fix a flaky test is right when the test is added, and that holds true even if threaded compositing is the root cause of the flakiness. > The failure mode I've seen of efforts like this is the SingleThreadedTests > file grows each time someone finds some flake, so we never really find the > bugs like 1447868 because folks just keep hiding them each time something > unusual happens. > > All that said, if you can crank up the "realism" of the test runner and > not introduce flake and convince folks not to hide all the bugs with > exceptions, that sounds worth it. I hope you all succeed! > > - E > > On Fri, Jun 9, 2023 at 3:18 PM Stefan Zager <[email protected]> wrote: > >> On Fri, Jun 9, 2023 at 11:38 AM Ken Rockot <[email protected]> wrote: >> >>> >>> On Fri, Jun 9, 2023 at 11:25 AM Stefan Zager <[email protected]> >>> wrote: >>> >>>> Resurrecting this thread, because yotha@ are launching the first step >>>> in a project that completely >>>> ignores the previous consensus (i.e., that we should *not* enable >>>> threaded compositing for all >>>> web_tests). >>>> >>> >>> This is exciting to hear! Is there a good place / bug to follow progress? >>> >> >> Main tracking bug <http://crbug.com/770028>. >> >> >>> >>> >>>> >>>> I will paraphrase the high-level arguments made previously by enne@ >>>> and ojan@, neither of whom >>>> are still active on chromium: >>>> >>>> - Enabling threaded compositing will increase test flakiness >>>> - Previous efforts to fix the flakiness have been really difficult and >>>> largely unsuccessful >>>> - Buggy behavior in the blink/compositor thread boundary is best >>>> covered by focused unit tests >>>> - There are vanishingly few bugs in that area; maybe none? >>>> >>>> My reponse... >>>> >>>> web_tests, by their very nature, are a massive set of integration >>>> tests. That is not the most efficient >>>> organization of test coverage. But the beauty of web_tests is that they >>>> catch a *ton* of bugs and >>>> prevent many bad patches from landing. The large majority of the bugs >>>> are in core blink areas, >>>> but not all of them. web_tests provide effective, if inefficient, >>>> coverage of almost the entire chromium >>>> code base. The lack of coverage for threaded compositing is an anomaly >>>> and a blink spot. "It's too >>>> hard" is not, in my opinion, a convincing justification. >>>> >>>> Yes, turning on threaded compositing *does* make web_tests flakier. And >>>> that is evidence of real bugs! >>>> It's oxymoronic to acknowledge the flakiness but claim that there are >>>> few or no bugs in the >>>> blink/compositor thread boundary. There are definitely bugs in there! >>>> They can be excruciatingly hard >>>> to root-cause, but they definitely exist. A couple of examples... >>>> >>>> https://chromium-review.googlesource.com/c/chromium/src/+/3958364 >>>> https://bugs.chromium.org/p/chromium/issues/detail?id=1447868 >>>> >>>> Once these bugs are root-caused, I certainly agree that we should >>>> prefer unit tests for regression >>>> testing. But they key point is that we only discovered these bugs >>>> because they caused failures in >>>> web_tests. >>>> >>>> Again: these are real bugs! >>>> >>> >>> +1000 >>> >>> >>>> >>>> Because bugs in the thread boundary are usually race conditions they >>>> can be extremely difficult to >>>> reproduce, which means that they're very likely under-reported. But >>>> they still degrade user experience. >>>> Beyond crbug.com and flaky web_tests, there are also crash reports >>>> that very likely indicate bugs in this >>>> area. >>>> >>>> >>>> On Wednesday, November 4, 2015 at 4:58:29 PM UTC-8 Ojan Vafai wrote: >>>> >>>>> +1 on all points. I hope this new (currently half-baked) solution will >>>>> help improve all three of these to varying degrees. It's not panacea, but >>>>> it should be a lot better. >>>>> >>>>> >>>>> On Wed, Nov 4, 2015 at 11:29 AM Adrienne Walker <[email protected]> >>>>> wrote: >>>>> >>>>>> Re: interoperability tests. Looking at the guidelines for the web >>>>>> platform tests, I think if all of our layout tests had been written with >>>>>> the guidelines of short and minimal in mind, we would be in a much better >>>>>> situation than we are today. >>>>>> >>>>>> My biggest concerns about layout tests are: (1) they get used to test >>>>>> very internal parts of Blink, which then get exposed via awful plumbing >>>>>> via >>>>>> internals or dumping the state of the world into a text file. (2) they >>>>>> "over test", especially in terms of pixel and text output, which adds a >>>>>> rebaselining burden as these things churn over time and over different >>>>>> platforms, even though that doesn't affect what the test was trying to >>>>>> verify. (3) they rely on so much of the stack that they end up being >>>>>> flaky. >>>>>> >>>>>> I think there's plenty of room for html / javascript tests that are >>>>>> cross-platform and test web platform features that don't fall into these >>>>>> traps. >>>>>> >>>>>> 2015-11-04 9:32 GMT-08:00 Rick Byers <[email protected]>: >>>>>> >>>>>>> I agree with this big picture direction - i.e. unit tests are >>>>>>> generally better than integration tests. >>>>>>> >>>>>>> However I HAVE seen examples in input where we've missed issues >>>>>>> because we had only unit tests (content, cc and blink layout tests), not >>>>>>> good integration test that included the effect of threaded input >>>>>>> handling. >>>>>>> One case in particular was pretty embarrassing - IIRC we broke scrolling >>>>>>> completely in most real-world cases yet not a single test failed :-). >>>>>>> Still, I think this is, at best, an argument for having a couple sanity >>>>>>> integration tests (eg. as content browser tests). >>>>>>> >>>>>>> My bigger concern here is around interoperability testing. IMHO the >>>>>>> only way we're going to substantially reduce developer pain around >>>>>>> interoperability is by sharing more tests with the other rendering >>>>>>> engines. Mozilla is much better at this >>>>>>> <https://wiki.mozilla.org/Auto-tools/Projects/web-platform-tests> >>>>>>> than us. This, by its nature, is impossible to do with unit tests. So >>>>>>> when new simple web platform features are being added (especially highly >>>>>>> deterministic JS APIs), I've recently started encouraging my team to try >>>>>>> focusing on web-platform-tests <http://testthewebforward.org>. >>>>>>> Obviously more complex / non-deterministic features should still prefer >>>>>>> to >>>>>>> focus on unit tests. But the exact balance here seems very non-obvious >>>>>>> to >>>>>>> me. But like with everything, we should probably just try things, >>>>>>> evaluate >>>>>>> how we're doing, and iterate. Any concerns? >>>>>>> >>>>>>> Rick >>>>>>> >>>>>>> >>>>>>> On Wed, Nov 4, 2015 at 11:39 AM, Ojan Vafai <[email protected]> >>>>>>> wrote: >>>>>>> >>>>>>>> Thread hijacking this to blink-dev@chromium. >>>>>>>> >>>>>>>> At a high level, I want to +1 that we should be moving our testing >>>>>>>> away from high level regression tests into more deterministic tests. >>>>>>>> There's a good (not quite finished) alternative to layout tests that I >>>>>>>> think we should pursue (an evolution of Elliott's C++-based Sim >>>>>>>> framework). >>>>>>>> >>>>>>>> I'll be adding this to the Blink infra team's long-term roadmap >>>>>>>> shortly. Coincidentally, Elliott and I hashed out a basic idea >>>>>>>> yesterday. >>>>>>>> At some point we'll turn that into a doc and a proposal, but mostly it >>>>>>>> needs people to do the engineering work to make it possible. So, if >>>>>>>> you're >>>>>>>> possibly interested in 5%-80%ing on this, ping me. >>>>>>>> >>>>>>>> On Tue, Nov 3, 2015 at 4:04 PM 'Adrienne Walker' via paint-dev < >>>>>>>> [email protected]> wrote: >>>>>>>> >>>>>>>>> (I sent this previously as an email reply to a few people about a >>>>>>>>> particular patch, but as this continues to come up on a monthly >>>>>>>>> basis, here >>>>>>>>> is a slightly edited copy of this rant as a reference of my biased >>>>>>>>> position >>>>>>>>> on this topic.) >>>>>>>>> >>>>>>>>> We should *not* make layout tests use threaded compositing. >>>>>>>>> >>>>>>>>> >>>>>>>>> We shouldn't do it because it's a rat hole. jamesr and I both >>>>>>>>> tried to make this happen on separate occasions. We spent a lot of >>>>>>>>> time >>>>>>>>> and fought with even more flakiness. I'm sure somebody could >>>>>>>>> heroically >>>>>>>>> burn a lot of time making this happen and succeed where we couldn't, >>>>>>>>> but in >>>>>>>>> my opinion, it's a waste of time. Threaded compositing increases >>>>>>>>> flakiness and loses determinism in terms of when events happen, how >>>>>>>>> repaints line up with frames, and how animation happens, especially. >>>>>>>>> We >>>>>>>>> already have a ton of problems with flakiness in layout tests, so why >>>>>>>>> make >>>>>>>>> this worse? >>>>>>>>> >>>>>>>>> If there are features that are only enabled with threaded >>>>>>>>> compositing (compositor animations?) that somehow still need >>>>>>>>> Blink layout tests, we should enable those features for non- >>>>>>>>> threaded compositing. I would support removing the virtual/ >>>>>>>>> threaded directory if somebody made it possible to test those >>>>>>>>> features in other ways. If there are tests that have different >>>>>>>>> behavior in >>>>>>>>> threaded mode vs not threaded mode, then they need to be deflaked or >>>>>>>>> tested >>>>>>>>> more narrowly via unit tests or esprehn's simulation testing >>>>>>>>> framework. >>>>>>>>> >>>>>>>>> Single vs threaded compositing is generally well-tested within cc >>>>>>>>> unit tests. There should be very little difference at the Blink >>>>>>>>> level, and >>>>>>>>> I would question the need to test it there. We have ironed out a >>>>>>>>> small >>>>>>>>> number of message ordering differences over time in cc. We should >>>>>>>>> consider >>>>>>>>> these differences bugs and add cc unit tests for them to guarantee >>>>>>>>> the same >>>>>>>>> ordering of messages. In the long term, I think the Blimp/Mandoline >>>>>>>>> work >>>>>>>>> has the potential to merge single/thread proxy with an abstracted >>>>>>>>> glue layer between them. >>>>>>>>> >>>>>>>>> So, sure, we should test what we ship, but on the other hand every >>>>>>>>> test should not be an integration test. I have have seen exactly >>>>>>>>> zero bugs >>>>>>>>> ever that would have been caught if a layout test had been using >>>>>>>>> threaded >>>>>>>>> compositing instead of single threaded compositing. Really, zero. >>>>>>>>> >>>>>>>>> Personally, I think our problems as a team lie more on the "tests >>>>>>>>> are flaky and hard to reason about because they're integration tests" >>>>>>>>> end >>>>>>>>> of the spectrum and far less on the "we are shipping huge hidden >>>>>>>>> regressions because of holes in our testing". >>>>>>>>> >>>>>>>> >>>>>>>> On Tue, Nov 3, 2015 at 5:17 PM Adrienne Walker <[email protected]> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> 2015-11-03 16:30 GMT-08:00 Alexey <[email protected]>: >>>>>>>>> >>>>>>>>>> Can we run all of the virtual/threaded tests in SingleThreadProxy >>>>>>>>>> <https://code.google.com/p/chromium/codesearch#chromium/src/cc/trees/single_thread_proxy.cc&l=37&gs=cpp:cc::class-SingleThreadProxy::SingleThreadProxy(cc::LayerTreeHost%2520*,%2520cc::LayerTreeHostSingleThreadClient%2520*,%2520scoped_refptr%253Cbase::SingleThreadTaskRunner%253E,%2520scoped_ptr%253Ccc::BeginFrameSource,%2520base::DefaultDeleter%253Ccc::BeginFrameSource%253E%2520%253E)@chromium/../../cc/trees/single_thread_proxy.cc%257Cdef&gsn=SingleThreadProxy&ct=xref_usages> >>>>>>>>>> mode? >>>>>>>>>> >>>>>>>>> >>>>>>>>> Currently, no! >>>>>>>>> >>>>>>>>> There are some things (e.g. composited animations) that have >>>>>>>>> behavior differences in code depending on threaded vs single thread. >>>>>>>>> It >>>>>>>>> seems to me that these are wrinkles that could be ironed out. It >>>>>>>>> seems >>>>>>>>> like there's no reason we couldn't create and run composited >>>>>>>>> animations in >>>>>>>>> single-threaded mode. >>>>>>>>> >>>>>>>>> It looks like other things have been added there over time (e.g. >>>>>>>>> idle callbacks), that I don't understand as well. Naively, it seems >>>>>>>>> like >>>>>>>>> fundamentally nothing should depend on threaded compositing. >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> -- >>>> You received this message because you are subscribed to the Google >>>> Groups "blink-dev" group. >>>> To unsubscribe from this group and stop receiving emails from it, send >>>> an email to [email protected]. >>>> To view this discussion on the web visit >>>> https://groups.google.com/a/chromium.org/d/msgid/blink-dev/4d8dd468-b3c4-4f65-8ac7-1b4ca194e08an%40chromium.org >>>> <https://groups.google.com/a/chromium.org/d/msgid/blink-dev/4d8dd468-b3c4-4f65-8ac7-1b4ca194e08an%40chromium.org?utm_medium=email&utm_source=footer> >>>> . >>>> >>> -- You received this message because you are subscribed to the Google Groups "blink-dev" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/blink-dev/CAHOQ7J-0kFrERxC_PZdyfA6yoh_53%2Bu50gm_vuS-hzd1pAXUTg%40mail.gmail.com.
