Re: [blink-dev] Re: Layout tests should not use threaded compositing, part XIII

Steve Kobes Fri, 09 Jun 2023 17:43:14 -0700

I think the strongest argument for threaded web tests is that web tests are
the foundation for WPT.  We now prefer
<https://chromium.googlesource.com/chromium/src/+/main/docs/testing/web_tests.md#:~:text=the%20preferred%20form%20of%20web%20tests>
web
tests to be WPT when possible, and wpt.fyi runs against the full browser.
So WPT tests need to work with threaded compositing, and it's harder to
ensure this if Chrome infra runs them differently.  We do have a large
legacy of non-WPT web tests, but we shouldn't let that hold us back from
moving toward threaded-by-default.


SimTests are better for poking at Chrome-specific internal state.  Now that
SimTest has a real InputHandlerProxy
<https://chromium.googlesource.com/chromium/src/+/dfd84d6c040d64dd5fa8906ad6e82befaf18281b>
it
is not as limited by the blink/compositor boundary and can fill some of the
gap left by single-threaded web tests.

On Fri, Jun 9, 2023 at 6:28 PM Stefan Zager <[email protected]> wrote:

> On Fri, Jun 9, 2023 at 3:07 PM Elliott Sprehn <[email protected]> wrote:
>
>> Woah 2015 super necro thread!
>>
>> There were a couple meta angles here:
>>
>> - Explicitly testing non-web exposed behavior is probably better done
>> with a SimTest or other unit test. In the past folks kept adding more
>> methods to Internals, or special behavior to the compositor, to make it
>> easier to write web_tests. So not having the threaded compositor
>> discourages some classes of that. I see folks are still adding new
>> internals methods unfortunately, but I still think SimTests (or something
>> else) is a better option.
>>
>
> Agreed: SimTests are *always* a better option, if you can get the right
> coverage. The problem we've had is that SimTests test blink right up to the
> blink/compositor boundary; and cc::LayerTreeTest test right up to the same
> boundary, from the other side. But we don't have a unit test framework for
> the renderer process that tests across that boundary. It's been proposed in
> the past, and it's still probably a good idea, to connect these two test
> frameworks so we can get better unit test coverage across the boundary. But
> that only helps for new tests; we still miss out on the coverage provided
> by the huge corpus of existing web_tests.
>
>
>> - The more "complete" the browser run over web_tests, the more bugs are
>> caught, but (at least at the time) browser_tests were very flaky and there
>> was a strong desire to avoid that fate within blink. For example you could
>> run the full chrome binary for web_tests and catch more bugs too. Why stop
>> at just the threaded compositor? The thinking was making web_tests very
>> stable was better than catching a small number of real bugs at the cost of
>> a flake tax across the hundreds of contributors.
>>
>
> Yep, very true: destabilizing the whole system is not worth it. Our first
> experiments with enabling threaded compositing by default were pretty
> broken, but we discovered a few fixes that greatly reduced the number of
> new flakes and failures, and we discovered that a relatively small number
> of tests were responsible for just about all of the new flakiness. So we
> placed those tests in quarantine: they will continue to run with the
> single-threaded compositor, while the large majority of tests (>95%), and
> all new tests, will use the threaded compositor. The best time to fix a
> flaky test is right when the test is added, and that holds true even if
> threaded compositing is the root cause of the flakiness.
>
>
>> The failure mode I've seen of efforts like this is the
>> SingleThreadedTests file grows each time someone finds some flake, so we
>> never really find the bugs like 1447868 because folks just keep hiding them
>> each time something unusual happens.
>>
>> All that said, if you can crank up the "realism" of the test runner and
>> not introduce flake and convince folks not to hide all the bugs with
>> exceptions, that sounds worth it. I hope you all succeed!
>>
>> - E
>>
>> On Fri, Jun 9, 2023 at 3:18 PM Stefan Zager <[email protected]> wrote:
>>
>>> On Fri, Jun 9, 2023 at 11:38 AM Ken Rockot <[email protected]> wrote:
>>>
>>>>
>>>> On Fri, Jun 9, 2023 at 11:25 AM Stefan Zager <[email protected]>
>>>> wrote:
>>>>
>>>>> Resurrecting this thread, because yotha@ are launching the first step
>>>>> in a project that completely
>>>>> ignores the previous consensus (i.e., that we should *not* enable
>>>>> threaded compositing for all
>>>>> web_tests).
>>>>>
>>>>
>>>> This is exciting to hear! Is there a good place / bug to follow
>>>> progress?
>>>>
>>>
>>> Main tracking bug <http://crbug.com/770028>.
>>>
>>>
>>>>
>>>>
>>>>>
>>>>> I will paraphrase the high-level arguments made previously by enne@
>>>>> and ojan@, neither of whom
>>>>> are still active on chromium:
>>>>>
>>>>> - Enabling threaded compositing will increase test flakiness
>>>>> - Previous efforts to fix the flakiness have been really difficult and
>>>>> largely unsuccessful
>>>>> - Buggy behavior in the blink/compositor thread boundary is best
>>>>> covered by focused unit tests
>>>>> - There are vanishingly few bugs in that area; maybe none?
>>>>>
>>>>> My reponse...
>>>>>
>>>>> web_tests, by their very nature, are a massive set of integration
>>>>> tests. That is not the most efficient
>>>>> organization of test coverage. But the beauty of web_tests is that
>>>>> they catch a *ton* of bugs and
>>>>> prevent many bad patches from landing. The large majority of the bugs
>>>>> are in core blink areas,
>>>>> but not all of them. web_tests provide effective, if inefficient,
>>>>> coverage of almost the entire chromium
>>>>> code base. The lack of coverage for threaded compositing is an anomaly
>>>>> and a blink spot. "It's too
>>>>> hard" is not, in my opinion, a convincing justification.
>>>>>
>>>>> Yes, turning on threaded compositing *does* make web_tests flakier.
>>>>> And that is evidence of real bugs!
>>>>> It's oxymoronic to acknowledge the flakiness but claim that there are
>>>>> few or no bugs in the
>>>>> blink/compositor thread boundary. There are definitely bugs in there!
>>>>> They can be excruciatingly hard
>>>>> to root-cause, but they definitely exist. A couple of examples...
>>>>>
>>>>> https://chromium-review.googlesource.com/c/chromium/src/+/3958364
>>>>> https://bugs.chromium.org/p/chromium/issues/detail?id=1447868
>>>>>
>>>>> Once these bugs are root-caused, I certainly agree that we should
>>>>> prefer unit tests for regression
>>>>> testing. But they key point is that we only discovered these bugs
>>>>> because they caused failures in
>>>>> web_tests.
>>>>>
>>>>> Again: these are real bugs!
>>>>>
>>>>
>>>> +1000
>>>>
>>>>
>>>>>
>>>>> Because bugs in the thread boundary are usually race conditions they
>>>>> can be extremely difficult to
>>>>> reproduce, which means that they're very likely under-reported. But
>>>>> they still degrade user experience.
>>>>> Beyond crbug.com and flaky web_tests, there are also crash reports
>>>>> that very likely indicate bugs in this
>>>>> area.
>>>>>
>>>>>
>>>>> On Wednesday, November 4, 2015 at 4:58:29 PM UTC-8 Ojan Vafai wrote:
>>>>>
>>>>>> +1 on all points. I hope this new (currently half-baked) solution
>>>>>> will help improve all three of these to varying degrees. It's not 
>>>>>> panacea,
>>>>>> but it should be a lot better.
>>>>>>
>>>>>>
>>>>>> On Wed, Nov 4, 2015 at 11:29 AM Adrienne Walker <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Re: interoperability tests.  Looking at the guidelines for the web
>>>>>>> platform tests, I think if all of our layout tests had been written with
>>>>>>> the guidelines of short and minimal in mind, we would be in a much 
>>>>>>> better
>>>>>>> situation than we are today.
>>>>>>>
>>>>>>> My biggest concerns about layout tests are: (1) they get used to
>>>>>>> test very internal parts of Blink, which then get exposed via awful
>>>>>>> plumbing via internals or dumping the state of the world into a text 
>>>>>>> file.
>>>>>>>  (2) they "over test", especially in terms of pixel and text output, 
>>>>>>> which
>>>>>>> adds a rebaselining burden as these things churn over time and over
>>>>>>> different platforms, even though that doesn't affect what the test was
>>>>>>> trying to verify.  (3) they rely on so much of the stack that they end 
>>>>>>> up
>>>>>>> being flaky.
>>>>>>>
>>>>>>> I think there's plenty of room for html / javascript tests that are
>>>>>>> cross-platform and test web platform features that don't fall into these
>>>>>>> traps.
>>>>>>>
>>>>>>> 2015-11-04 9:32 GMT-08:00 Rick Byers <[email protected]>:
>>>>>>>
>>>>>>>> I agree with this big picture direction - i.e. unit tests are
>>>>>>>> generally better than integration tests.
>>>>>>>>
>>>>>>>> However I HAVE seen examples in input where we've missed issues
>>>>>>>> because we had only unit tests (content, cc and blink layout tests), 
>>>>>>>> not
>>>>>>>> good integration test that included the effect of threaded input 
>>>>>>>> handling.
>>>>>>>> One case in particular was pretty embarrassing - IIRC we broke 
>>>>>>>> scrolling
>>>>>>>> completely in most real-world cases yet not a single test failed :-).
>>>>>>>> Still, I think this is, at best, an argument for having a couple sanity
>>>>>>>> integration tests (eg. as content browser tests).
>>>>>>>>
>>>>>>>> My bigger concern here is around interoperability testing.  IMHO
>>>>>>>> the only way we're going to substantially reduce developer pain around
>>>>>>>> interoperability is by sharing more tests with the other rendering
>>>>>>>> engines.  Mozilla is much better at this
>>>>>>>> <https://wiki.mozilla.org/Auto-tools/Projects/web-platform-tests>
>>>>>>>> than us.  This, by its nature, is impossible to do with unit tests.  So
>>>>>>>> when new simple web platform features are being added (especially 
>>>>>>>> highly
>>>>>>>> deterministic JS APIs), I've recently started encouraging my team to 
>>>>>>>> try
>>>>>>>> focusing on web-platform-tests <http://testthewebforward.org>.
>>>>>>>> Obviously more complex / non-deterministic features should still 
>>>>>>>> prefer to
>>>>>>>> focus on unit tests.  But the exact balance here seems very 
>>>>>>>> non-obvious to
>>>>>>>> me.  But like with everything, we should probably just try things, 
>>>>>>>> evaluate
>>>>>>>> how we're doing, and iterate.  Any concerns?
>>>>>>>>
>>>>>>>> Rick
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Nov 4, 2015 at 11:39 AM, Ojan Vafai <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Thread hijacking this to blink-dev@chromium.
>>>>>>>>>
>>>>>>>>> At a high level, I want to +1 that we should be moving our testing
>>>>>>>>> away from high level regression tests into more deterministic tests.
>>>>>>>>> There's a good (not quite finished) alternative to layout tests that I
>>>>>>>>> think we should pursue (an evolution of Elliott's C++-based Sim 
>>>>>>>>> framework).
>>>>>>>>>
>>>>>>>>> I'll be adding this to the Blink infra team's long-term roadmap
>>>>>>>>> shortly. Coincidentally, Elliott and I hashed out a basic idea 
>>>>>>>>> yesterday.
>>>>>>>>> At some point we'll turn that into a doc and a proposal, but mostly it
>>>>>>>>> needs people to do the engineering work to make it possible. So, if 
>>>>>>>>> you're
>>>>>>>>> possibly interested in 5%-80%ing on this, ping me.
>>>>>>>>>
>>>>>>>>> On Tue, Nov 3, 2015 at 4:04 PM 'Adrienne Walker' via paint-dev <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> (I sent this previously as an email reply to a few people about a
>>>>>>>>>> particular patch, but as this continues to come up on a monthly 
>>>>>>>>>> basis, here
>>>>>>>>>> is a slightly edited copy of this rant as a reference of my biased 
>>>>>>>>>> position
>>>>>>>>>> on this topic.)
>>>>>>>>>>
>>>>>>>>>> We should *not* make layout tests use threaded compositing.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> We shouldn't do it because it's a rat hole.  jamesr and I both
>>>>>>>>>> tried to make this happen on separate occasions.  We spent a lot of 
>>>>>>>>>> time
>>>>>>>>>> and fought with even more flakiness.  I'm sure somebody could 
>>>>>>>>>> heroically
>>>>>>>>>> burn a lot of time making this happen and succeed where we couldn't, 
>>>>>>>>>> but in
>>>>>>>>>> my opinion, it's a waste of time.  Threaded compositing increases
>>>>>>>>>> flakiness and loses determinism in terms of when events happen, how
>>>>>>>>>> repaints line up with frames, and how animation happens, especially. 
>>>>>>>>>>  We
>>>>>>>>>> already have a ton of problems with flakiness in layout tests, so 
>>>>>>>>>> why make
>>>>>>>>>> this worse?
>>>>>>>>>>
>>>>>>>>>> If there are features that are only enabled with threaded
>>>>>>>>>> compositing (compositor animations?) that somehow still need
>>>>>>>>>> Blink layout tests, we should enable those features for non-
>>>>>>>>>> threaded compositing.  I would support removing the virtual/
>>>>>>>>>> threaded directory if somebody made it possible to test those
>>>>>>>>>> features in other ways.  If there are tests that have different 
>>>>>>>>>> behavior in
>>>>>>>>>> threaded mode vs not threaded mode, then they need to be deflaked or 
>>>>>>>>>> tested
>>>>>>>>>> more narrowly via unit tests or esprehn's simulation testing 
>>>>>>>>>> framework.
>>>>>>>>>>
>>>>>>>>>> Single vs threaded compositing is generally well-tested within
>>>>>>>>>> cc unit tests.  There should be very little difference at the Blink 
>>>>>>>>>> level,
>>>>>>>>>> and I would question the need to test it there.  We have ironed out 
>>>>>>>>>> a small
>>>>>>>>>> number of message ordering differences over time in cc.  We should 
>>>>>>>>>> consider
>>>>>>>>>> these differences bugs and add cc unit tests for them to guarantee 
>>>>>>>>>> the same
>>>>>>>>>> ordering of messages.  In the long term, I think the Blimp/Mandoline 
>>>>>>>>>> work
>>>>>>>>>> has the potential to merge single/thread proxy with an
>>>>>>>>>> abstracted glue layer between them.
>>>>>>>>>>
>>>>>>>>>> So, sure, we should test what we ship, but on the other hand
>>>>>>>>>> every test should not be an integration test.  I have have seen 
>>>>>>>>>> exactly
>>>>>>>>>> zero bugs ever that would have been caught if a layout test had been 
>>>>>>>>>> using
>>>>>>>>>> threaded compositing instead of single threaded compositing.  
>>>>>>>>>> Really, zero.
>>>>>>>>>>
>>>>>>>>>> Personally, I think our problems as a team lie more on the "tests
>>>>>>>>>> are flaky and hard to reason about because they're integration 
>>>>>>>>>> tests" end
>>>>>>>>>> of the spectrum and far less on the "we are shipping huge hidden
>>>>>>>>>> regressions because of holes in our testing".
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Nov 3, 2015 at 5:17 PM Adrienne Walker <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> 2015-11-03 16:30 GMT-08:00 Alexey <[email protected]>:
>>>>>>>>>>
>>>>>>>>>>> Can we run all of the virtual/threaded tests in
>>>>>>>>>>> SingleThreadProxy
>>>>>>>>>>> <https://code.google.com/p/chromium/codesearch#chromium/src/cc/trees/single_thread_proxy.cc&l=37&gs=cpp:cc::class-SingleThreadProxy::SingleThreadProxy(cc::LayerTreeHost%2520*,%2520cc::LayerTreeHostSingleThreadClient%2520*,%2520scoped_refptr%253Cbase::SingleThreadTaskRunner%253E,%2520scoped_ptr%253Ccc::BeginFrameSource,%2520base::DefaultDeleter%253Ccc::BeginFrameSource%253E%2520%253E)@chromium/../../cc/trees/single_thread_proxy.cc%257Cdef&gsn=SingleThreadProxy&ct=xref_usages>
>>>>>>>>>>>  mode?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Currently, no!
>>>>>>>>>>
>>>>>>>>>> There are some things (e.g. composited animations) that have
>>>>>>>>>> behavior differences in code depending on threaded vs single thread. 
>>>>>>>>>>  It
>>>>>>>>>> seems to me that these are wrinkles that could be ironed out.  It 
>>>>>>>>>> seems
>>>>>>>>>> like there's no reason we couldn't create and run composited 
>>>>>>>>>> animations in
>>>>>>>>>> single-threaded mode.
>>>>>>>>>>
>>>>>>>>>> It looks like other things have been added there over time (e.g.
>>>>>>>>>> idle callbacks), that I don't understand as well.  Naively, it seems 
>>>>>>>>>> like
>>>>>>>>>> fundamentally nothing should depend on threaded compositing.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "blink-dev" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to [email protected].
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/a/chromium.org/d/msgid/blink-dev/4d8dd468-b3c4-4f65-8ac7-1b4ca194e08an%40chromium.org
>>>>> <https://groups.google.com/a/chromium.org/d/msgid/blink-dev/4d8dd468-b3c4-4f65-8ac7-1b4ca194e08an%40chromium.org?utm_medium=email&utm_source=footer>
>>>>> .
>>>>>
>>>> --
> You received this message because you are subscribed to the Google Groups
> "blink-dev" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/a/chromium.org/d/msgid/blink-dev/CAHOQ7J-0kFrERxC_PZdyfA6yoh_53%2Bu50gm_vuS-hzd1pAXUTg%40mail.gmail.com
> <https://groups.google.com/a/chromium.org/d/msgid/blink-dev/CAHOQ7J-0kFrERxC_PZdyfA6yoh_53%2Bu50gm_vuS-hzd1pAXUTg%40mail.gmail.com?utm_medium=email&utm_source=footer>
> .
>

-- 
You received this message because you are subscribed to the Google Groups 
"blink-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/a/chromium.org/d/msgid/blink-dev/CAAjAU%3DYYb29VBshp_ESN21Zb5ehfPA-oWRkRo24ESu5vRRndUw%40mail.gmail.com.

Re: [blink-dev] Re: Layout tests should not use threaded compositing, part XIII

Reply via email to