Re: [blink-dev] Re: Layout tests should not use threaded compositing, part XIII

Stefan Zager Fri, 09 Jun 2023 11:25:14 -0700

Resurrecting this thread, because yotha@ are launching the first step in a 
project that completely
ignores the previous consensus (i.e., that we should *not* enable threaded 
compositing for all
web_tests).


I will paraphrase the high-level arguments made previously by enne@ and 
ojan@, neither of whom
are still active on chromium:

- Enabling threaded compositing will increase test flakiness
- Previous efforts to fix the flakiness have been really difficult and 
largely unsuccessful
- Buggy behavior in the blink/compositor thread boundary is best covered by 
focused unit tests
- There are vanishingly few bugs in that area; maybe none?

My reponse...

web_tests, by their very nature, are a massive set of integration tests. 
That is not the most efficient
organization of test coverage. But the beauty of web_tests is that they 
catch a *ton* of bugs and
prevent many bad patches from landing. The large majority of the bugs are 
in core blink areas,
but not all of them. web_tests provide effective, if inefficient, coverage 
of almost the entire chromium
code base. The lack of coverage for threaded compositing is an anomaly and 
a blink spot. "It's too
hard" is not, in my opinion, a convincing justification.

Yes, turning on threaded compositing *does* make web_tests flakier. And 
that is evidence of real bugs!
It's oxymoronic to acknowledge the flakiness but claim that there are few 
or no bugs in the
blink/compositor thread boundary. There are definitely bugs in there! They 
can be excruciatingly hard
to root-cause, but they definitely exist. A couple of examples...

https://chromium-review.googlesource.com/c/chromium/src/+/3958364
https://bugs.chromium.org/p/chromium/issues/detail?id=1447868

Once these bugs are root-caused, I certainly agree that we should prefer 
unit tests for regression
testing. But they key point is that we only discovered these bugs because 
they caused failures in
web_tests.

Again: these are real bugs!

Because bugs in the thread boundary are usually race conditions they can be 
extremely difficult to
reproduce, which means that they're very likely under-reported. But they 
still degrade user experience.
Beyond crbug.com and flaky web_tests, there are also crash reports that 
very likely indicate bugs in this
area.


On Wednesday, November 4, 2015 at 4:58:29 PM UTC-8 Ojan Vafai wrote:

> +1 on all points. I hope this new (currently half-baked) solution will 
> help improve all three of these to varying degrees. It's not panacea, but 
> it should be a lot better.
>
>
> On Wed, Nov 4, 2015 at 11:29 AM Adrienne Walker <[email protected]> wrote:
>
>> Re: interoperability tests.  Looking at the guidelines for the web 
>> platform tests, I think if all of our layout tests had been written with 
>> the guidelines of short and minimal in mind, we would be in a much better 
>> situation than we are today. 
>>
>> My biggest concerns about layout tests are: (1) they get used to test 
>> very internal parts of Blink, which then get exposed via awful plumbing via 
>> internals or dumping the state of the world into a text file.  (2) they 
>> "over test", especially in terms of pixel and text output, which adds a 
>> rebaselining burden as these things churn over time and over different 
>> platforms, even though that doesn't affect what the test was trying to 
>> verify.  (3) they rely on so much of the stack that they end up being flaky.
>>
>> I think there's plenty of room for html / javascript tests that are 
>> cross-platform and test web platform features that don't fall into these 
>> traps.
>>
>> 2015-11-04 9:32 GMT-08:00 Rick Byers <[email protected]>:
>>
>>> I agree with this big picture direction - i.e. unit tests are generally 
>>> better than integration tests.
>>>
>>> However I HAVE seen examples in input where we've missed issues because 
>>> we had only unit tests (content, cc and blink layout tests), not good 
>>> integration test that included the effect of threaded input handling.  One 
>>> case in particular was pretty embarrassing - IIRC we broke scrolling 
>>> completely in most real-world cases yet not a single test failed :-).  
>>> Still, I think this is, at best, an argument for having a couple sanity 
>>> integration tests (eg. as content browser tests).
>>>
>>> My bigger concern here is around interoperability testing.  IMHO the 
>>> only way we're going to substantially reduce developer pain around 
>>> interoperability is by sharing more tests with the other rendering 
>>> engines.  Mozilla is much better at this 
>>> <https://wiki.mozilla.org/Auto-tools/Projects/web-platform-tests> than 
>>> us.  This, by its nature, is impossible to do with unit tests.  So when new 
>>> simple web platform features are being added (especially highly 
>>> deterministic JS APIs), I've recently started encouraging my team to try 
>>> focusing on web-platform-tests <http://testthewebforward.org>.  
>>> Obviously more complex / non-deterministic features should still prefer to 
>>> focus on unit tests.  But the exact balance here seems very non-obvious to 
>>> me.  But like with everything, we should probably just try things, evaluate 
>>> how we're doing, and iterate.  Any concerns?
>>>
>>> Rick
>>>
>>>
>>> On Wed, Nov 4, 2015 at 11:39 AM, Ojan Vafai <[email protected]> wrote:
>>>
>>>> Thread hijacking this to blink-dev@chromium.
>>>>
>>>> At a high level, I want to +1 that we should be moving our testing away 
>>>> from high level regression tests into more deterministic tests. There's a 
>>>> good (not quite finished) alternative to layout tests that I think we 
>>>> should pursue (an evolution of Elliott's C++-based Sim framework).
>>>>
>>>> I'll be adding this to the Blink infra team's long-term roadmap 
>>>> shortly. Coincidentally, Elliott and I hashed out a basic idea yesterday. 
>>>> At some point we'll turn that into a doc and a proposal, but mostly it 
>>>> needs people to do the engineering work to make it possible. So, if you're 
>>>> possibly interested in 5%-80%ing on this, ping me.
>>>>
>>>> On Tue, Nov 3, 2015 at 4:04 PM 'Adrienne Walker' via paint-dev <
>>>> [email protected]> wrote:
>>>>
>>>>> (I sent this previously as an email reply to a few people about a 
>>>>> particular patch, but as this continues to come up on a monthly basis, 
>>>>> here 
>>>>> is a slightly edited copy of this rant as a reference of my biased 
>>>>> position 
>>>>> on this topic.)
>>>>>
>>>>> We should *not* make layout tests use threaded compositing.
>>>>>
>>>>>
>>>>> We shouldn't do it because it's a rat hole.  jamesr and I both tried 
>>>>> to make this happen on separate occasions.  We spent a lot of time and 
>>>>> fought with even more flakiness.  I'm sure somebody could heroically burn 
>>>>> a 
>>>>> lot of time making this happen and succeed where we couldn't, but in my 
>>>>> opinion, it's a waste of time.  Threaded compositing increases 
>>>>> flakiness and loses determinism in terms of when events happen, how 
>>>>> repaints line up with frames, and how animation happens, especially.  We 
>>>>> already have a ton of problems with flakiness in layout tests, so why 
>>>>> make 
>>>>> this worse?
>>>>>
>>>>> If there are features that are only enabled with threaded compositing
>>>>>  (compositor animations?) that somehow still need Blink layout tests, 
>>>>> we should enable those features for non-threaded compositing.  I 
>>>>> would support removing the virtual/threaded directory if somebody 
>>>>> made it possible to test those features in other ways.  If there are 
>>>>> tests 
>>>>> that have different behavior in threaded mode vs not threaded mode, then 
>>>>> they need to be deflaked or tested more narrowly via unit tests or 
>>>>> esprehn's simulation testing framework.
>>>>>
>>>>> Single vs threaded compositing is generally well-tested within cc 
>>>>> unit tests.  There should be very little difference at the Blink level, 
>>>>> and 
>>>>> I would question the need to test it there.  We have ironed out a small 
>>>>> number of message ordering differences over time in cc.  We should 
>>>>> consider 
>>>>> these differences bugs and add cc unit tests for them to guarantee the 
>>>>> same 
>>>>> ordering of messages.  In the long term, I think the Blimp/Mandoline work 
>>>>> has the potential to merge single/thread proxy with an abstracted 
>>>>> glue layer between them.
>>>>>
>>>>> So, sure, we should test what we ship, but on the other hand every 
>>>>> test should not be an integration test.  I have have seen exactly zero 
>>>>> bugs 
>>>>> ever that would have been caught if a layout test had been using threaded 
>>>>> compositing instead of single threaded compositing.  Really, zero.
>>>>>
>>>>> Personally, I think our problems as a team lie more on the "tests are 
>>>>> flaky and hard to reason about because they're integration tests" end of 
>>>>> the spectrum and far less on the "we are shipping huge hidden regressions 
>>>>> because of holes in our testing".
>>>>>
>>>>
>>>> On Tue, Nov 3, 2015 at 5:17 PM Adrienne Walker <[email protected]> 
>>>> wrote:
>>>>
>>>>> 2015-11-03 16:30 GMT-08:00 Alexey <[email protected]>:
>>>>>
>>>>>> Can we run all of the virtual/threaded tests in SingleThreadProxy 
>>>>>> <https://code.google.com/p/chromium/codesearch#chromium/src/cc/trees/single_thread_proxy.cc&l=37&gs=cpp:cc::class-SingleThreadProxy::SingleThreadProxy(cc::LayerTreeHost%2520*,%2520cc::LayerTreeHostSingleThreadClient%2520*,%2520scoped_refptr%253Cbase::SingleThreadTaskRunner%253E,%2520scoped_ptr%253Ccc::BeginFrameSource,%2520base::DefaultDeleter%253Ccc::BeginFrameSource%253E%2520%253E)@chromium/../../cc/trees/single_thread_proxy.cc%257Cdef&gsn=SingleThreadProxy&ct=xref_usages>
>>>>>>  mode?
>>>>>>
>>>>>
>>>>> Currently, no!
>>>>>
>>>>> There are some things (e.g. composited animations) that have behavior 
>>>>> differences in code depending on threaded vs single thread.  It seems to 
>>>>> me 
>>>>> that these are wrinkles that could be ironed out.  It seems like there's 
>>>>> no 
>>>>> reason we couldn't create and run composited animations in 
>>>>> single-threaded 
>>>>> mode.
>>>>>
>>>>> It looks like other things have been added there over time (e.g. idle 
>>>>> callbacks), that I don't understand as well.  Naively, it seems like 
>>>>> fundamentally nothing should depend on threaded compositing. 
>>>>>
>>>>
>>>
>>

-- 
You received this message because you are subscribed to the Google Groups 
"blink-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/a/chromium.org/d/msgid/blink-dev/4d8dd468-b3c4-4f65-8ac7-1b4ca194e08an%40chromium.org.

Re: [blink-dev] Re: Layout tests should not use threaded compositing, part XIII

Reply via email to