Re: [blink-dev] Re: Layout tests should not use threaded compositing, part XIII

'Elliott Sprehn' via blink-dev Fri, 09 Jun 2023 15:08:52 -0700

Woah 2015 super necro thread!

There were a couple meta angles here:


- Explicitly testing non-web exposed behavior is probably better done with
a SimTest or other unit test. In the past folks kept adding more methods to
Internals, or special behavior to the compositor, to make it easier to
write web_tests. So not having the threaded compositor discourages some
classes of that. I see folks are still adding new internals methods
unfortunately, but I still think SimTests (or something else) is a
better option.

- The more "complete" the browser run over web_tests, the more bugs are
caught, but (at least at the time) browser_tests were very flaky and there
was a strong desire to avoid that fate within blink. For example you could
run the full chrome binary for web_tests and catch more bugs too. Why stop
at just the threaded compositor? The thinking was making web_tests very
stable was better than catching a small number of real bugs at the cost of
a flake tax across the hundreds of contributors.

The failure mode I've seen of efforts like this is the SingleThreadedTests
file grows each time someone finds some flake, so we never really find the
bugs like 1447868 because folks just keep hiding them each time something
unusual happens.

All that said, if you can crank up the "realism" of the test runner and not
introduce flake and convince folks not to hide all the bugs with
exceptions, that sounds worth it. I hope you all succeed!

- E

On Fri, Jun 9, 2023 at 3:18 PM Stefan Zager <[email protected]> wrote:

> On Fri, Jun 9, 2023 at 11:38 AM Ken Rockot <[email protected]> wrote:
>
>>
>> On Fri, Jun 9, 2023 at 11:25 AM Stefan Zager <[email protected]> wrote:
>>
>>> Resurrecting this thread, because yotha@ are launching the first step
>>> in a project that completely
>>> ignores the previous consensus (i.e., that we should *not* enable
>>> threaded compositing for all
>>> web_tests).
>>>
>>
>> This is exciting to hear! Is there a good place / bug to follow progress?
>>
>
> Main tracking bug <http://crbug.com/770028>.
>
>
>>
>>
>>>
>>> I will paraphrase the high-level arguments made previously by enne@ and
>>> ojan@, neither of whom
>>> are still active on chromium:
>>>
>>> - Enabling threaded compositing will increase test flakiness
>>> - Previous efforts to fix the flakiness have been really difficult and
>>> largely unsuccessful
>>> - Buggy behavior in the blink/compositor thread boundary is best covered
>>> by focused unit tests
>>> - There are vanishingly few bugs in that area; maybe none?
>>>
>>> My reponse...
>>>
>>> web_tests, by their very nature, are a massive set of integration tests.
>>> That is not the most efficient
>>> organization of test coverage. But the beauty of web_tests is that they
>>> catch a *ton* of bugs and
>>> prevent many bad patches from landing. The large majority of the bugs
>>> are in core blink areas,
>>> but not all of them. web_tests provide effective, if inefficient,
>>> coverage of almost the entire chromium
>>> code base. The lack of coverage for threaded compositing is an anomaly
>>> and a blink spot. "It's too
>>> hard" is not, in my opinion, a convincing justification.
>>>
>>> Yes, turning on threaded compositing *does* make web_tests flakier. And
>>> that is evidence of real bugs!
>>> It's oxymoronic to acknowledge the flakiness but claim that there are
>>> few or no bugs in the
>>> blink/compositor thread boundary. There are definitely bugs in there!
>>> They can be excruciatingly hard
>>> to root-cause, but they definitely exist. A couple of examples...
>>>
>>> https://chromium-review.googlesource.com/c/chromium/src/+/3958364
>>> https://bugs.chromium.org/p/chromium/issues/detail?id=1447868
>>>
>>> Once these bugs are root-caused, I certainly agree that we should prefer
>>> unit tests for regression
>>> testing. But they key point is that we only discovered these bugs
>>> because they caused failures in
>>> web_tests.
>>>
>>> Again: these are real bugs!
>>>
>>
>> +1000
>>
>>
>>>
>>> Because bugs in the thread boundary are usually race conditions they can
>>> be extremely difficult to
>>> reproduce, which means that they're very likely under-reported. But they
>>> still degrade user experience.
>>> Beyond crbug.com and flaky web_tests, there are also crash reports that
>>> very likely indicate bugs in this
>>> area.
>>>
>>>
>>> On Wednesday, November 4, 2015 at 4:58:29 PM UTC-8 Ojan Vafai wrote:
>>>
>>>> +1 on all points. I hope this new (currently half-baked) solution will
>>>> help improve all three of these to varying degrees. It's not panacea, but
>>>> it should be a lot better.
>>>>
>>>>
>>>> On Wed, Nov 4, 2015 at 11:29 AM Adrienne Walker <[email protected]>
>>>> wrote:
>>>>
>>>>> Re: interoperability tests.  Looking at the guidelines for the web
>>>>> platform tests, I think if all of our layout tests had been written with
>>>>> the guidelines of short and minimal in mind, we would be in a much better
>>>>> situation than we are today.
>>>>>
>>>>> My biggest concerns about layout tests are: (1) they get used to test
>>>>> very internal parts of Blink, which then get exposed via awful plumbing 
>>>>> via
>>>>> internals or dumping the state of the world into a text file.  (2) they
>>>>> "over test", especially in terms of pixel and text output, which adds a
>>>>> rebaselining burden as these things churn over time and over different
>>>>> platforms, even though that doesn't affect what the test was trying to
>>>>> verify.  (3) they rely on so much of the stack that they end up being 
>>>>> flaky.
>>>>>
>>>>> I think there's plenty of room for html / javascript tests that are
>>>>> cross-platform and test web platform features that don't fall into these
>>>>> traps.
>>>>>
>>>>> 2015-11-04 9:32 GMT-08:00 Rick Byers <[email protected]>:
>>>>>
>>>>>> I agree with this big picture direction - i.e. unit tests are
>>>>>> generally better than integration tests.
>>>>>>
>>>>>> However I HAVE seen examples in input where we've missed issues
>>>>>> because we had only unit tests (content, cc and blink layout tests), not
>>>>>> good integration test that included the effect of threaded input 
>>>>>> handling.
>>>>>> One case in particular was pretty embarrassing - IIRC we broke scrolling
>>>>>> completely in most real-world cases yet not a single test failed :-).
>>>>>> Still, I think this is, at best, an argument for having a couple sanity
>>>>>> integration tests (eg. as content browser tests).
>>>>>>
>>>>>> My bigger concern here is around interoperability testing.  IMHO the
>>>>>> only way we're going to substantially reduce developer pain around
>>>>>> interoperability is by sharing more tests with the other rendering
>>>>>> engines.  Mozilla is much better at this
>>>>>> <https://wiki.mozilla.org/Auto-tools/Projects/web-platform-tests>
>>>>>> than us.  This, by its nature, is impossible to do with unit tests.  So
>>>>>> when new simple web platform features are being added (especially highly
>>>>>> deterministic JS APIs), I've recently started encouraging my team to try
>>>>>> focusing on web-platform-tests <http://testthewebforward.org>.
>>>>>> Obviously more complex / non-deterministic features should still prefer 
>>>>>> to
>>>>>> focus on unit tests.  But the exact balance here seems very non-obvious 
>>>>>> to
>>>>>> me.  But like with everything, we should probably just try things, 
>>>>>> evaluate
>>>>>> how we're doing, and iterate.  Any concerns?
>>>>>>
>>>>>> Rick
>>>>>>
>>>>>>
>>>>>> On Wed, Nov 4, 2015 at 11:39 AM, Ojan Vafai <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Thread hijacking this to blink-dev@chromium.
>>>>>>>
>>>>>>> At a high level, I want to +1 that we should be moving our testing
>>>>>>> away from high level regression tests into more deterministic tests.
>>>>>>> There's a good (not quite finished) alternative to layout tests that I
>>>>>>> think we should pursue (an evolution of Elliott's C++-based Sim 
>>>>>>> framework).
>>>>>>>
>>>>>>> I'll be adding this to the Blink infra team's long-term roadmap
>>>>>>> shortly. Coincidentally, Elliott and I hashed out a basic idea 
>>>>>>> yesterday.
>>>>>>> At some point we'll turn that into a doc and a proposal, but mostly it
>>>>>>> needs people to do the engineering work to make it possible. So, if 
>>>>>>> you're
>>>>>>> possibly interested in 5%-80%ing on this, ping me.
>>>>>>>
>>>>>>> On Tue, Nov 3, 2015 at 4:04 PM 'Adrienne Walker' via paint-dev <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> (I sent this previously as an email reply to a few people about a
>>>>>>>> particular patch, but as this continues to come up on a monthly basis, 
>>>>>>>> here
>>>>>>>> is a slightly edited copy of this rant as a reference of my biased 
>>>>>>>> position
>>>>>>>> on this topic.)
>>>>>>>>
>>>>>>>> We should *not* make layout tests use threaded compositing.
>>>>>>>>
>>>>>>>>
>>>>>>>> We shouldn't do it because it's a rat hole.  jamesr and I both
>>>>>>>> tried to make this happen on separate occasions.  We spent a lot of 
>>>>>>>> time
>>>>>>>> and fought with even more flakiness.  I'm sure somebody could 
>>>>>>>> heroically
>>>>>>>> burn a lot of time making this happen and succeed where we couldn't, 
>>>>>>>> but in
>>>>>>>> my opinion, it's a waste of time.  Threaded compositing increases
>>>>>>>> flakiness and loses determinism in terms of when events happen, how
>>>>>>>> repaints line up with frames, and how animation happens, especially.  
>>>>>>>> We
>>>>>>>> already have a ton of problems with flakiness in layout tests, so why 
>>>>>>>> make
>>>>>>>> this worse?
>>>>>>>>
>>>>>>>> If there are features that are only enabled with threaded
>>>>>>>> compositing (compositor animations?) that somehow still need Blink
>>>>>>>> layout tests, we should enable those features for non-threaded
>>>>>>>> compositing.  I would support removing the virtual/threaded directory
>>>>>>>> if somebody made it possible to test those features in other ways.  If
>>>>>>>> there are tests that have different behavior in threaded mode vs not
>>>>>>>> threaded mode, then they need to be deflaked or tested more narrowly 
>>>>>>>> via
>>>>>>>> unit tests or esprehn's simulation testing framework.
>>>>>>>>
>>>>>>>> Single vs threaded compositing is generally well-tested within cc
>>>>>>>> unit tests.  There should be very little difference at the Blink 
>>>>>>>> level, and
>>>>>>>> I would question the need to test it there.  We have ironed out a small
>>>>>>>> number of message ordering differences over time in cc.  We should 
>>>>>>>> consider
>>>>>>>> these differences bugs and add cc unit tests for them to guarantee the 
>>>>>>>> same
>>>>>>>> ordering of messages.  In the long term, I think the Blimp/Mandoline 
>>>>>>>> work
>>>>>>>> has the potential to merge single/thread proxy with an abstracted
>>>>>>>> glue layer between them.
>>>>>>>>
>>>>>>>> So, sure, we should test what we ship, but on the other hand every
>>>>>>>> test should not be an integration test.  I have have seen exactly zero 
>>>>>>>> bugs
>>>>>>>> ever that would have been caught if a layout test had been using 
>>>>>>>> threaded
>>>>>>>> compositing instead of single threaded compositing.  Really, zero.
>>>>>>>>
>>>>>>>> Personally, I think our problems as a team lie more on the "tests
>>>>>>>> are flaky and hard to reason about because they're integration tests" 
>>>>>>>> end
>>>>>>>> of the spectrum and far less on the "we are shipping huge hidden
>>>>>>>> regressions because of holes in our testing".
>>>>>>>>
>>>>>>>
>>>>>>> On Tue, Nov 3, 2015 at 5:17 PM Adrienne Walker <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> 2015-11-03 16:30 GMT-08:00 Alexey <[email protected]>:
>>>>>>>>
>>>>>>>>> Can we run all of the virtual/threaded tests in SingleThreadProxy
>>>>>>>>> <https://code.google.com/p/chromium/codesearch#chromium/src/cc/trees/single_thread_proxy.cc&l=37&gs=cpp:cc::class-SingleThreadProxy::SingleThreadProxy(cc::LayerTreeHost%2520*,%2520cc::LayerTreeHostSingleThreadClient%2520*,%2520scoped_refptr%253Cbase::SingleThreadTaskRunner%253E,%2520scoped_ptr%253Ccc::BeginFrameSource,%2520base::DefaultDeleter%253Ccc::BeginFrameSource%253E%2520%253E)@chromium/../../cc/trees/single_thread_proxy.cc%257Cdef&gsn=SingleThreadProxy&ct=xref_usages>
>>>>>>>>>  mode?
>>>>>>>>>
>>>>>>>>
>>>>>>>> Currently, no!
>>>>>>>>
>>>>>>>> There are some things (e.g. composited animations) that have
>>>>>>>> behavior differences in code depending on threaded vs single thread.  
>>>>>>>> It
>>>>>>>> seems to me that these are wrinkles that could be ironed out.  It seems
>>>>>>>> like there's no reason we couldn't create and run composited 
>>>>>>>> animations in
>>>>>>>> single-threaded mode.
>>>>>>>>
>>>>>>>> It looks like other things have been added there over time (e.g.
>>>>>>>> idle callbacks), that I don't understand as well.  Naively, it seems 
>>>>>>>> like
>>>>>>>> fundamentally nothing should depend on threaded compositing.
>>>>>>>>
>>>>>>>
>>>>>>
>>>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "blink-dev" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To view this discussion on the web visit
>>> https://groups.google.com/a/chromium.org/d/msgid/blink-dev/4d8dd468-b3c4-4f65-8ac7-1b4ca194e08an%40chromium.org
>>> <https://groups.google.com/a/chromium.org/d/msgid/blink-dev/4d8dd468-b3c4-4f65-8ac7-1b4ca194e08an%40chromium.org?utm_medium=email&utm_source=footer>
>>> .
>>>
>>

-- 
You received this message because you are subscribed to the Google Groups 
"blink-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/a/chromium.org/d/msgid/blink-dev/CAO9Q3iJPOPBPgSsu7f_9_6XzRg3OjmqV%3Do4AAKFS7Yjv6HQ9XQ%40mail.gmail.com.

Re: [blink-dev] Re: Layout tests should not use threaded compositing, part XIII

Reply via email to