Re: [blink-dev] Re: Layout tests should not use threaded compositing, part XIII

Ken Rockot Fri, 09 Jun 2023 11:38:41 -0700

On Fri, Jun 9, 2023 at 11:25 AM Stefan Zager <[email protected]> wrote:


> Resurrecting this thread, because yotha@ are launching the first step in
> a project that completely
> ignores the previous consensus (i.e., that we should *not* enable threaded
> compositing for all
> web_tests).
>

This is exciting to hear! Is there a good place / bug to follow progress?


>
> I will paraphrase the high-level arguments made previously by enne@ and
> ojan@, neither of whom
> are still active on chromium:
>
> - Enabling threaded compositing will increase test flakiness
> - Previous efforts to fix the flakiness have been really difficult and
> largely unsuccessful
> - Buggy behavior in the blink/compositor thread boundary is best covered
> by focused unit tests
> - There are vanishingly few bugs in that area; maybe none?
>
> My reponse...
>
> web_tests, by their very nature, are a massive set of integration tests.
> That is not the most efficient
> organization of test coverage. But the beauty of web_tests is that they
> catch a *ton* of bugs and
> prevent many bad patches from landing. The large majority of the bugs are
> in core blink areas,
> but not all of them. web_tests provide effective, if inefficient, coverage
> of almost the entire chromium
> code base. The lack of coverage for threaded compositing is an anomaly and
> a blink spot. "It's too
> hard" is not, in my opinion, a convincing justification.
>
> Yes, turning on threaded compositing *does* make web_tests flakier. And
> that is evidence of real bugs!
> It's oxymoronic to acknowledge the flakiness but claim that there are few
> or no bugs in the
> blink/compositor thread boundary. There are definitely bugs in there! They
> can be excruciatingly hard
> to root-cause, but they definitely exist. A couple of examples...
>
> https://chromium-review.googlesource.com/c/chromium/src/+/3958364
> https://bugs.chromium.org/p/chromium/issues/detail?id=1447868
>
> Once these bugs are root-caused, I certainly agree that we should prefer
> unit tests for regression
> testing. But they key point is that we only discovered these bugs because
> they caused failures in
> web_tests.
>
> Again: these are real bugs!
>

+1000


>
> Because bugs in the thread boundary are usually race conditions they can
> be extremely difficult to
> reproduce, which means that they're very likely under-reported. But they
> still degrade user experience.
> Beyond crbug.com and flaky web_tests, there are also crash reports that
> very likely indicate bugs in this
> area.
>
>
> On Wednesday, November 4, 2015 at 4:58:29 PM UTC-8 Ojan Vafai wrote:
>
>> +1 on all points. I hope this new (currently half-baked) solution will
>> help improve all three of these to varying degrees. It's not panacea, but
>> it should be a lot better.
>>
>>
>> On Wed, Nov 4, 2015 at 11:29 AM Adrienne Walker <[email protected]>
>> wrote:
>>
>>> Re: interoperability tests.  Looking at the guidelines for the web
>>> platform tests, I think if all of our layout tests had been written with
>>> the guidelines of short and minimal in mind, we would be in a much better
>>> situation than we are today.
>>>
>>> My biggest concerns about layout tests are: (1) they get used to test
>>> very internal parts of Blink, which then get exposed via awful plumbing via
>>> internals or dumping the state of the world into a text file.  (2) they
>>> "over test", especially in terms of pixel and text output, which adds a
>>> rebaselining burden as these things churn over time and over different
>>> platforms, even though that doesn't affect what the test was trying to
>>> verify.  (3) they rely on so much of the stack that they end up being flaky.
>>>
>>> I think there's plenty of room for html / javascript tests that are
>>> cross-platform and test web platform features that don't fall into these
>>> traps.
>>>
>>> 2015-11-04 9:32 GMT-08:00 Rick Byers <[email protected]>:
>>>
>>>> I agree with this big picture direction - i.e. unit tests are generally
>>>> better than integration tests.
>>>>
>>>> However I HAVE seen examples in input where we've missed issues because
>>>> we had only unit tests (content, cc and blink layout tests), not good
>>>> integration test that included the effect of threaded input handling.  One
>>>> case in particular was pretty embarrassing - IIRC we broke scrolling
>>>> completely in most real-world cases yet not a single test failed :-).
>>>> Still, I think this is, at best, an argument for having a couple sanity
>>>> integration tests (eg. as content browser tests).
>>>>
>>>> My bigger concern here is around interoperability testing.  IMHO the
>>>> only way we're going to substantially reduce developer pain around
>>>> interoperability is by sharing more tests with the other rendering
>>>> engines.  Mozilla is much better at this
>>>> <https://wiki.mozilla.org/Auto-tools/Projects/web-platform-tests> than
>>>> us.  This, by its nature, is impossible to do with unit tests.  So when new
>>>> simple web platform features are being added (especially highly
>>>> deterministic JS APIs), I've recently started encouraging my team to try
>>>> focusing on web-platform-tests <http://testthewebforward.org>.
>>>> Obviously more complex / non-deterministic features should still prefer to
>>>> focus on unit tests.  But the exact balance here seems very non-obvious to
>>>> me.  But like with everything, we should probably just try things, evaluate
>>>> how we're doing, and iterate.  Any concerns?
>>>>
>>>> Rick
>>>>
>>>>
>>>> On Wed, Nov 4, 2015 at 11:39 AM, Ojan Vafai <[email protected]> wrote:
>>>>
>>>>> Thread hijacking this to blink-dev@chromium.
>>>>>
>>>>> At a high level, I want to +1 that we should be moving our testing
>>>>> away from high level regression tests into more deterministic tests.
>>>>> There's a good (not quite finished) alternative to layout tests that I
>>>>> think we should pursue (an evolution of Elliott's C++-based Sim 
>>>>> framework).
>>>>>
>>>>> I'll be adding this to the Blink infra team's long-term roadmap
>>>>> shortly. Coincidentally, Elliott and I hashed out a basic idea yesterday.
>>>>> At some point we'll turn that into a doc and a proposal, but mostly it
>>>>> needs people to do the engineering work to make it possible. So, if you're
>>>>> possibly interested in 5%-80%ing on this, ping me.
>>>>>
>>>>> On Tue, Nov 3, 2015 at 4:04 PM 'Adrienne Walker' via paint-dev <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> (I sent this previously as an email reply to a few people about a
>>>>>> particular patch, but as this continues to come up on a monthly basis, 
>>>>>> here
>>>>>> is a slightly edited copy of this rant as a reference of my biased 
>>>>>> position
>>>>>> on this topic.)
>>>>>>
>>>>>> We should *not* make layout tests use threaded compositing.
>>>>>>
>>>>>>
>>>>>> We shouldn't do it because it's a rat hole.  jamesr and I both tried
>>>>>> to make this happen on separate occasions.  We spent a lot of time and
>>>>>> fought with even more flakiness.  I'm sure somebody could heroically 
>>>>>> burn a
>>>>>> lot of time making this happen and succeed where we couldn't, but in my
>>>>>> opinion, it's a waste of time.  Threaded compositing increases
>>>>>> flakiness and loses determinism in terms of when events happen, how
>>>>>> repaints line up with frames, and how animation happens, especially.  We
>>>>>> already have a ton of problems with flakiness in layout tests, so why 
>>>>>> make
>>>>>> this worse?
>>>>>>
>>>>>> If there are features that are only enabled with threaded compositing
>>>>>>  (compositor animations?) that somehow still need Blink layout
>>>>>> tests, we should enable those features for non-threaded compositing.
>>>>>> I would support removing the virtual/threaded directory if somebody
>>>>>> made it possible to test those features in other ways.  If there are 
>>>>>> tests
>>>>>> that have different behavior in threaded mode vs not threaded mode, then
>>>>>> they need to be deflaked or tested more narrowly via unit tests or
>>>>>> esprehn's simulation testing framework.
>>>>>>
>>>>>> Single vs threaded compositing is generally well-tested within cc
>>>>>> unit tests.  There should be very little difference at the Blink level, 
>>>>>> and
>>>>>> I would question the need to test it there.  We have ironed out a small
>>>>>> number of message ordering differences over time in cc.  We should 
>>>>>> consider
>>>>>> these differences bugs and add cc unit tests for them to guarantee the 
>>>>>> same
>>>>>> ordering of messages.  In the long term, I think the Blimp/Mandoline work
>>>>>> has the potential to merge single/thread proxy with an abstracted
>>>>>> glue layer between them.
>>>>>>
>>>>>> So, sure, we should test what we ship, but on the other hand every
>>>>>> test should not be an integration test.  I have have seen exactly zero 
>>>>>> bugs
>>>>>> ever that would have been caught if a layout test had been using threaded
>>>>>> compositing instead of single threaded compositing.  Really, zero.
>>>>>>
>>>>>> Personally, I think our problems as a team lie more on the "tests are
>>>>>> flaky and hard to reason about because they're integration tests" end of
>>>>>> the spectrum and far less on the "we are shipping huge hidden regressions
>>>>>> because of holes in our testing".
>>>>>>
>>>>>
>>>>> On Tue, Nov 3, 2015 at 5:17 PM Adrienne Walker <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> 2015-11-03 16:30 GMT-08:00 Alexey <[email protected]>:
>>>>>>
>>>>>>> Can we run all of the virtual/threaded tests in SingleThreadProxy
>>>>>>> <https://code.google.com/p/chromium/codesearch#chromium/src/cc/trees/single_thread_proxy.cc&l=37&gs=cpp:cc::class-SingleThreadProxy::SingleThreadProxy(cc::LayerTreeHost%2520*,%2520cc::LayerTreeHostSingleThreadClient%2520*,%2520scoped_refptr%253Cbase::SingleThreadTaskRunner%253E,%2520scoped_ptr%253Ccc::BeginFrameSource,%2520base::DefaultDeleter%253Ccc::BeginFrameSource%253E%2520%253E)@chromium/../../cc/trees/single_thread_proxy.cc%257Cdef&gsn=SingleThreadProxy&ct=xref_usages>
>>>>>>>  mode?
>>>>>>>
>>>>>>
>>>>>> Currently, no!
>>>>>>
>>>>>> There are some things (e.g. composited animations) that have behavior
>>>>>> differences in code depending on threaded vs single thread.  It seems to 
>>>>>> me
>>>>>> that these are wrinkles that could be ironed out.  It seems like there's 
>>>>>> no
>>>>>> reason we couldn't create and run composited animations in 
>>>>>> single-threaded
>>>>>> mode.
>>>>>>
>>>>>> It looks like other things have been added there over time (e.g. idle
>>>>>> callbacks), that I don't understand as well.  Naively, it seems like
>>>>>> fundamentally nothing should depend on threaded compositing.
>>>>>>
>>>>>
>>>>
>>> --
> You received this message because you are subscribed to the Google Groups
> "blink-dev" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/a/chromium.org/d/msgid/blink-dev/4d8dd468-b3c4-4f65-8ac7-1b4ca194e08an%40chromium.org
> <https://groups.google.com/a/chromium.org/d/msgid/blink-dev/4d8dd468-b3c4-4f65-8ac7-1b4ca194e08an%40chromium.org?utm_medium=email&utm_source=footer>
> .
>

-- 
You received this message because you are subscribed to the Google Groups 
"blink-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/a/chromium.org/d/msgid/blink-dev/CA%2BapAgH17Koq-tO_khHzoGboAUbxxi7D65n%3DH73MxdTSJ5-PEA%40mail.gmail.com.

Re: [blink-dev] Re: Layout tests should not use threaded compositing, part XIII

Reply via email to