Re: [blink-dev] Re: Layout tests should not use threaded compositing, part XIII

Stefan Zager Fri, 09 Jun 2023 12:18:40 -0700

On Fri, Jun 9, 2023 at 11:38 AM Ken Rockot <[email protected]> wrote:


>
> On Fri, Jun 9, 2023 at 11:25 AM Stefan Zager <[email protected]> wrote:
>
>> Resurrecting this thread, because yotha@ are launching the first step in
>> a project that completely
>> ignores the previous consensus (i.e., that we should *not* enable
>> threaded compositing for all
>> web_tests).
>>
>
> This is exciting to hear! Is there a good place / bug to follow progress?
>

Main tracking bug <http://crbug.com/770028>.


>
>
>>
>> I will paraphrase the high-level arguments made previously by enne@ and
>> ojan@, neither of whom
>> are still active on chromium:
>>
>> - Enabling threaded compositing will increase test flakiness
>> - Previous efforts to fix the flakiness have been really difficult and
>> largely unsuccessful
>> - Buggy behavior in the blink/compositor thread boundary is best covered
>> by focused unit tests
>> - There are vanishingly few bugs in that area; maybe none?
>>
>> My reponse...
>>
>> web_tests, by their very nature, are a massive set of integration tests.
>> That is not the most efficient
>> organization of test coverage. But the beauty of web_tests is that they
>> catch a *ton* of bugs and
>> prevent many bad patches from landing. The large majority of the bugs are
>> in core blink areas,
>> but not all of them. web_tests provide effective, if inefficient,
>> coverage of almost the entire chromium
>> code base. The lack of coverage for threaded compositing is an anomaly
>> and a blink spot. "It's too
>> hard" is not, in my opinion, a convincing justification.
>>
>> Yes, turning on threaded compositing *does* make web_tests flakier. And
>> that is evidence of real bugs!
>> It's oxymoronic to acknowledge the flakiness but claim that there are few
>> or no bugs in the
>> blink/compositor thread boundary. There are definitely bugs in there!
>> They can be excruciatingly hard
>> to root-cause, but they definitely exist. A couple of examples...
>>
>> https://chromium-review.googlesource.com/c/chromium/src/+/3958364
>> https://bugs.chromium.org/p/chromium/issues/detail?id=1447868
>>
>> Once these bugs are root-caused, I certainly agree that we should prefer
>> unit tests for regression
>> testing. But they key point is that we only discovered these bugs because
>> they caused failures in
>> web_tests.
>>
>> Again: these are real bugs!
>>
>
> +1000
>
>
>>
>> Because bugs in the thread boundary are usually race conditions they can
>> be extremely difficult to
>> reproduce, which means that they're very likely under-reported. But they
>> still degrade user experience.
>> Beyond crbug.com and flaky web_tests, there are also crash reports that
>> very likely indicate bugs in this
>> area.
>>
>>
>> On Wednesday, November 4, 2015 at 4:58:29 PM UTC-8 Ojan Vafai wrote:
>>
>>> +1 on all points. I hope this new (currently half-baked) solution will
>>> help improve all three of these to varying degrees. It's not panacea, but
>>> it should be a lot better.
>>>
>>>
>>> On Wed, Nov 4, 2015 at 11:29 AM Adrienne Walker <[email protected]>
>>> wrote:
>>>
>>>> Re: interoperability tests.  Looking at the guidelines for the web
>>>> platform tests, I think if all of our layout tests had been written with
>>>> the guidelines of short and minimal in mind, we would be in a much better
>>>> situation than we are today.
>>>>
>>>> My biggest concerns about layout tests are: (1) they get used to test
>>>> very internal parts of Blink, which then get exposed via awful plumbing via
>>>> internals or dumping the state of the world into a text file.  (2) they
>>>> "over test", especially in terms of pixel and text output, which adds a
>>>> rebaselining burden as these things churn over time and over different
>>>> platforms, even though that doesn't affect what the test was trying to
>>>> verify.  (3) they rely on so much of the stack that they end up being 
>>>> flaky.
>>>>
>>>> I think there's plenty of room for html / javascript tests that are
>>>> cross-platform and test web platform features that don't fall into these
>>>> traps.
>>>>
>>>> 2015-11-04 9:32 GMT-08:00 Rick Byers <[email protected]>:
>>>>
>>>>> I agree with this big picture direction - i.e. unit tests are
>>>>> generally better than integration tests.
>>>>>
>>>>> However I HAVE seen examples in input where we've missed issues
>>>>> because we had only unit tests (content, cc and blink layout tests), not
>>>>> good integration test that included the effect of threaded input handling.
>>>>> One case in particular was pretty embarrassing - IIRC we broke scrolling
>>>>> completely in most real-world cases yet not a single test failed :-).
>>>>> Still, I think this is, at best, an argument for having a couple sanity
>>>>> integration tests (eg. as content browser tests).
>>>>>
>>>>> My bigger concern here is around interoperability testing.  IMHO the
>>>>> only way we're going to substantially reduce developer pain around
>>>>> interoperability is by sharing more tests with the other rendering
>>>>> engines.  Mozilla is much better at this
>>>>> <https://wiki.mozilla.org/Auto-tools/Projects/web-platform-tests>
>>>>> than us.  This, by its nature, is impossible to do with unit tests.  So
>>>>> when new simple web platform features are being added (especially highly
>>>>> deterministic JS APIs), I've recently started encouraging my team to try
>>>>> focusing on web-platform-tests <http://testthewebforward.org>.
>>>>> Obviously more complex / non-deterministic features should still prefer to
>>>>> focus on unit tests.  But the exact balance here seems very non-obvious to
>>>>> me.  But like with everything, we should probably just try things, 
>>>>> evaluate
>>>>> how we're doing, and iterate.  Any concerns?
>>>>>
>>>>> Rick
>>>>>
>>>>>
>>>>> On Wed, Nov 4, 2015 at 11:39 AM, Ojan Vafai <[email protected]> wrote:
>>>>>
>>>>>> Thread hijacking this to blink-dev@chromium.
>>>>>>
>>>>>> At a high level, I want to +1 that we should be moving our testing
>>>>>> away from high level regression tests into more deterministic tests.
>>>>>> There's a good (not quite finished) alternative to layout tests that I
>>>>>> think we should pursue (an evolution of Elliott's C++-based Sim 
>>>>>> framework).
>>>>>>
>>>>>> I'll be adding this to the Blink infra team's long-term roadmap
>>>>>> shortly. Coincidentally, Elliott and I hashed out a basic idea yesterday.
>>>>>> At some point we'll turn that into a doc and a proposal, but mostly it
>>>>>> needs people to do the engineering work to make it possible. So, if 
>>>>>> you're
>>>>>> possibly interested in 5%-80%ing on this, ping me.
>>>>>>
>>>>>> On Tue, Nov 3, 2015 at 4:04 PM 'Adrienne Walker' via paint-dev <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> (I sent this previously as an email reply to a few people about a
>>>>>>> particular patch, but as this continues to come up on a monthly basis, 
>>>>>>> here
>>>>>>> is a slightly edited copy of this rant as a reference of my biased 
>>>>>>> position
>>>>>>> on this topic.)
>>>>>>>
>>>>>>> We should *not* make layout tests use threaded compositing.
>>>>>>>
>>>>>>>
>>>>>>> We shouldn't do it because it's a rat hole.  jamesr and I both tried
>>>>>>> to make this happen on separate occasions.  We spent a lot of time and
>>>>>>> fought with even more flakiness.  I'm sure somebody could heroically 
>>>>>>> burn a
>>>>>>> lot of time making this happen and succeed where we couldn't, but in my
>>>>>>> opinion, it's a waste of time.  Threaded compositing increases
>>>>>>> flakiness and loses determinism in terms of when events happen, how
>>>>>>> repaints line up with frames, and how animation happens, especially.  We
>>>>>>> already have a ton of problems with flakiness in layout tests, so why 
>>>>>>> make
>>>>>>> this worse?
>>>>>>>
>>>>>>> If there are features that are only enabled with threaded
>>>>>>> compositing (compositor animations?) that somehow still need Blink
>>>>>>> layout tests, we should enable those features for non-threaded
>>>>>>> compositing.  I would support removing the virtual/threaded directory
>>>>>>> if somebody made it possible to test those features in other ways.  If
>>>>>>> there are tests that have different behavior in threaded mode vs not
>>>>>>> threaded mode, then they need to be deflaked or tested more narrowly via
>>>>>>> unit tests or esprehn's simulation testing framework.
>>>>>>>
>>>>>>> Single vs threaded compositing is generally well-tested within cc
>>>>>>> unit tests.  There should be very little difference at the Blink level, 
>>>>>>> and
>>>>>>> I would question the need to test it there.  We have ironed out a small
>>>>>>> number of message ordering differences over time in cc.  We should 
>>>>>>> consider
>>>>>>> these differences bugs and add cc unit tests for them to guarantee the 
>>>>>>> same
>>>>>>> ordering of messages.  In the long term, I think the Blimp/Mandoline 
>>>>>>> work
>>>>>>> has the potential to merge single/thread proxy with an abstracted
>>>>>>> glue layer between them.
>>>>>>>
>>>>>>> So, sure, we should test what we ship, but on the other hand every
>>>>>>> test should not be an integration test.  I have have seen exactly zero 
>>>>>>> bugs
>>>>>>> ever that would have been caught if a layout test had been using 
>>>>>>> threaded
>>>>>>> compositing instead of single threaded compositing.  Really, zero.
>>>>>>>
>>>>>>> Personally, I think our problems as a team lie more on the "tests
>>>>>>> are flaky and hard to reason about because they're integration tests" 
>>>>>>> end
>>>>>>> of the spectrum and far less on the "we are shipping huge hidden
>>>>>>> regressions because of holes in our testing".
>>>>>>>
>>>>>>
>>>>>> On Tue, Nov 3, 2015 at 5:17 PM Adrienne Walker <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> 2015-11-03 16:30 GMT-08:00 Alexey <[email protected]>:
>>>>>>>
>>>>>>>> Can we run all of the virtual/threaded tests in SingleThreadProxy
>>>>>>>> <https://code.google.com/p/chromium/codesearch#chromium/src/cc/trees/single_thread_proxy.cc&l=37&gs=cpp:cc::class-SingleThreadProxy::SingleThreadProxy(cc::LayerTreeHost%2520*,%2520cc::LayerTreeHostSingleThreadClient%2520*,%2520scoped_refptr%253Cbase::SingleThreadTaskRunner%253E,%2520scoped_ptr%253Ccc::BeginFrameSource,%2520base::DefaultDeleter%253Ccc::BeginFrameSource%253E%2520%253E)@chromium/../../cc/trees/single_thread_proxy.cc%257Cdef&gsn=SingleThreadProxy&ct=xref_usages>
>>>>>>>>  mode?
>>>>>>>>
>>>>>>>
>>>>>>> Currently, no!
>>>>>>>
>>>>>>> There are some things (e.g. composited animations) that have
>>>>>>> behavior differences in code depending on threaded vs single thread.  It
>>>>>>> seems to me that these are wrinkles that could be ironed out.  It seems
>>>>>>> like there's no reason we couldn't create and run composited animations 
>>>>>>> in
>>>>>>> single-threaded mode.
>>>>>>>
>>>>>>> It looks like other things have been added there over time (e.g.
>>>>>>> idle callbacks), that I don't understand as well.  Naively, it seems 
>>>>>>> like
>>>>>>> fundamentally nothing should depend on threaded compositing.
>>>>>>>
>>>>>>
>>>>>
>>>> --
>> You received this message because you are subscribed to the Google Groups
>> "blink-dev" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/a/chromium.org/d/msgid/blink-dev/4d8dd468-b3c4-4f65-8ac7-1b4ca194e08an%40chromium.org
>> <https://groups.google.com/a/chromium.org/d/msgid/blink-dev/4d8dd468-b3c4-4f65-8ac7-1b4ca194e08an%40chromium.org?utm_medium=email&utm_source=footer>
>> .
>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"blink-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/a/chromium.org/d/msgid/blink-dev/CAHOQ7J-GWQnOx-v%2BFft8chTimcXkvJ6E2Lg%3DQtAJG6G8eaM-3A%40mail.gmail.com.

Re: [blink-dev] Re: Layout tests should not use threaded compositing, part XIII

Reply via email to