[GitHub] [beam] lostluck commented on a diff in pull request #23589: [Go SDK]: SingleFlight bundle descriptor requests

GitBox Wed, 12 Oct 2022 12:04:07 -0700


lostluck commented on code in PR #23589:
URL: https://github.com/apache/beam/pull/23589#discussion_r993805413



##########
sdks/go/pkg/beam/core/runtime/harness/harness.go:
##########
@@ -328,14 +329,20 @@ func (c *control) getOrCreatePlan(bdID 
bundleDescriptorID) (*exec.Plan, error) {
        desc, ok := c.descriptors[bdID]
        c.mu.Unlock() // Unlock to make the lookup or build the descriptor.
        if !ok {
-               newDesc, err := c.lookupDesc(bdID)
+               newDesc, err, _ := c.bundleGetGroup.Do(string(bdID), func() 
(interface{}, error) {

Review Comment:
   I'm currently on "leave it as implemented in the PR", since it's got 1 fewer 
lock cycles, and doesn't require a full re-examination of the lock sections in 
the harness. That work is happening as part of the umbrella issue anyway.
   
   There are a few steps here that each thread goes through: 
   1. Look if I have plans I can re-use.  
     1.a If a spare plan exists, return that instance.
   2. Get the descriptor. 
     2.a Look up descriptor from runner, and cache it.
   3. Build a plan instance.
   
   We don't want 1 in the single inflight, since threads shouldn't share active 
plans. 1 must be in a critical section.
   We have 2a in single inflight, but not 2 which we keep in the lock's 
critical section from 1.
   3 must not be in a critical section.
   
   I could move 2 into the single-flight, but then singleflight would still 
need to re-acquire the lock. This does avoid races with reading the map from 
multiple threads though.
   However, I say it's about the same:
   A) Currently, the read misses, and the thread joins the single flight, and 
then builds it's own Plan.
   B) Moving 2 into single flight means losing the lock for a moment, before 
singleflight grabs the lock again to check, and then removes the lock.
   
   B) ends up with 1 extra lock acquisition for N threads. In both we acquire 
the lock N times for the plan lookups, and then +1 for caching the descriptor. 
In B) we also need to acquire the lock once for looking up the descriptor.
   
   ----
   Backing up, we want to get or compute a plan instance. But I think we'd 
still end up with the same set of lock acquisitions to serialize with the other 
accesses of the maps and plans.
   The only different thing is the sync.Map vs manual locking. For that change 
I'd rather make a benchmark and measure the two approaches. We'd also need to 
examine how having "getting plans and descriptors" maps with the lock critical 
sections, which is slowly happening in course of this work.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [beam] lostluck commented on a diff in pull request #23589: [Go SDK]: SingleFlight bundle descriptor requests

Reply via email to