lostluck opened a new issue, #26373: URL: https://github.com/apache/beam/issues/26373
### What happened? @miracvbasaran discovered some data races in the Go SDK when running other tests inside Google. First is on the state of the plan: This should be moved to an atomic check. https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/runtime/exec/plan.go#L41 (code is vendored inside google, but I'll provide links to the github equivalents here. Caught at these lines: Write: https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/runtime/exec/plan.go#L146 Read: https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/runtime/exec/plan.go#L280 ``` WARNING: DATA RACE Write at 0x00c00c491668 by goroutine 241: google3/third_party/golang/apache_beam/pkg/beam/core/runtime/exec/exec.(*Plan).Execute() third_party/golang/apache_beam/pkg/beam/core/runtime/exec/plan.go:146 +0xa26 google3/third_party/golang/apache_beam/pkg/beam/core/runtime/harness/harness.(*control).handleInstruction() third_party/golang/apache_beam/pkg/beam/core/runtime/harness/harness.go:408 +0x1353 google3/third_party/golang/apache_beam/pkg/beam/core/runtime/harness/harness.Main.func4() third_party/golang/apache_beam/pkg/beam/core/runtime/harness/harness.go:194 +0x246 google3/third_party/golang/apache_beam/pkg/beam/core/runtime/harness/harness.Main.func8() third_party/golang/apache_beam/pkg/beam/core/runtime/harness/harness.go:213 +0x66 Previous read at 0x00c00c491668 by goroutine 129: google3/third_party/golang/apache_beam/pkg/beam/core/runtime/exec/exec.(*Plan).Split() third_party/golang/apache_beam/pkg/beam/core/runtime/exec/plan.go:280 +0xab google3/third_party/golang/apache_beam/pkg/beam/core/runtime/harness/harness.(*control).handleInstruction() third_party/golang/apache_beam/pkg/beam/core/runtime/harness/harness.go:571 +0x2ccd google3/third_party/golang/apache_beam/pkg/beam/core/runtime/harness/harness.Main.func4() third_party/golang/apache_beam/pkg/beam/core/runtime/harness/harness.go:194 +0x246 google3/third_party/golang/apache_beam/pkg/beam/core/runtime/harness/harness.Main() third_party/golang/apache_beam/pkg/beam/core/runtime/harness/harness.go:215 +0x19f3 google3/third_party/golang/apache_beam/pkg/beam/core/runtime/harness/init/init.hook() third_party/golang/apache_beam/pkg/beam/core/runtime/harness/init/init.go:122 +0x54b google3/third_party/golang/apache_beam/pkg/beam/core/runtime/runtime.Init() third_party/golang/apache_beam/pkg/beam/core/runtime/init.go:42 +0x228 google3/third_party/golang/apache_beam/pkg/beam/beam.Init() third_party/golang/apache_beam/pkg/beam/forward.go:147 +0x1c6 google3/pipeline/flume/go/runner/flume.launchSDKHarness() pipeline/flume/go/runner/launcher.go:113 +0x1c5 google3/pipeline/flume/go/runner/flume.runJob.func1() pipeline/flume/go/runner/df.go:137 +0x58 ``` In particular, this one triggers because a split request came in before the bundle meaningfully started. Not caught in open source because no runner other than Dataflow (and it's Google internal counterpart, Flume) currently splits. ---- Next is one caught for two simultaneous bundles constructing their plans, because they came in at the same time. In particular ``` WARNING: DATA RACE Read at 0x00c000628ee8 by goroutine 203: google3/third_party/golang/apache_beam/pkg/beam/core/runtime/graphx/schema/schema.(*Registry).reconcileRegistrations() third_party/golang/apache_beam/pkg/beam/core/runtime/graphx/schema/schema.go:118 +0xad google3/third_party/golang/apache_beam/pkg/beam/core/runtime/graphx/schema/schema.(*Registry).ToType() third_party/golang/apache_beam/pkg/beam/core/runtime/graphx/schema/schema.go:683 +0x2f google3/third_party/golang/apache_beam/pkg/beam/core/runtime/graphx/schema/schema.ToType() third_party/golang/apache_beam/pkg/beam/core/runtime/graphx/schema/schema.go:61 +0xce9 google3/third_party/golang/apache_beam/pkg/beam/core/runtime/graphx/graphx.(*CoderUnmarshaller).makeCoder() third_party/golang/apache_beam/pkg/beam/core/runtime/graphx/coder.go:367 +0xcc9 google3/third_party/golang/apache_beam/pkg/beam/core/runtime/graphx/graphx.(*CoderUnmarshaller).makeCoder() ... elided frames ... third_party/golang/apache_beam/pkg/beam/core/runtime/graphx/coder.go:304 +0x25f7 google3/third_party/golang/apache_beam/pkg/beam/core/runtime/graphx/graphx.(*CoderUnmarshaller).Coder() third_party/golang/apache_beam/pkg/beam/core/runtime/graphx/coder.go:147 +0x12a google3/third_party/golang/apache_beam/pkg/beam/core/runtime/exec/exec.UnmarshalPlan() third_party/golang/apache_beam/pkg/beam/core/runtime/exec/translate.go:72 +0x3f2 google3/third_party/golang/apache_beam/pkg/beam/core/runtime/harness/harness.(*control).getOrCreatePlan() third_party/golang/apache_beam/pkg/beam/core/runtime/harness/harness.go:342 +0x2de google3/third_party/golang/apache_beam/pkg/beam/core/runtime/harness/harness.(*control).handleInstruction() third_party/golang/apache_beam/pkg/beam/core/runtime/harness/harness.go:378 +0xadc google3/third_party/golang/apache_beam/pkg/beam/core/runtime/harness/harness.Main.func4() third_party/golang/apache_beam/pkg/beam/core/runtime/harness/harness.go:189 +0x92 ??() -:0 +0x1eab09c1 Previous write at 0x00c000628ee8 by goroutine 200: google3/third_party/golang/apache_beam/pkg/beam/core/runtime/graphx/schema/schema.(*Registry).reconcileRegistrations() third_party/golang/apache_beam/pkg/beam/core/runtime/graphx/schema/schema.go:132 +0x334 google3/third_party/golang/apache_beam/pkg/beam/core/runtime/graphx/schema/schema.(*Registry).ToType() third_party/golang/apache_beam/pkg/beam/core/runtime/graphx/schema/schema.go:683 +0x2f google3/third_party/golang/apache_beam/pkg/beam/core/runtime/graphx/schema/schema.ToType() third_party/golang/apache_beam/pkg/beam/core/runtime/graphx/schema/schema.go:61 +0xce9 google3/third_party/golang/apache_beam/pkg/beam/core/runtime/graphx/graphx.(*CoderUnmarshaller).makeCoder() third_party/golang/apache_beam/pkg/beam/core/runtime/graphx/coder.go:367 +0xcc9 google3/third_party/golang/apache_beam/pkg/beam/core/runtime/graphx/graphx.(*CoderUnmarshaller).Coder() ... elided frames ... third_party/golang/apache_beam/pkg/beam/core/runtime/exec/translate.go:383 +0x16e google3/third_party/golang/apache_beam/pkg/beam/core/runtime/exec/exec.UnmarshalPlan() third_party/golang/apache_beam/pkg/beam/core/runtime/exec/translate.go:86 +0xa3e google3/third_party/golang/apache_beam/pkg/beam/core/runtime/harness/harness.(*control).getOrCreatePlan() third_party/golang/apache_beam/pkg/beam/core/runtime/harness/harness.go:342 +0x2de google3/third_party/golang/apache_beam/pkg/beam/core/runtime/harness/harness.(*control).handleInstruction() third_party/golang/apache_beam/pkg/beam/core/runtime/harness/harness.go:378 +0xadc google3/third_party/golang/apache_beam/pkg/beam/core/runtime/harness/harness.Main.func4() third_party/golang/apache_beam/pkg/beam/core/runtime/harness/harness.go:189 +0x92 ??() ``` This one is in the tension around reading the registry and writing to the registry. Seems like a case for a concurrent map around the look ups, but a RW lock is more appropriate, since these will be read more often than they're written to. I don't love the additional serialization though. Technically, it's odd that a reconciliation is triggered so late in the process, since all the coders should have been registered at Init time. ### Issue Priority Priority: 2 (default / most bugs should be filed as P2) ### Issue Components - [ ] Component: Python SDK - [ ] Component: Java SDK - [X] Component: Go SDK - [ ] Component: Typescript SDK - [ ] Component: IO connector - [ ] Component: Beam examples - [ ] Component: Beam playground - [ ] Component: Beam katas - [ ] Component: Website - [ ] Component: Spark Runner - [ ] Component: Flink Runner - [ ] Component: Samza Runner - [ ] Component: Twister2 Runner - [ ] Component: Hazelcast Jet Runner - [ ] Component: Google Cloud Dataflow Runner -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
