VardhanThigle opened a new issue, #29832: URL: https://github.com/apache/beam/issues/29832
### What happened?

# Race condition detected in `SpannerTransactionWriterDoFn` between the setup and teardown phases, leading to missing records during Spanner migration

## Root cause

The root cause is that the `spanner` object is closed immediately after it is created, due to a race condition in the [SpannerAccessor](third_party/java_src/apache_beam4g/srcs/sdks/io/google_cloud_platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerAccessor.java) class.

### Race condition

1. `SpannerAccessor.getOrCreate` is called during the `@Setup` of the [SpannerTransactionWriterDoFn](https://source.corp.google.com/piper///depot/google3/third_party/java_src/cloud/teleport/v2/datastream-to-spanner/src/main/java/com/google/cloud/teleport/v2/templates/SpannerTransactionWriterDoFn.java;l=176?q=spanner%20close&ss=piper%2FGoogle%2FPiper:google3%2Fthird_party%2Fjava_src%2Fcloud%2Fteleport%2Fv2%2F), and `SpannerAccessor.close` is called during its `@Teardown`.
2. These functions use a combination of synchronized access and atomic reference counts to share the Spanner connections for a given Spanner config. (In the case of SMT there is a single Spanner config and hence a single connection.)
3. When `SpannerAccessor.getOrCreate` provisions a new connection, it:
   - 3.1. queries the accessor from a concurrent hash map;
   - 3.2. if not found, takes the lock and provisions the connection;
   - 3.3. sets the refcount to 0;
   - 3.4. releases the lock;
   - 3.5. increments the refcount outside the lock.
4. When `SpannerAccessor.close` destroys a connection, it:
   - 4.1. atomically decrements the refcount;
   - 4.2. checks whether the refcount is 0;
   - 4.3. if so, takes the lock;
   - 4.4. rechecks the refcount;
   - 4.5. destroys the connection if and only if the refcount during the second check is <= 0.
5.
If the logs show anything other than a single connect followed by a single close, that indicates a race condition (since SMT has only one Spanner connection profile).

### Issue Priority

Priority: 1 (data loss / total loss of function)

### Issue Components

- [ ] Component: Python SDK
- [X] Component: Java SDK
- [ ] Component: Go SDK
- [ ] Component: Typescript SDK
- [ ] Component: IO connector
- [ ] Component: Beam YAML
- [ ] Component: Beam examples
- [ ] Component: Beam playground
- [ ] Component: Beam katas
- [ ] Component: Website
- [ ] Component: Spark Runner
- [ ] Component: Flink Runner
- [ ] Component: Samza Runner
- [ ] Component: Twister2 Runner
- [ ] Component: Hazelcast Jet Runner
- [ ] Component: Google Cloud Dataflow Runner

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
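For readers following the `getOrCreate` / `close` steps listed in the issue, here is a minimal, single-threaded sketch of one interleaving that is consistent with those steps and ends with a caller holding a closed connection. It is not the Beam source and not necessarily the interleaving the reporter observed. `RefCountRaceSketch`, `Conn`, `lookupOrProvision`, and `acquire` are hypothetical names, and splitting `getOrCreate` into two calls is only a device to force the ordering deterministically. Removing the entry from the map on destroy and the re-check inside the lock are also assumptions.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class RefCountRaceSketch {
    /** Hypothetical stand-in for a shared Spanner connection. */
    static final class Conn {
        final AtomicInteger refCount = new AtomicInteger(0); // step 3.3: starts at 0
        volatile boolean closed = false;
    }

    static final ConcurrentHashMap<String, Conn> CACHE = new ConcurrentHashMap<>();
    static final Object LOCK = new Object();

    /** Steps 3.1-3.4: look up the connection, provisioning it under the lock if absent. */
    static Conn lookupOrProvision(String key) {
        Conn c = CACHE.get(key);
        if (c == null) {
            synchronized (LOCK) {
                c = CACHE.get(key); // assumption: re-check once the lock is held
                if (c == null) {
                    c = new Conn();
                    CACHE.put(key, c);
                }
            }
        }
        return c;
    }

    /** Step 3.5: the increment happens outside the lock. */
    static void acquire(Conn c) {
        c.refCount.incrementAndGet();
    }

    /** Steps 4.1-4.5: decrement, then destroy under the lock if the count is still <= 0. */
    static void close(String key, Conn c) {
        if (c.refCount.decrementAndGet() == 0) {
            synchronized (LOCK) {
                if (c.refCount.get() <= 0) {
                    CACHE.remove(key); // assumption: destroy also evicts the cache entry
                    c.closed = true;
                }
            }
        }
    }

    public static void main(String[] args) {
        String key = "spanner-config";
        Conn a = lookupOrProvision(key); // worker A: steps 3.1-3.4 done, refcount still 0
        Conn b = lookupOrProvision(key); // worker B: finds the same cached connection
        acquire(b);                      // worker B: refcount 0 -> 1
        close(key, b);                   // worker B teardown: 1 -> 0, connection destroyed
        acquire(a);                      // worker A: step 3.5 finally runs, refcount 0 -> 1
        System.out.println("same=" + (a == b) + " closed=" + a.closed
            + " refCount=" + a.refCount.get());
        // prints: same=true closed=true refCount=1
    }
}
```

In this sketch worker A ends up with a positive reference count on a connection that has already been closed. One way to close that window, again only as an assumption about a possible fix, would be to perform the increment while still holding the lock used for provisioning and destruction.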
