Abacn opened a new issue, #27022: URL: https://github.com/apache/beam/issues/27022
### What happened?

Reported from https://github.com/GoogleCloudPlatform/DataflowTemplates/pull/759

When implementing a load test for BigTableIO, we encountered the following:

- Load tests up to 200 MB pass stably.
- After 5 million records, not all data gets into Bigtable, even though the pipeline logs indicate that all data was written.

The Dataflow write pipeline logs say that 10M records were written, yet the read job shows only 1.6M records read. Counting rows with the `cbt` utility (`cbt -instance <instance id> count <table id>`) confirmed that the BigTableIO write did not work correctly: although the logs claim all 10M records were written, the table contained exactly as many records as the read pipeline processed (1.6M). Some of the records processed by the write pipeline never reached the table. A minimal sketch of the kind of write pipeline this load test exercises is included at the end of this report.

- Dataflow write pipeline job: `2023-06-05_03_51_23-9051905355392445711`
- Dataflow read pipeline job: `2023-06-05_03_58_18-7016807525741705033`

Project: `apache-beam-testing`

### Issue Priority

Priority: 1 (data loss / total loss of function)

### Issue Components

- [ ] Component: Python SDK
- [X] Component: Java SDK
- [ ] Component: Go SDK
- [ ] Component: Typescript SDK
- [X] Component: IO connector
- [ ] Component: Beam examples
- [ ] Component: Beam playground
- [ ] Component: Beam katas
- [ ] Component: Website
- [ ] Component: Spark Runner
- [ ] Component: Flink Runner
- [ ] Component: Samza Runner
- [ ] Component: Twister2 Runner
- [ ] Component: Hazelcast Jet Runner
- [ ] Component: Google Cloud Dataflow Runner
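For reproduction context, here is a minimal sketch of the kind of `BigtableIO` write pipeline such a load test exercises. This is not the actual test code from the DataflowTemplates PR: the instance and table IDs, the column family `cf`, the key format, and the payload are placeholders, and the 10M-record source is synthesized with `GenerateSequence`.

```java
import com.google.bigtable.v2.Mutation;
import com.google.protobuf.ByteString;
import java.util.Collections;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.IterableCoder;
import org.apache.beam.sdk.coders.KvCoder;
import org.apache.beam.sdk.extensions.protobuf.ByteStringCoder;
import org.apache.beam.sdk.extensions.protobuf.ProtoCoder;
import org.apache.beam.sdk.io.GenerateSequence;
import org.apache.beam.sdk.io.gcp.bigtable.BigtableIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.TypeDescriptor;

public class BigtableWriteLoadTest {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    p.apply("Generate10MRecords", GenerateSequence.from(0).to(10_000_000L))
        // Turn each sequence number into a (row key, mutations) pair,
        // the element type BigtableIO.write() expects.
        .apply("ToMutations",
            MapElements
                .into(new TypeDescriptor<KV<ByteString, Iterable<Mutation>>>() {})
                .via(i -> KV.of(
                    ByteString.copyFromUtf8(String.format("key-%09d", i)),
                    Collections.singletonList(
                        Mutation.newBuilder()
                            .setSetCell(Mutation.SetCell.newBuilder()
                                .setFamilyName("cf") // placeholder column family
                                .setColumnQualifier(ByteString.copyFromUtf8("value"))
                                .setValue(ByteString.copyFromUtf8("payload-" + i)))
                            .build()))))
        // Explicit coder, since KV<ByteString, Iterable<Mutation>> may not be inferred.
        .setCoder(KvCoder.of(ByteStringCoder.of(),
            IterableCoder.of(ProtoCoder.of(Mutation.class))))
        .apply("WriteToBigtable",
            BigtableIO.write()
                .withProjectId("apache-beam-testing") // project named in the report
                .withInstanceId("<instance id>")      // placeholder
                .withTableId("<table id>"));          // placeholder

    p.run().waitUntilFinish();
  }
}
```

After such a write job reports success, the row count can be checked independently with `cbt -instance <instance id> count <table id>`, which is how the 1.6M vs. 10M discrepancy described above was observed.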
