DanielCarter-stack commented on PR #10474:
URL: https://github.com/apache/seatunnel/pull/10474#issuecomment-3873172117

   <!-- code-pr-reviewer -->
   <!-- cpr:pr_reply_v2_parts {"group": "apache/seatunnel#10474", "part": 1, 
"total": 1} -->
   ### Issue 1: Missing method documentation comments
   
   **Location**: 
`seatunnel-e2e/seatunnel-connector-v2-e2e/connector-cdc-mongodb-e2e/src/test/java/mongodb/MongodbCDCIT.java:670`
   
   ```java
   private void cleanMongoSourceTable() {
       mongodbContainer.executeCommandFileInDatabase("inventoryClean", 
MONGODB_DATABASE);
   }
   ```
   
   **Related context**:
   - Caller: `testMongodbCdcMultiTaskConcurrentSubmission` (line 296)
   - Caller: `cleanSourceTable()` (line 675)
   - Related method: `cleanSourceTable()` (lines 674-678)
   
   **Problem description**:
   The newly added `cleanMongoSourceTable()` method lacks JavaDoc comments. 
Although the code intent is relatively clear, missing documentation causes the 
following problems for other maintainers:
   1. Unclear why a separate "MongoDB-only clear" method is needed
   2. Unclear when to use `cleanMongoSourceTable()` vs `cleanSourceTable()` in 
different scenarios
   3. Difficult to understand the background and root cause of this fix
   
   **Potential risks**:
   - Risk 1: Other developers may misuse these two methods, using 
`cleanSourceTable()` in concurrent tests causing tests to become unstable again
   - Risk 2: Code reviewers may not understand the necessity of this 
modification, considering it unnecessary refactoring
   
   **Impact scope**:
   - Direct impact: Maintainability of `MongodbCDCIT` class
   - Indirect impact: Future modifications and extensions to MongoDB CDC tests
   - Affected area: Single E2E test module
   
   **Severity**: MINOR
   
   **Improvement suggestions**:
   ```java
   /**
    * Cleans only the MongoDB source collections without touching the MySQL 
sink tables.
    * This is specifically used in concurrent CDC tests to avoid race 
conditions where:
    * <ul>
    *   <li>TRUNCATE on MySQL happens immediately</li>
    *   <li>CDC stream continues processing and may replay/upsert data to the 
emptied table</li>
    *   <li>Result: source is empty but sink is partially filled (e.g., 2/5 
records)</li>
    * </ul>
    * 
    * <p>By only cleaning MongoDB source, the CDC stream naturally stops 
producing new events,
    * allowing the test to verify final consistency without interference from 
replay operations.
    * 
    * <p>Use {@link #cleanSourceTable()} when you need to clean both source and 
sink.
    * 
    * @see #cleanSourceTable()
    * @see #testMongodbCdcMultiTaskConcurrentSubmission()
    */
   private void cleanMongoSourceTable() {
       mongodbContainer.executeCommandFileInDatabase("inventoryClean", 
MONGODB_DATABASE);
   }
   ```
   
   **Rationale**:
   - Explains why this method is needed (avoid race conditions)
   - Describes CDC's replay/upsert mechanism
   - Clarifies usage scenarios
   - Provides cross-references to related methods
   - Helps future maintainers understand the background of this fix
   
   ---
   
   ### Issue 2: CI trigger conditions may be too broad
   
   **Location**: `.github/workflows/backend.yml:765`
   
   ```yaml
   all-connectors-it-1:
     needs: [ changes, sanity-check ]
     if: needs.changes.outputs.api == 'true' || needs.changes.outputs.engine == 
'true' || contains(needs.changes.outputs.it-modules, 
'connector-cdc-mongodb-e2e')
   ```
   
   **Related context**:
   - Modified job: `all-connectors-it-1` (lines 763-792)
   - Related jobs: `all-connectors-it-2` to `all-connectors-it-7` (lines 
794-978)
   - Change detection logic: `changes` job (lines 120-297)
   
   **Problem description**:
   The PR adds special handling for the `connector-cdc-mongodb-e2e` module in 
`all-connectors-it-1`, but this presents several problems:
   
   1. **Inconsistency**: Other MongoDB CDC-related test jobs (such as 
`oracle-cdc-connector-it`, `kudu-connector-it`, etc.) use similar patterns, but 
this modification only targets `all-connectors-it-1`
   
   2. **Incomplete coverage**: If the `connector-cdc-mongodb-e2e` module is 
modified, all related connector test jobs should be triggered, not just 
`all-connectors-it-1`
   
   3. **Naming confusion**: The naming of `all-connectors-it-1` indicates it 
should run part of all connector tests, but this condition makes it also run 
when MongoDB CDC changes, which may not align with naming semantics
   
   **Potential risks**:
   - Risk 1: When only `connector-cdc-mongodb-e2e` is modified, not all 
necessary tests may be triggered
   - Risk 2: `all-connectors-it-1` may run unnecessary tests, wasting CI 
resources
   - Risk 3: This pattern may need similar modifications in other places, but 
may be overlooked
   
   **Impact scope**:
   - Direct impact: GitHub Actions CI/CD workflow
   - Indirect impact: Completeness of PR checks and CI runtime
   - Affected area: CI/CD infrastructure
   
   **Severity**: MINOR
   
   **Improvement suggestions**:
   Consider one of the following two approaches:
   
   **Option A: Create dedicated MongoDB CDC E2E job**
   ```yaml
   mongodb-cdc-connector-it:
     needs: [ changes, sanity-check ]
     if: needs.changes.outputs.api == 'true' || needs.changes.outputs.engine == 
'true' || contains(needs.changes.outputs.it-modules, 
'connector-cdc-mongodb-e2e')
     runs-on: ${{ matrix.os }}
     strategy:
       matrix:
         java: [ '8', '11' ]
         os: [ 'ubuntu-latest' ]
     timeout-minutes: 180
     steps:
       # ... Steps similar to all-connectors-it, but only runs 
connector-cdc-mongodb-e2e
   ```
   
   **Option B: Add conditions in multiple all-connectors-it jobs**
   ```yaml
   # Add the same condition to all-connectors-it-1, 2, ..., 7
   if: needs.changes.outputs.api == 'true' || needs.changes.outputs.engine == 
'true' || contains(needs.changes.outputs.it-modules, 
'connector-cdc-mongodb-e2e')
   ```
   
   **Rationale**:
   - Option A is clearer, with a dedicated job responsible for MongoDB CDC 
testing
   - Option B ensures all connector test jobs run when MongoDB CDC changes
   - Current modification may not be complete enough, requires CI/CD maintainer 
review
   
   **Note**: This requires deep understanding of the project's CI/CD 
architecture and strategy. Recommend discussing with CI/CD maintainers.
   
   ---
   
   ### Issue 3: Fix may be incomplete - other test methods may also be affected
   
   **Location**: 
`seatunnel-e2e/seatunnel-connector-v2-e2e/connector-cdc-mongodb-e2e/src/test/java/mongodb/MongodbCDCIT.java:183,
 200, 208, 226`
   
   ```java
   @TestTemplate
   public void testMongodbCdcToMysqlCheckDataE2e(TestContainer container)
           throws InterruptedException {
       cleanSourceTable();  // Line 183
       // ...
       cleanSourceTable();  // Line 200 - Called after running CDC task
       TimeUnit.SECONDS.sleep(20);
       assertionsSourceAndSink(MONGODB_COLLECTION_1, SINK_SQL_PRODUCTS);
   }
   
   @TestTemplate
   public void testMongodbCdcMultiTableToMysqlE2e(TestContainer container)
           throws InterruptedException {
       cleanSourceTable();  // Line 208
       // ...
       cleanSourceTable();  // Line 226 - Called after running CDC task
       TimeUnit.SECONDS.sleep(20);
       assertionsSourceAndSink(MONGODB_COLLECTION_1, SINK_SQL_PRODUCTS);
       assertionsSourceAndSink(MONGODB_COLLECTION_2, SINK_SQL_ORDERS);
   }
   ```
   
   **Related context**:
   - `testMongodbCdcMultiTaskConcurrentSubmission()` (lines 255-300) - Fixed
   - `testMongodbCdcToMysqlCheckDataE2e()` (lines 180-203) - Not fixed
   - `testMongodbCdcMultiTableToMysqlE2e()` (lines 205-252) - Not fixed
   - `testSavepointRecovery()` (lines 346-380) - Not fixed
   - `testResumeTokenFailureRecovery()` (lines 388-438) - Not fixed
   
   **Problem description**:
   The PR only fixed the issue in 
`testMongodbCdcMultiTaskConcurrentSubmission()`, but similar patterns exist in 
other test methods:
   
   1. **`testMongodbCdcToMysqlCheckDataE2e()`** (line 200):
      - Calls `cleanSourceTable()` after CDC task runs
      - Then waits 20 seconds and verifies data consistency
      - This scenario is similar to concurrent testing and may also be affected 
by the same race condition
   
   2. **`testMongodbCdcMultiTableToMysqlE2e()`** (line 226):
      - Calls `cleanSourceTable()` after CDC task runs
      - Verifies data consistency of two collections
      - Similarly may be affected by race conditions
   
   **Potential risks**:
   - Risk 1: These test methods may also become unstable due to the same race 
condition
   - Risk 2: If these tests start failing, maintainers may not understand the 
root cause
   - Risk 3: Incomplete fix causes the problem to reappear in other places
   
   **Impact scope**:
   - Direct impact: Stability of other MongoDB CDC E2E tests
   - Indirect impact: CI/CD reliability
   - Affected area: Multiple E2E test methods
   
   **Severity**: MAJOR
   
   **Improvement suggestions**:
   
   **Option A: Uniformly fix all similar scenarios**
   ```java
   @TestTemplate
   public void testMongodbCdcToMysqlCheckDataE2e(TestContainer container)
           throws InterruptedException {
       cleanSourceTable();
       CompletableFuture.supplyAsync(
               () -> {
                   try {
                       container.executeJob("/mongodbcdc_to_mysql.conf");
                   } catch (Exception e) {
                       log.error("Commit task exception :" + e.getMessage());
                       throw new RuntimeException();
                   }
                   return null;
               });
       TimeUnit.SECONDS.sleep(10);
       // insert update delete
       upsertDeleteSourceTable();
       TimeUnit.SECONDS.sleep(20);
       assertionsSourceAndSink(MONGODB_COLLECTION_1, SINK_SQL_PRODUCTS);
   
       // Fix: Only clear MongoDB after running CDC task
       cleanMongoSourceTable();  // Change to cleanMongoSourceTable()
       TimeUnit.SECONDS.sleep(20);
       assertionsSourceAndSink(MONGODB_COLLECTION_1, SINK_SQL_PRODUCTS);
   }
   ```
   
   **Option B: Analyze each test scenario, selectively fix**
   Need to analyze the expected behavior of each test:
   - If the test expects to clear source and target then verify, keep 
`cleanSourceTable()`
   - If the test expects to clear source and let CDC stop naturally, change to 
`cleanMongoSourceTable()`
   
   **Option C: Add comments explaining why not modifying**
   If certain tests truly need to clear both source and target, comments should 
be added explaining why:
   ```java
   // This test intentionally cleans both source and sink to verify CDC can 
recover
   // from a complete reset scenario.
   cleanSourceTable();
   ```
   
   **Rationale**:
   - Ensures all test methods follow a consistent pattern
   - Avoids future instability issues
   - Improves test suite reliability
   - Helps maintainers understand why `cleanSourceTable()` is used in some 
places while `cleanMongoSourceTable()` is used in others
   
   **Suggested actions**:
   1. Review all locations using `cleanSourceTable()`
   2. Analyze the expected behavior of each scenario
   3. Uniformly fix the pattern or add clear comments
   4. Run the complete test suite to verify modifications
   
   ---


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to