davidzollo commented on PR #10799:
URL: https://github.com/apache/seatunnel/pull/10799#issuecomment-4650446711

   Hi @vsantonastaso, I found that the CI failures are caused by this PR.
   
   The key failure is:
   
   ```text
   Caused by: org.apache.flink.util.FlinkRuntimeException:
   Coordinator Provider for node Source: Postgres-CDC-Source is not serializable.
   
   Caused by: java.io.NotSerializableException:
   io.debezium.relational.TableId
   ```
   
   
   I did a quick root-cause verification. SeaTunnel maintains a patched 
`TableId` under:
   
   ```text
   connector-cdc-base/src/main/java/io/debezium/relational/TableId.java
   ```
   
   
   This patched class implements `Serializable` and declares a `serialVersionUID`:
   
   
   ```java
   private static final long serialVersionUID = 1L;
   ```
   
   
   However, the original `io.debezium.relational.TableId` from Debezium 1.9.8, 
which comes from `debezium-core`, does **not** implement `Serializable`.
   
   The failure chain is:
   
   | Step | Description |
   | -- | -- |
   | 1 | SeaTunnel has a patched `io.debezium.relational.TableId` in `connector-cdc-base`, and this patched version implements `Serializable`. This PR did not modify that Java source file. |
   | 2 | The original Debezium 1.9.8 `TableId` from `debezium-core` does not implement `Serializable`. |
   | 3 | On upstream `dev`, the CDC connector POMs exclude `debezium-core` and `debezium-api` from the `debezium-connector-*` dependencies. This ensures that the patched SeaTunnel `TableId` is the one used at runtime. |
   | 4 | This PR removes those `<exclusions>` and also adds `debezium-embedded` with `compile` scope. |
   | 5 | `debezium-embedded` brings back `debezium-core` transitively, so the original non-serializable Debezium `TableId` re-enters the classpath and can override the patched one. |
   | 6 | Flink then tries to serialize the coordinator provider for the CDC source, hits a `TableId` field, and fails with `NotSerializableException`. |
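   
   The mechanism behind step 6 can be reproduced in isolation. The sketch below is only an illustration: `PlainTableId`, `PatchedTableId`, and `Provider` are hypothetical stand-ins, not the real Debezium or Flink classes. It shows that Java serialization of a `Serializable` holder fails as soon as one of its fields refers to a non-serializable object.
   
   ```java
   import java.io.ByteArrayOutputStream;
   import java.io.IOException;
   import java.io.NotSerializableException;
   import java.io.ObjectOutputStream;
   import java.io.Serializable;
   
   public class SerializationChainDemo {
   
       // Hypothetical stand-in for the original, non-serializable TableId.
       static class PlainTableId {
           final String name;
           PlainTableId(String name) { this.name = name; }
       }
   
       // Hypothetical stand-in for the patched, serializable TableId.
       static class PatchedTableId implements Serializable {
           private static final long serialVersionUID = 1L;
           final String name;
           PatchedTableId(String name) { this.name = name; }
       }
   
       // Hypothetical stand-in for a coordinator provider that holds a table id.
       static class Provider implements Serializable {
           private static final long serialVersionUID = 1L;
           final Object tableId;
           Provider(Object tableId) { this.tableId = tableId; }
       }
   
       // Returns true if plain Java serialization of the object graph succeeds.
       static boolean canSerialize(Object obj) {
           try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
               out.writeObject(obj);
               return true;
           } catch (NotSerializableException e) {
               // Some object reachable from obj does not implement Serializable.
               return false;
           } catch (IOException e) {
               throw new IllegalStateException(e);
           }
       }
   
       public static void main(String[] args) {
           System.out.println(canSerialize(new Provider(new PatchedTableId("public.orders"))));
           System.out.println(canSerialize(new Provider(new PlainTableId("public.orders"))));
       }
   }
   ```
   
   Run as written, this prints `true` and then `false`: the holder itself never changes, only the class of the object behind its field does, which is the effect a classpath change has here.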
   
   
   The important difference is in the connector POMs.
   
   On upstream `dev`, for example in `connector-cdc-mysql/pom.xml`, the 
Debezium connector dependency keeps these exclusions:
   
   ```xml
   <dependency>
       <groupId>io.debezium</groupId>
       <artifactId>debezium-connector-mysql</artifactId>
       <scope>compile</scope>
       <exclusions>
           <exclusion>
               <groupId>io.debezium</groupId>
               <artifactId>debezium-core</artifactId>
           </exclusion>
           <exclusion>
               <groupId>io.debezium</groupId>
               <artifactId>debezium-api</artifactId>
           </exclusion>
       </exclusions>
   </dependency>
   ```
   
   
   In this PR, these two exclusions were removed. The same change appears in 
the MySQL, PostgreSQL, Oracle, SQL Server, and MongoDB CDC connector POMs.
   
   That changes the runtime classpath, even though the CDC runtime Java source 
code itself may look unchanged.
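   
   One way to confirm which `TableId` actually wins at runtime is to ask the JVM where the class was loaded from. The following is a minimal sketch, not part of SeaTunnel: `ClassOriginCheck` and `describe` are hypothetical names. It only gives a meaningful answer when run with the same classpath as the connector, and the reported location depends on how the jars are packaged.
   
   ```java
   import java.io.Serializable;
   import java.security.CodeSource;
   
   public class ClassOriginCheck {
   
       // Reports whether a class is Serializable and where it was loaded from.
       static String describe(String className) {
           Class<?> cls;
           try {
               cls = Class.forName(className);
           } catch (ClassNotFoundException e) {
               return className + " | not found on classpath";
           }
           CodeSource source = cls.getProtectionDomain().getCodeSource();
           // Classes that ship with the JDK usually have no code source.
           String location = (source == null || source.getLocation() == null)
                   ? "<jdk or unknown>"
                   : source.getLocation().toString();
           boolean serializable = Serializable.class.isAssignableFrom(cls);
           return className + " | serializable=" + serializable + " | from=" + location;
       }
   
       public static void main(String[] args) {
           // Defaults to the class named in the stack trace above.
           String name = args.length > 0 ? args[0] : "io.debezium.relational.TableId";
           System.out.println(describe(name));
       }
   }
   ```
   
   If the output shows `serializable=false`, the original Debezium class is the one being loaded rather than the patched one.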
   
   This also explains the CI impact:
   
   * CDC-related E2E jobs fail, including PostgreSQL CDC, MySQL CDC, Oracle 
CDC, SQL Server CDC, and MongoDB CDC.
   
   * Other jobs such as Doris or Paimon may appear affected because they are 
part of the same reactor build or are cancelled after upstream failures, but 
the actual root cause is in the CDC dependency/classpath change.
   
   
   The recommended fix is to restore the original exclusions in the CDC 
connector POMs.
   
   For each `debezium-connector-*` dependency, we should keep excluding 
`debezium-core` and `debezium-api`, for example:
   
   ```xml
   <dependency>
       <groupId>io.debezium</groupId>
       <artifactId>debezium-connector-xxx</artifactId>
       <version>${debezium.version}</version>
       <scope>compile</scope>
       <exclusions>
           <exclusion>
               <groupId>io.debezium</groupId>
               <artifactId>debezium-core</artifactId>
           </exclusion>
           <exclusion>
               <groupId>io.debezium</groupId>
               <artifactId>debezium-api</artifactId>
           </exclusion>
       </exclusions>
   </dependency>
   ```
   
   
   This should be applied consistently to the CDC connector POMs for MySQL, 
PostgreSQL, Oracle, SQL Server, and MongoDB.
   
   
   With these exclusions restored, the patched SeaTunnel `TableId` should again 
be the effective runtime class, and the Flink coordinator serialization failure 
should be resolved.

