JNSimba opened a new pull request, #61267:
URL: https://github.com/apache/doris/pull/61267

   ### What problem does this PR solve?
   
   Add column-level filtering support for PostgreSQL CDC streaming jobs via the
   `table.<tableName>.exclude_columns` property. Users can specify a comma-separated
   list of columns to exclude from synchronization.
   
     **Syntax example:**
     ```sql
     CREATE JOB my_job
       ON STREAMING
       FROM POSTGRES (
         ...
         "table.my_table.exclude_columns" = "secret,internal_col"
       )
       TO DATABASE my_db (...)
   ```
   
   #### Changes
   
     FE (validation & table creation)
   
     - DataSourceConfigKeys: add TABLE and TABLE_EXCLUDE_COLUMNS_SUFFIX constants
     - DataSourceConfigValidator: recognize table.<name>.exclude_columns as a valid
       per-table config key (using suffix allowlist)
     - StreamingJobUtils.generateCreateTableCmds(): parse excluded columns, validate
       they exist in the upstream PG table and are not PK columns, then exclude them
       from the Doris CREATE TABLE statement
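
The FE-side checks listed above can be sketched roughly as follows. This is an illustration only, not the code in this PR: the class name, method names, the two constant values, and the exact-match (case-sensitive) column comparison are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical helper; names and constants are assumptions, not the PR's code.
final class ExcludeColumnsValidationSketch {
    static final String TABLE_PREFIX = "table.";
    static final String EXCLUDE_COLUMNS_SUFFIX = ".exclude_columns";

    // Returns the table name if the key looks like table.<name>.exclude_columns.
    static Optional<String> tableNameOf(String key) {
        if (key.startsWith(TABLE_PREFIX) && key.endsWith(EXCLUDE_COLUMNS_SUFFIX)) {
            int start = TABLE_PREFIX.length();
            int end = key.length() - EXCLUDE_COLUMNS_SUFFIX.length();
            if (end > start) {
                return Optional.of(key.substring(start, end));
            }
        }
        return Optional.empty();
    }

    // Splits a comma-separated value into trimmed, non-empty column names.
    static Set<String> parseColumns(String value) {
        Set<String> columns = new LinkedHashSet<>();
        for (String part : value.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                columns.add(name);
            }
        }
        return columns;
    }

    // Rejects unknown or primary-key columns, then returns the columns to keep
    // in the generated CREATE TABLE statement, in upstream order.
    static List<String> columnsToCreate(List<String> upstreamColumns,
                                        Set<String> primaryKeys,
                                        Set<String> excluded) {
        for (String col : excluded) {
            if (!upstreamColumns.contains(col)) {
                throw new IllegalArgumentException("excluded column does not exist: " + col);
            }
            if (primaryKeys.contains(col)) {
                throw new IllegalArgumentException("cannot exclude primary key column: " + col);
            }
        }
        List<String> kept = new ArrayList<>();
        for (String col : upstreamColumns) {
            if (!excluded.contains(col)) {
                kept.add(col);
            }
        }
        return kept;
    }
}
```

With upstream columns `id, name, secret, internal_col`, primary key `id`, and the value `"secret,internal_col"`, this sketch keeps `id` and `name`; excluding `id` or a column that does not exist raises an error instead.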
   
     cdc_client (DML filtering & schema change handling)
   
     - ConfigUtil: add parseExcludeColumns(config, tableName) utility
     - DebeziumJsonDeserializer: skip excluded fields when building
       INSERT/UPDATE/DELETE rows
     - PostgresDebeziumJsonDeserializer: skip DROP/ADD DDL for excluded columns during
       schema change detection, so the Doris table is never modified for columns it
       was never meant to have
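
A minimal sketch of the DML-side filtering described above, assuming the job config is available as a `Map<String, String>` and a change row as a field-name-to-value map. Only the name `parseExcludeColumns(config, tableName)` comes from this PR; its parameter and return types, the class name, and `filterRow` are assumptions for illustration.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the ConfigUtil / deserializer logic.
final class ExcludeColumnsRowFilterSketch {

    // Reads table.<tableName>.exclude_columns and returns the listed columns.
    static Set<String> parseExcludeColumns(Map<String, String> config, String tableName) {
        String raw = config.get("table." + tableName + ".exclude_columns");
        if (raw == null || raw.trim().isEmpty()) {
            return Collections.emptySet();
        }
        Set<String> columns = new LinkedHashSet<>();
        for (String part : raw.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                columns.add(name);
            }
        }
        return columns;
    }

    // Copies the row, leaving out excluded fields; field order is preserved.
    static Map<String, Object> filterRow(Map<String, Object> row, Set<String> excluded) {
        Map<String, Object> filtered = new LinkedHashMap<>();
        for (Map.Entry<String, Object> field : row.entrySet()) {
            if (!excluded.contains(field.getKey())) {
                filtered.put(field.getKey(), field.getValue());
            }
        }
        return filtered;
    }
}
```

Applying the same filter to INSERT, UPDATE, and DELETE rows keeps every row written to Doris aligned with the column set used at table creation.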
   
   #### Behavior
   
   | Scenario                    | Behavior                                                  |
   |-----------------------------|-----------------------------------------------------------|
   | Snapshot / incremental DML  | Excluded column values are not written to Doris           |
   | PG DROP excluded column     | DDL skipped; stored schema updated; sync continues        |
   | PG ADD excluded column back | DDL skipped; sync continues; Doris never gains the column |
   | Exclude non-existent column | CREATE JOB fails with clear error                         |
   | Exclude PK column           | CREATE JOB fails with clear error                         |
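
The DROP and ADD rows of the table can be illustrated with a small schema-diff sketch. It assumes schema change detection compares the previously stored column set with the current upstream one; the class name, method name, and the `"DROP COLUMN ..."` / `"ADD COLUMN ..."` strings are placeholders, not the actual statements issued to Doris.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical schema-diff helper; not the PR's implementation.
final class ExcludeColumnsSchemaDiffSketch {

    // Lists the changes to apply to the Doris table, ignoring excluded columns.
    // The caller is assumed to replace its stored schema with currentColumns
    // afterwards, whether or not any change was emitted.
    static List<String> changesFor(Set<String> storedColumns,
                                   Set<String> currentColumns,
                                   Set<String> excluded) {
        List<String> changes = new ArrayList<>();
        for (String col : storedColumns) {
            if (!currentColumns.contains(col) && !excluded.contains(col)) {
                changes.add("DROP COLUMN " + col);
            }
        }
        for (String col : currentColumns) {
            if (!storedColumns.contains(col) && !excluded.contains(col)) {
                changes.add("ADD COLUMN " + col);
            }
        }
        return changes;
    }
}
```

Dropping or re-adding `secret` upstream yields no change while `secret` is excluded, whereas a new non-excluded column such as `age` still produces an ADD.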
   
   #### Tests
   
     - test_streaming_postgres_job_col_filter.groovy: covers validation errors,
     snapshot filtering, incremental DML filtering, DROP excluded column, re-ADD
     excluded column; uses Awaitility polling instead of fixed sleeps
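
For readers unfamiliar with polling-style waits, the idea behind replacing fixed sleeps can be sketched in plain Java. This is a stand-in for illustration, not Awaitility's API and not the test code in this PR; `pollUntil` and its parameters are hypothetical.

```java
import java.util.function.BooleanSupplier;

// Hypothetical plain-Java polling helper illustrating "poll until true or timeout".
final class PollingSketch {

    // Re-checks the condition every intervalMillis until it holds or the
    // timeout elapses. Returns true if the condition was observed to hold.
    static boolean pollUntil(BooleanSupplier condition, long timeoutMillis, long intervalMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

Compared with a fixed sleep, the wait ends as soon as the expected rows are visible and only runs to the full timeout when something is actually wrong.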
   
   
   ### Release note
   
   None
   
   ### Check List (For Author)
   
   - Test <!-- At least one of them must be included. -->
       - [ ] Regression test
       - [ ] Unit Test
       - [ ] Manual test (add detailed scripts or steps below)
       - [ ] No need to test or manual test. Explain why:
           - [ ] This is a refactor/code format and no logic has been changed.
           - [ ] Previous test can cover this change.
           - [ ] No code files have been changed.
           - [ ] Other reason <!-- Add your reason?  -->
   
   - Behavior changed:
       - [ ] No.
       - [ ] Yes. <!-- Explain the behavior change -->
   
   - Does this need documentation?
       - [ ] No.
       - [ ] Yes. <!-- Add document PR link here. e.g.: https://github.com/apache/doris-website/pull/1214 -->
   
   ### Check List (For Reviewer who merge this PR)
   
   - [ ] Confirm the release note
   - [ ] Confirm test cases
   - [ ] Confirm document
   - [ ] Add branch pick label <!-- Add branch pick label that this PR should merge into -->
   
   

