prashant8530 opened a new issue, #6705:
URL: https://github.com/apache/paimon/issues/6705

   ### Search before asking
   
   - [x] I searched in the [issues](https://github.com/apache/paimon/issues) and found nothing similar.
   
   
   ### Paimon version
   
   1.20.0
   
   ### Compute Engine
   
   Flink 1.20.0
   
   ### Minimal reproduce step
   
   1. Set up a Flink CDC pipeline with a MySQL source and a Paimon sink
   2. Source table has primary key: client_id
   3. Configure the Paimon sink with partition key: client_id
   4. Start the pipeline
   5. Pipeline fails when creating the table, with the error:
      "Primary key constraint [client_id] should not be same with partition fields [client_id]"
   
   ### What doesn't meet your expectations?
   
   
   PaimonMetadataApplier.applyCreateTable() (lines 199-203) automatically adds
   partition keys to the primary keys without checking whether this violates
   Paimon's own constraint.
   
   Expected: Should allow partitioning by the primary key, or at minimum not modify
   the user's schema in a way that causes a validation failure.
   
   Actual: The code forces partition keys into the primary keys, then Paimon's
   TableSchema.trimmedPrimaryKeys() rejects the schema because the trimmed primary
   keys would be empty.
   
   This makes it impossible to partition by the same field as the primary key in
   CDC scenarios.
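
The failure path described above can be sketched in plain Java. This is a minimal illustration, not the actual Flink CDC or Paimon code: `mergePartitionKeys` mirrors the merge behaviour reported in this issue, and `trimmedPrimaryKeys` is a hypothetical stand-in for the check the issue attributes to `TableSchema.trimmedPrimaryKeys()`. The class name, method names, and the `order_id` column are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

class PartitionKeyMergeSketch {

    // Mirrors the reported behaviour: every partition column that is not
    // already a primary key is appended to the primary keys.
    static List<String> mergePartitionKeys(List<String> primaryKeys, List<String> partitionKeys) {
        List<String> merged = new ArrayList<>(primaryKeys);
        for (String partitionColumn : partitionKeys) {
            if (!merged.contains(partitionColumn)) {
                merged.add(partitionColumn);
            }
        }
        return merged;
    }

    // Hypothetical stand-in for the Paimon-side check: remove partition columns
    // from the primary keys and reject the schema if nothing is left.
    static List<String> trimmedPrimaryKeys(List<String> primaryKeys, List<String> partitionKeys) {
        if (primaryKeys.isEmpty()) {
            return primaryKeys; // no primary key, nothing to trim
        }
        List<String> trimmed = new ArrayList<>(primaryKeys);
        trimmed.removeAll(partitionKeys);
        if (trimmed.isEmpty()) {
            throw new IllegalStateException(String.format(
                    "Primary key constraint %s should not be same with partition fields %s",
                    primaryKeys, partitionKeys));
        }
        return trimmed;
    }

    // Runs the merge followed by the check and reports whether the schema is rejected.
    static boolean isRejected(List<String> primaryKeys, List<String> partitionKeys) {
        try {
            trimmedPrimaryKeys(mergePartitionKeys(primaryKeys, partitionKeys), partitionKeys);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> partition = List.of("client_id");
        // Primary key equals partition key: the merge adds nothing, the check rejects.
        System.out.println(isRejected(List.of("client_id"), partition)); // true
        // No primary key: the merge turns client_id into one, the check rejects.
        System.out.println(isRejected(List.of(), partition));            // true
        // Distinct primary key: the merged keys survive trimming.
        System.out.println(isRejected(List.of("order_id"), partition));  // false
    }
}
```

Under these assumptions, the append-only case fails only because of the merge, while the case where the primary key already equals the partition key is rejected by the trimmed-key check whether or not the merge runs.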
   
   ### Anything else?
   
   
   Root cause code in PaimonMetadataApplier.java lines 199-203:
   
       for (String partitionColumn : partitionKeys) {
           if (!primaryKeys.contains(partitionColumn)) {
               primaryKeys.add(partitionColumn);
           }
       }
   
   This logic incorrectly assumes partition keys must always be in primary keys.
   
   Issues:
   
   1. Breaks when partition key equals primary key (current bug)
   2. Prevents append-only tables (no primary key) from having partitions
   3. Modifies user's explicit schema definition without consent
   
   Suggested fix:
   
   - Remove automatic modification, OR
   - Only apply for bucketed tables with composite primary keys, OR
   - Skip this logic when primaryKeys is empty (append-only tables)
   
   Append-only tables SHOULD support partitioning without requiring primary keys.
   
   Workaround: Currently must use different fields for partition and primary key,
   or avoid partitioning in append-only tables.
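
As a rough sketch of the third suggested fix (skip the merge when there are no primary keys), the guarded logic might look like the following. It is written against plain Java lists and is not a patch for PaimonMetadataApplier; the class and method names, and the `order_id` and `dt` columns, are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class GuardedPartitionKeyMerge {

    // Returns the primary keys to use for the table. Append-only tables
    // (no primary keys) are left untouched, so they can still be partitioned.
    static List<String> mergePartitionKeys(List<String> primaryKeys, List<String> partitionKeys) {
        List<String> merged = new ArrayList<>(primaryKeys);
        if (primaryKeys.isEmpty()) {
            return merged; // append-only: do not invent a primary key
        }
        for (String partitionColumn : partitionKeys) {
            if (!merged.contains(partitionColumn)) {
                merged.add(partitionColumn);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Append-only table partitioned by client_id keeps an empty primary key.
        System.out.println(mergePartitionKeys(List.of(), List.of("client_id")));
        // Primary-key table still gets its partition column appended.
        System.out.println(mergePartitionKeys(List.of("order_id"), List.of("dt")));
    }
}
```

This guard only addresses the append-only case. A table whose primary key is exactly its partition key would still hit the "should not be same with partition fields" check, so that case needs a separate decision.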
   
   ### Are you willing to submit a PR?
   
   - [x] I'm willing to submit a PR!

