felixhzhu opened a new issue, #4257:
URL: https://github.com/apache/amoro/issues/4257

   ### What happened?
   
   Multiple independent `SnowflakeIdGenerator` instances exist within the same 
AMS JVM, each maintaining its own `sequence` and `lastTimestamp` state. When 
two instances generate IDs within the same 10ms time window, they can produce 
**identical IDs**, causing `Duplicate entry for key 'PRIMARY'` errors on the 
`table_process` table.
   
   ### Root Cause
   
   Prior to the fix, there were two separate static `SnowflakeIdGenerator` 
instances:
   
   1. **`IcebergTableUtil.java`**:
      ```java
      private static final SnowflakeIdGenerator snowflakeIdGenerator = new SnowflakeIdGenerator();
      ```
      Used in `createOptimizingPlanner()` to generate process IDs for 
optimizing processes.
   
   2. **`TableProcessMeta.java`**:
      ```java
      private static final SnowflakeIdGenerator idGenerator = new SnowflakeIdGenerator();
      ```
      Used in `createProcessMeta()` to generate process IDs for maintenance 
processes (e.g., EXPIRE_SNAPSHOTS).
   
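   Both instances use the default `machineId = 0`. The `sequence` counter and 
`lastTimestamp` are **instance-level fields** (not static/shared), so each 
instance independently resets `sequence = 0` upon entering a new 10ms time 
window. With the timestamp, machine ID, and sequence all equal, the first ID 
each instance generates in that window is identical.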
   
   
   
   
   ### Affects Versions
   
   0.9.0-incubating
   
   ### What table formats are you seeing the problem on?
   
   _No response_
   
   ### What engines are you seeing the problem on?
   
   _No response_
   
   ### How to reproduce
   
   ```
   Instance A (IcebergTableUtil):
     timestamp = T, lastTimestamp ≠ T → enters else branch → sequence = 0
     Generated ID = (T << 13) | (0 << 8) | 0
   
   Instance B (TableProcessMeta):
     timestamp = T, lastTimestamp ≠ T → enters else branch → sequence = 0
     Generated ID = (T << 13) | (0 << 8) | 0
   
   Both IDs are identical → INSERT fails with PRIMARY KEY conflict
   ```
   
   Note: `synchronized` on `generateId()` locks on `this` (the respective 
instance), so the two instances do not synchronize with each other.
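
   The trace above can be turned into a small runnable sketch. The class 
below is a hypothetical stand-in, not the AMS `SnowflakeIdGenerator` source: 
the names `SketchIdGenerator` and `DuplicateIdDemo`, the injectable clock, and 
the reading of the bit layout (13-bit shift for the timestamp, 8-bit shift for 
the machine ID, sequence in the low bits) are assumptions taken from the trace.

```java
import java.util.function.LongSupplier;

// Hypothetical stand-in for SnowflakeIdGenerator; layout assumed from the trace.
final class SketchIdGenerator {
    private final long machineId;
    private final LongSupplier clock;  // assumed to return the current 10 ms tick
    private long lastTimestamp = -1L;  // instance-level state
    private long sequence = 0L;        // instance-level state

    SketchIdGenerator(long machineId, LongSupplier clock) {
        this.machineId = machineId;
        this.clock = clock;
    }

    // Locks on `this`, so two separate instances never exclude each other.
    synchronized long generateId() {
        long timestamp = clock.getAsLong();
        if (timestamp == lastTimestamp) {
            sequence = (sequence + 1) & 0xFF; // overflow handling omitted in this sketch
        } else {
            sequence = 0L; // new window: each instance restarts at 0
        }
        lastTimestamp = timestamp;
        return (timestamp << 13) | (machineId << 8) | sequence;
    }
}

final class DuplicateIdDemo {
    public static void main(String[] args) {
        // A frozen clock forces both calls into the same time window.
        LongSupplier frozenClock = () -> 123_456L;
        SketchIdGenerator a = new SketchIdGenerator(0, frozenClock);
        SketchIdGenerator b = new SketchIdGenerator(0, frozenClock);
        System.out.println(a.generateId() == b.generateId()); // prints true
    }
}
```

   In the sketch, calling the same instance twice within one window yields 
two different IDs, because its `sequence` advances; only IDs from separate 
instances collide.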
   
   ### Relevant log output
   
   ```shell
   ### Error updating database. Cause: 
java.sql.SQLIntegrityConstraintViolationException: 
   Duplicate entry 'XXXXXXXXX' for key 'table_process.PRIMARY'
   
   ### The error may exist in 
org/apache/amoro/server/persistence/mapper/TableProcessMapper.java (inline)
   ### The error may involve 
org.apache.amoro.server.persistence.mapper.TableProcessMapper.insertProcess-Inline
   ### The error occurred while setting parameters
   
   ### SQL: INSERT INTO table_process (process_id, table_id, ...) VALUES (?, ?, 
...)
   ```
   
   ### Anything else
   
   ### Fix
   
   Consolidate all usages to a single global `SnowflakeIdGenerator` instance:
   
   ```java
   // SnowflakeIdGenerator.java
   public static final SnowflakeIdGenerator INSTANCE = new SnowflakeIdGenerator();
   ```
   
   Both `IcebergTableUtil` and `TableProcessMeta` now reference 
`SnowflakeIdGenerator.INSTANCE.generateId()` instead of maintaining their own 
instances. Because every caller goes through the same object, they share one 
`synchronized` monitor and one `sequence` counter, so both take effect across 
all callers within the same JVM.
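
   As a rough check of the idea, the sketch below has two threads draw IDs 
from one shared generator and counts the distinct values. It is not the AMS 
code: `SharedIdGenerator`, `SharedGeneratorDemo`, `countUniqueIds`, the 8-bit 
sequence width, and the wait-for-next-window handling are all assumptions made 
for illustration.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the consolidated generator, not the AMS source.
final class SharedIdGenerator {
    static final SharedIdGenerator INSTANCE = new SharedIdGenerator();

    private long lastTimestamp = -1L;
    private long sequence = 0L;

    private SharedIdGenerator() {}

    private static long currentTick() {
        return System.currentTimeMillis() / 10; // 10 ms window
    }

    // One monitor and one counter for every caller in the JVM.
    synchronized long generateId() {
        // Clamp so a clock step backwards cannot reopen an old window.
        long timestamp = Math.max(currentTick(), lastTimestamp);
        if (timestamp == lastTimestamp) {
            sequence = (sequence + 1) & 0xFF; // assumed 8-bit sequence
            if (sequence == 0) {
                // Window exhausted: wait for the next tick.
                while ((timestamp = currentTick()) <= lastTimestamp) {
                    Thread.yield();
                }
            }
        } else {
            sequence = 0L;
        }
        lastTimestamp = timestamp;
        return (timestamp << 13) | sequence; // machineId = 0
    }
}

final class SharedGeneratorDemo {
    // Two threads stand in for IcebergTableUtil and TableProcessMeta.
    static int countUniqueIds(int idsPerCaller) {
        Set<Long> ids = ConcurrentHashMap.newKeySet();
        Runnable caller = () -> {
            for (int i = 0; i < idsPerCaller; i++) {
                ids.add(SharedIdGenerator.INSTANCE.generateId());
            }
        };
        Thread planner = new Thread(caller);
        Thread maintenance = new Thread(caller);
        planner.start();
        maintenance.start();
        try {
            planner.join();
            maintenance.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return ids.size();
    }

    public static void main(String[] args) {
        // Equals 2 * idsPerCaller when no ID repeats.
        System.out.println(countUniqueIds(200));
    }
}
```

   A shared instance only removes collisions inside one JVM. IDs generated by 
separate processes would still need distinct machine IDs, which is outside the 
scope of this issue.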
   
   ### Are you willing to submit a PR?
   
   - [x] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's Code of Conduct

