suxiaogang223 opened a new pull request, #64867:
URL: https://github.com/apache/doris/pull/64867

   ### What problem does this PR solve?
   
   The Hive external partition cache generated partition IDs by hashing the catalog, database, table, and partition names into a `long`. Distinct Hive partition names can collide once the hash is truncated to a `long`, and when two partition names map to the same ID, `HashBiMap` throws `value already present` while loading partition values.
   
   This change assigns cache-local, monotonically increasing partition IDs to Hive partition values and preserves the next ID across incremental cache copies, so an incrementally added partition gets the next ID in O(1) without scanning existing partitions. `Util.genIdByName` is still used for external database, table, and file cache identities, but no longer for Hive partition names.
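
   A minimal sketch of this allocation scheme, assuming a per-table cache that keeps name/ID maps plus a `nextId` counter. `PartitionIdCache`, `addPartition`, `dropPartition`, `nameOf`, and `copy` are hypothetical names chosen for illustration, not the classes changed by this PR.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: class and method names are illustrative, not Doris code.
final class PartitionIdCache {
    private final Map<String, Long> idByName;
    private final Map<Long, String> nameById;
    private long nextId; // next ID to hand out; never reused within this cache

    PartitionIdCache() {
        this(new HashMap<>(), new HashMap<>(), 0L);
    }

    private PartitionIdCache(Map<String, Long> idByName, Map<Long, String> nameById, long nextId) {
        this.idByName = idByName;
        this.nameById = nameById;
        this.nextId = nextId;
    }

    // Returns the existing ID for a known partition, or allocates the next
    // sequential ID in O(1) without scanning the existing partitions.
    long addPartition(String partitionName) {
        Long existing = idByName.get(partitionName);
        if (existing != null) {
            return existing;
        }
        long id = nextId++;
        idByName.put(partitionName, id);
        nameById.put(id, partitionName);
        return id;
    }

    void dropPartition(String partitionName) {
        Long id = idByName.remove(partitionName);
        if (id != null) {
            nameById.remove(id);
        }
    }

    String nameOf(long id) {
        return nameById.get(id);
    }

    // Incremental update: copy the maps and carry nextId over, so IDs
    // allocated after the copy cannot clash with IDs already in use.
    PartitionIdCache copy() {
        return new PartitionIdCache(new HashMap<>(idByName), new HashMap<>(nameById), nextId);
    }

    public static void main(String[] args) {
        PartitionIdCache base = new PartitionIdCache();
        base.addPartition("dt=2024-01-01"); // ID 0
        base.addPartition("dt=2024-01-02"); // ID 1
        PartitionIdCache next = base.copy();
        next.dropPartition("dt=2024-01-01");
        System.out.println(next.addPartition("dt=2024-01-03")); // prints 2
    }
}
```

   Carrying `nextId` over, rather than deriving it from the current partition count, matters once partitions are dropped: in `main` the copy holds a single partition, yet the next add returns ID 2 instead of reusing ID 1, which still belongs to `dt=2024-01-02`.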
   
   ### Release note
   
   Fix Hive external partition cache failures caused by partition ID hash 
collisions.
   
   ### Check List (For Author)
   
   - Test: Unit Test
       - `./run-fe-ut.sh --run org.apache.doris.datasource.hive.HiveMetaStoreCacheTest`
       - `MAVEN_OPTS="-Xmx8g -XX:MaxMetaspaceSize=2g" ./build.sh --fe`
   - Behavior changed: Yes. The Hive external partition cache now uses cache-local sequential partition IDs instead of hash-derived partition IDs.
   - Does this need documentation: No
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

