cshuo commented on code in PR #18821:
URL: https://github.com/apache/hudi/pull/18821#discussion_r3295693454


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/partitioner/index/RocksDBIndexBackend.java:
##########
@@ -37,18 +38,21 @@
 @Slf4j
 public class RocksDBIndexBackend implements GlobalIndexBackend {
   private static final String COLUMN_FAMILY = "index_cache";
+  private static final String BASE_PATH = "hudi-index-backend";
 
   private final RocksDBDAO rocksDBDAO;
+  private final String rocksDbBasePath;
   private transient FlinkRocksDBIndexMetrics rocksDBIndexMetrics;
 
   public RocksDBIndexBackend(String rocksDbBasePath, boolean isPartitionedTable) {
+    this.rocksDbBasePath = rocksDbBasePath;
     // Register custom serializer for HoodieRecordGlobalLocation to minimize storage overhead
     ConcurrentHashMap<String, CustomSerializer<?>> serializers = new ConcurrentHashMap<>();
     serializers.put(COLUMN_FAMILY, isPartitionedTable
         ? new CodedRecordGlobalLocationSerializer()
         : new RecordGlobalLocationSerializer());
 
-    this.rocksDBDAO = new RocksDBDAO("hudi-index-backend", rocksDbBasePath, serializers, true);
+    this.rocksDBDAO = RocksDBDAOFactory.getOrCreate(BASE_PATH, rocksDbBasePath, serializers, true);

Review Comment:
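   Sharing the DAO also shares its serializers: every backend that obtains the
cached DAO from `getOrCreate` uses the serializer instances registered by the
first backend, and the serializers it passes itself are not used. This is
risky when several backends (for example, parallel subtasks in the same
TaskManager JVM) use the shared DAO concurrently, because
`RecordGlobalLocationSerializer` is explicitly not thread-safe and reuses
mutable input/output buffers, while `CodedRecordGlobalLocationSerializer`
mutates a plain `HashMap`/`ArrayList` without synchronization.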
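
To make the concern concrete, here is a minimal Java sketch of get-or-create caching. `SharedDaoSketch`, `FakeDao`, and `Serializer` are hypothetical stand-ins, not Hudi classes, and the sketch assumes `RocksDBDAOFactory.getOrCreate` caches one DAO per key in the style of `ConcurrentHashMap.computeIfAbsent`; the real factory may key or construct its DAO differently.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a get-or-create DAO factory; NOT the actual Hudi code.
public class SharedDaoSketch {

  /** Minimal stand-in for a serializer; real Hudi serializers look different. */
  public interface Serializer {
    String name();
  }

  /** Hypothetical DAO that keeps whatever serializers it was constructed with. */
  public static final class FakeDao {
    public final String basePath;
    public final Map<String, Serializer> serializers;

    FakeDao(String basePath, Map<String, Serializer> serializers) {
      this.basePath = basePath;
      this.serializers = serializers;
    }
  }

  private static final Map<String, FakeDao> CACHE = new ConcurrentHashMap<>();

  /** Returns the cached DAO for basePath, constructing it only on the first call. */
  public static FakeDao getOrCreate(String basePath, Map<String, Serializer> serializers) {
    // computeIfAbsent does not call the mapping function once an entry exists, so every
    // later caller receives the DAO, and therefore the serializer instances, of the first caller.
    return CACHE.computeIfAbsent(basePath, p -> new FakeDao(p, serializers));
  }

  public static void main(String[] args) {
    Serializer first = () -> "coded";
    Serializer second = () -> "plain";
    FakeDao a = getOrCreate("/tmp/idx", Map.of("index_cache", first));
    FakeDao b = getOrCreate("/tmp/idx", Map.of("index_cache", second));
    System.out.println(a == b); // true: both callers hold the same DAO
    System.out.println(b.serializers.get("index_cache").name()); // coded: second caller's serializer unused
  }
}
```

If the factory behaves this way, a second backend hands over its own serializers but then reads and writes through the first backend's instances, so any mutable state inside those instances is shared across all callers.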



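One possible mitigation for a serializer whose only mutable state is scratch buffers is to give each thread its own instance. The sketch below is a hypothetical illustration: `ThreadLocalSerializerSketch`, `PerThread`, and `BufferReusingSerializer` are invented names, and the toy serializer only mimics the "reuses a mutable buffer" property described for `RecordGlobalLocationSerializer`.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Supplier;

// Hypothetical mitigation sketch; NOT a Hudi API.
public class ThreadLocalSerializerSketch {

  /** Toy serializer that reuses one mutable buffer across calls, so it is not thread-safe. */
  public static final class BufferReusingSerializer {
    private final StringBuilder buffer = new StringBuilder();

    public String serialize(String fileId, long instant) {
      buffer.setLength(0); // reset and reuse: two concurrent callers would corrupt each other's output
      buffer.append(fileId).append('@').append(instant);
      return buffer.toString();
    }
  }

  /** Hands each thread its own instance created by the factory, so reused buffers are never shared. */
  public static final class PerThread<S> {
    private final ThreadLocal<S> local;

    public PerThread(Supplier<S> factory) {
      this.local = ThreadLocal.withInitial(factory);
    }

    public S get() {
      return local.get();
    }
  }

  public static void main(String[] args) throws Exception {
    PerThread<BufferReusingSerializer> serializers = new PerThread<>(BufferReusingSerializer::new);
    ExecutorService pool = Executors.newFixedThreadPool(2);
    try {
      Callable<BufferReusingSerializer> task = serializers::get;
      Future<BufferReusingSerializer> other = pool.submit(task); // instance seen by a pool thread
      BufferReusingSerializer mine = serializers.get();          // instance seen by the main thread
      System.out.println(mine == serializers.get()); // true: stable within one thread
      System.out.println(mine == other.get());       // false: another thread gets its own instance
      System.out.println(mine.serialize("f1", 42L)); // f1@42
    } finally {
      pool.shutdown();
    }
  }
}
```

This pattern fits only serializers whose mutable state is per-call scratch space. If the `HashMap`/`ArrayList` inside `CodedRecordGlobalLocationSerializer` holds state that every reader and writer of the shared store must agree on (an assumption, not confirmed here), per-thread copies would diverge; that case would instead need a thread-safe shared structure or external synchronization, at the cost of contention.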
-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.