rangareddy commented on issue #18161:
URL: https://github.com/apache/hudi/issues/18161#issuecomment-4067762284

   **1. Why is numBuckets not changing in .hashing_meta after clustering?**
   
   This is actually expected behavior for the Consistent Hashing Index. Unlike 
the Simple Bucket Index (where numBuckets is a fixed number), Consistent 
Hashing uses a "split/merge" mechanism.
   In the `.hashing_meta` file, the `numBuckets` field generally reflects 
the **initial** bucket count. Actual growth is tracked in the `nodes` list of 
that same metadata file. After clustering, you should see more file groups 
(parquet files) in the partition directory, and the metadata file will have 
more `nodes` entries mapped to these new files, even though the top-level 
`numBuckets` value hasn't changed. In the output below, `numBuckets` stays at 
4 while `nodes` grows from 4 to 8 to 16 entries across `seqNo` 0, 1 and 2.
   
   ```sh
   % cat /tmp/orders_consistent_hashing/.hoodie/.bucket_index/consistent_hashing_metadata/2026-03-16/*.hashing_meta
   {
     "version" : 0,
     "partitionPath" : "2026-03-16",
     "instant" : "00000000000000",
     "numBuckets" : 4,
     "seqNo" : 0,
     "nodes" : [ {
       "value" : 536870912,
       "fileIdPrefix" : "dd2c2ec3-1178-354c-a5b9-2e77f7d82816",
       "tag" : "NORMAL"
     }, {
       "value" : 1073741824,
       "fileIdPrefix" : "b82b9a65-bf3a-382e-adf7-b03bc62528e0",
       "tag" : "NORMAL"
     }, {
       "value" : 1610612736,
       "fileIdPrefix" : "7cfeed2e-30ee-3b40-b46e-bb6a1e7aa7b9",
       "tag" : "NORMAL"
     }, {
       "value" : 2147483647,
       "fileIdPrefix" : "3376d73d-1489-3906-ac40-4d416dbc448b",
       "tag" : "NORMAL"
     } ]
   }
   {
     "version" : 0,
     "partitionPath" : "2026-03-16",
     "instant" : "20260316190533321",
     "numBuckets" : 4,
     "seqNo" : 1,
     "nodes" : [ {
       "value" : 268435455,
       "fileIdPrefix" : "6cc35e1d-ae94-48f3-b8b8-c6f40f145a94",
       "tag" : "NORMAL"
     }, {
       "value" : 536870912,
       "fileIdPrefix" : "a8e3c81e-1ab3-49f5-9956-035b3b78502d",
       "tag" : "NORMAL"
     }, {
       "value" : 805306368,
       "fileIdPrefix" : "fb4ec00a-ff38-4eaa-8fe4-88448119ebd1",
       "tag" : "NORMAL"
     }, {
       "value" : 1073741824,
       "fileIdPrefix" : "f7ad1863-0608-429a-86e7-ff5b50f75d36",
       "tag" : "NORMAL"
     }, {
       "value" : 1342177280,
       "fileIdPrefix" : "c72c4a65-ccf7-4920-90da-597a2c5453f2",
       "tag" : "NORMAL"
     }, {
       "value" : 1610612736,
       "fileIdPrefix" : "259fbd22-7c1b-49cd-ad07-73d822d5b6c6",
       "tag" : "NORMAL"
     }, {
       "value" : 1879048191,
       "fileIdPrefix" : "0e6f19a1-b89e-45ed-99bd-87871f3d3ade",
       "tag" : "NORMAL"
     }, {
       "value" : 2147483647,
       "fileIdPrefix" : "4e0dbc1e-61d6-4090-ad75-03109ce2f03b",
       "tag" : "NORMAL"
     } ]
   }{
     "version" : 0,
     "partitionPath" : "2026-03-16",
     "instant" : "20260316190537187",
     "numBuckets" : 4,
     "seqNo" : 2,
     "nodes" : [ {
       "value" : 134217727,
       "fileIdPrefix" : "6db865cc-3785-4035-8dd1-f779e99ff2b3",
       "tag" : "NORMAL"
     }, {
       "value" : 268435455,
       "fileIdPrefix" : "2062ceea-87d9-493a-ae1c-38784369cab5",
       "tag" : "NORMAL"
     }, {
       "value" : 402653183,
       "fileIdPrefix" : "d2994956-3382-4fed-b2c2-04afd5d2672b",
       "tag" : "NORMAL"
     }, {
       "value" : 536870912,
       "fileIdPrefix" : "01ebbe3e-284d-49b2-a1ac-565843ca0875",
       "tag" : "NORMAL"
     }, {
       "value" : 671088640,
       "fileIdPrefix" : "30e4aa2b-1986-4a80-93c4-aa31e78cb998",
       "tag" : "NORMAL"
     }, {
       "value" : 805306368,
       "fileIdPrefix" : "641d36a6-4c7f-47a4-9c7a-ca4e905cee5e",
       "tag" : "NORMAL"
     }, {
       "value" : 939524096,
       "fileIdPrefix" : "a554bf6c-5c83-49d3-8c4e-e8c053ee13ab",
       "tag" : "NORMAL"
     }, {
       "value" : 1073741824,
       "fileIdPrefix" : "5b6dc1c0-3de1-41d6-b79e-0ed545a2183f",
       "tag" : "NORMAL"
     }, {
       "value" : 1207959552,
       "fileIdPrefix" : "987cf218-dd24-4e2a-a6a6-bb93495c7cb9",
       "tag" : "NORMAL"
     }, {
       "value" : 1342177280,
       "fileIdPrefix" : "75c710b3-5a30-41b4-b418-3e727ca5f50a",
       "tag" : "NORMAL"
     }, {
       "value" : 1476395008,
       "fileIdPrefix" : "380e0201-a2de-4531-a266-b821e8cb3d0c",
       "tag" : "NORMAL"
     }, {
       "value" : 1610612736,
       "fileIdPrefix" : "5848fdd4-4da8-49c7-997f-572feaad6b79",
       "tag" : "NORMAL"
     }, {
       "value" : 1744830463,
       "fileIdPrefix" : "f5653c14-dfaa-4902-bb17-48471c47730a",
       "tag" : "NORMAL"
     }, {
       "value" : 1879048191,
       "fileIdPrefix" : "27d6e0b1-399e-4ab7-a205-a2a1b1f98dc4",
       "tag" : "NORMAL"
     }, {
       "value" : 2013265919,
       "fileIdPrefix" : "d7ce786a-d17d-4cac-a9ee-0afb7cdd4b03",
       "tag" : "NORMAL"
     }, {
       "value" : 2147483647,
       "fileIdPrefix" : "360d1fb7-8c8f-480c-a510-647e8434f2b4",
       "tag" : "NORMAL"
     } ]
   }
   ```
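
To make the distinction between `numBuckets` and the `nodes` list concrete, here is a minimal Python sketch that reads one metadata document and reports both numbers. The embedded JSON is a trimmed copy of the `seqNo` 1 entry above (`fileIdPrefix` omitted). The `owning_node` lookup is an assumption about how the ring is consulted (a hash value belongs to the first node whose `value` is greater than or equal to it, wrapping to the first node), not a copy of Hudi's code, and both function names are hypothetical.

```python
import json
from bisect import bisect_left

# Trimmed copy of the seqNo 1 metadata shown above (fileIdPrefix omitted).
HASHING_META = """
{
  "numBuckets": 4,
  "seqNo": 1,
  "nodes": [
    {"value": 268435455,  "tag": "NORMAL"},
    {"value": 536870912,  "tag": "NORMAL"},
    {"value": 805306368,  "tag": "NORMAL"},
    {"value": 1073741824, "tag": "NORMAL"},
    {"value": 1342177280, "tag": "NORMAL"},
    {"value": 1610612736, "tag": "NORMAL"},
    {"value": 1879048191, "tag": "NORMAL"},
    {"value": 2147483647, "tag": "NORMAL"}
  ]
}
"""


def summarize(meta_json):
    """Return (numBuckets field, number of entries in the nodes list)."""
    meta = json.loads(meta_json)
    return meta["numBuckets"], len(meta["nodes"])


def owning_node(meta_json, hash_value):
    """Return the ring position assumed to own hash_value.

    Assumption: each node's "value" is the inclusive upper bound of its
    range, and values past the last node wrap around to the first node.
    """
    values = sorted(node["value"] for node in json.loads(meta_json)["nodes"])
    idx = bisect_left(values, hash_value)
    return values[idx % len(values)]


initial, current = summarize(HASHING_META)
print(f"numBuckets={initial}, nodes={current}")
print("hash 300000000 maps to node", owning_node(HASHING_META, 300_000_000))
```

Counting `nodes` in the latest `.hashing_meta` file, rather than reading `numBuckets`, is the more reliable way to check whether clustering actually split any buckets.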
   
   **2. Is Consistent Hashing supported for Non-Partitioned tables?**
   
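   Currently, Consistent Hashing is heavily optimized for partitioned tables. 
For NON_PARTITIONED tables, Hudi treats the entire table as a single "default" 
partition, so the `.hashing_meta` files sit directly under 
`consistent_hashing_metadata` with no partition subdirectory: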
   
   ```sh
   % ls -la /tmp/orders_consistent_hashing/.hoodie/.bucket_index/consistent_hashing_metadata | grep -v .crc
   total 64
   drwxr-xr-x@ 12 rangareddy  wheel   384 16 Mar 19:08 .
   drwxr-xr-x@  3 rangareddy  wheel    96 16 Mar 19:08 ..
   -rw-r--r--@  1 rangareddy  wheel     0 16 Mar 19:08 00000000000000.commit
   -rw-r--r--@  1 rangareddy  wheel   585 16 Mar 19:08 00000000000000.hashing_meta
   -rw-r--r--@  1 rangareddy  wheel     0 16 Mar 19:08 20260316190802805.commit
   -rw-r--r--@  1 rangareddy  wheel  1046 16 Mar 19:08 20260316190802805.hashing_meta
   -rw-r--r--@  1 rangareddy  wheel  1962 16 Mar 19:08 20260316190806037.hashing_meta
   ```
   
   While it should work in theory, users have reported 
`NullPointerException` or `IndexOutOfBoundsException` failures because the 
index look-up logic expects a partition path to locate the `.hashing_meta` 
file. For non-partitioned data, it is currently safer to use the Simple Bucket 
Index.
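
As a rough illustration of switching a non-partitioned table to the Simple Bucket Index, the sketch below builds the write options as a plain Python dict. The helper name `simple_bucket_options` and the field name `order_id` are hypothetical, and the option keys are the bucket-index settings as I understand them, so please check them against the configuration reference for your Hudi version before relying on them.

```python
def simple_bucket_options(hash_field, num_buckets):
    """Build Hudi write options for a Simple Bucket Index (illustrative only).

    The bucket count is fixed up front because the Simple Bucket Index
    does not split or merge buckets later.
    """
    if num_buckets <= 0:
        raise ValueError("num_buckets must be a positive integer")
    return {
        "hoodie.index.type": "BUCKET",
        "hoodie.index.bucket.engine": "SIMPLE",
        "hoodie.bucket.index.num.buckets": str(num_buckets),
        "hoodie.bucket.index.hash.field": hash_field,
    }


# "order_id" is a placeholder for the table's record key field.
options = simple_bucket_options("order_id", 8)
for key, value in options.items():
    print(f"{key}={value}")
```

In PySpark these would typically be passed with `.options(**options)` on the DataFrame writer, alongside the table's usual record key and precombine settings.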
   
   **3. Can we use a CustomRecordMerger with Inline Clustering?**
   
   Yes, but in Spark you must ensure the class is registered correctly in the 
Spark conf and that the JAR containing your custom merger is available on both 
the driver and the executors.
   
   The "Stage cancelled" error usually means that the 
`HoodieSparkRecordMerger` cannot be serialized or that an executor failed to 
instantiate your custom class. Make sure 
`hoodie.datasource.write.record.merger.impls` is set to the fully qualified 
class name of your merger.
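
A minimal sketch of the two pieces of configuration described above, kept as plain Python dicts so it runs without Spark. The class name `com.example.CustomRecordMerger` and the JAR path are placeholders, and `merger_conf` is a hypothetical helper. `spark.jars` is the standard Spark setting for shipping JARs to the driver and executors, and `hoodie.clustering.inline` is included on the assumption that inline clustering is enabled through that key in your version.

```python
def merger_conf(merger_class, jar_path):
    """Return (spark_conf, hudi_write_options) for a custom record merger."""
    # Ship the JAR to the driver and every executor so the class can be loaded.
    spark_conf = {"spark.jars": jar_path}
    # Tell Hudi which merger implementation to instantiate during writes.
    write_options = {
        "hoodie.datasource.write.record.merger.impls": merger_class,
        "hoodie.clustering.inline": "true",
    }
    return spark_conf, write_options


# Placeholder class name and path; substitute your own.
spark_conf, write_options = merger_conf(
    "com.example.CustomRecordMerger", "/path/to/custom-merger.jar"
)
print(spark_conf)
print(write_options)
```

If the executors still fail to instantiate the class, checking the executor logs for a `ClassNotFoundException` is a quick way to tell a missing JAR apart from a serialization problem.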
   
   **4. Should Metadata Table be disabled for Consistent Hashing?**
   
   No, disabling the Metadata Table is not mandatory. However, in earlier 
versions (0.13 - 1.0), there were synchronization bugs between Consistent 
Hashing and Metadata Column Stats. In Hudi 1.1.0+, the two are designed to work 
together, but if you see inconsistencies in file listing, disabling the 
Metadata Table (`hoodie.metadata.enable=false`) is a common troubleshooting 
step.
   
   ```sh
   % ls -la /tmp/orders_consistent_hashing/.hoodie/metadata 
   total 0
   drwxr-xr-x@  5 rangareddy  wheel  160 16 Mar 19:08 .
   drwxr-xr-x@ 11 rangareddy  wheel  352 16 Mar 19:08 ..
   drwxr-xr-x@  9 rangareddy  wheel  288 16 Mar 19:08 .hoodie
   drwxr-xr-x@ 24 rangareddy  wheel  768 16 Mar 19:08 column_stats
   drwxr-xr-x@ 16 rangareddy  wheel  512 16 Mar 19:08 files
   ```

