rangareddy commented on issue #18161:
URL: https://github.com/apache/hudi/issues/18161#issuecomment-4067762284
**1. Why is numBuckets not changing in .hashing_meta after clustering?**
This is actually expected behavior for the Consistent Hashing Index. Unlike
the Simple Bucket Index (where numBuckets is a fixed number), Consistent
Hashing uses a "split/merge" mechanism.
In the **.hashing_meta** file, the **numBuckets** field records the
**initial bucket count**. The actual growth is tracked in the **nodes** list of
that same metadata file. After clustering, you should see more file groups
(Parquet files) in the partition directory, and the newest metadata file will
have more "node" entries mapped to these new files, even though the top-level
numBuckets value has not changed. In the example below, numBuckets stays at 4
while the nodes list grows from 4 to 8 to 16 entries across seqNo 0, 1, and 2.
```sh
% cat /tmp/orders_consistent_hashing/.hoodie/.bucket_index/consistent_hashing_metadata/2026-03-16/*.hashing_meta
{
"version" : 0,
"partitionPath" : "2026-03-16",
"instant" : "00000000000000",
"numBuckets" : 4,
"seqNo" : 0,
"nodes" : [ {
"value" : 536870912,
"fileIdPrefix" : "dd2c2ec3-1178-354c-a5b9-2e77f7d82816",
"tag" : "NORMAL"
}, {
"value" : 1073741824,
"fileIdPrefix" : "b82b9a65-bf3a-382e-adf7-b03bc62528e0",
"tag" : "NORMAL"
}, {
"value" : 1610612736,
"fileIdPrefix" : "7cfeed2e-30ee-3b40-b46e-bb6a1e7aa7b9",
"tag" : "NORMAL"
}, {
"value" : 2147483647,
"fileIdPrefix" : "3376d73d-1489-3906-ac40-4d416dbc448b",
"tag" : "NORMAL"
} ]
}
{
"version" : 0,
"partitionPath" : "2026-03-16",
"instant" : "20260316190533321",
"numBuckets" : 4,
"seqNo" : 1,
"nodes" : [ {
"value" : 268435455,
"fileIdPrefix" : "6cc35e1d-ae94-48f3-b8b8-c6f40f145a94",
"tag" : "NORMAL"
}, {
"value" : 536870912,
"fileIdPrefix" : "a8e3c81e-1ab3-49f5-9956-035b3b78502d",
"tag" : "NORMAL"
}, {
"value" : 805306368,
"fileIdPrefix" : "fb4ec00a-ff38-4eaa-8fe4-88448119ebd1",
"tag" : "NORMAL"
}, {
"value" : 1073741824,
"fileIdPrefix" : "f7ad1863-0608-429a-86e7-ff5b50f75d36",
"tag" : "NORMAL"
}, {
"value" : 1342177280,
"fileIdPrefix" : "c72c4a65-ccf7-4920-90da-597a2c5453f2",
"tag" : "NORMAL"
}, {
"value" : 1610612736,
"fileIdPrefix" : "259fbd22-7c1b-49cd-ad07-73d822d5b6c6",
"tag" : "NORMAL"
}, {
"value" : 1879048191,
"fileIdPrefix" : "0e6f19a1-b89e-45ed-99bd-87871f3d3ade",
"tag" : "NORMAL"
}, {
"value" : 2147483647,
"fileIdPrefix" : "4e0dbc1e-61d6-4090-ad75-03109ce2f03b",
"tag" : "NORMAL"
} ]
}
{
"version" : 0,
"partitionPath" : "2026-03-16",
"instant" : "20260316190537187",
"numBuckets" : 4,
"seqNo" : 2,
"nodes" : [ {
"value" : 134217727,
"fileIdPrefix" : "6db865cc-3785-4035-8dd1-f779e99ff2b3",
"tag" : "NORMAL"
}, {
"value" : 268435455,
"fileIdPrefix" : "2062ceea-87d9-493a-ae1c-38784369cab5",
"tag" : "NORMAL"
}, {
"value" : 402653183,
"fileIdPrefix" : "d2994956-3382-4fed-b2c2-04afd5d2672b",
"tag" : "NORMAL"
}, {
"value" : 536870912,
"fileIdPrefix" : "01ebbe3e-284d-49b2-a1ac-565843ca0875",
"tag" : "NORMAL"
}, {
"value" : 671088640,
"fileIdPrefix" : "30e4aa2b-1986-4a80-93c4-aa31e78cb998",
"tag" : "NORMAL"
}, {
"value" : 805306368,
"fileIdPrefix" : "641d36a6-4c7f-47a4-9c7a-ca4e905cee5e",
"tag" : "NORMAL"
}, {
"value" : 939524096,
"fileIdPrefix" : "a554bf6c-5c83-49d3-8c4e-e8c053ee13ab",
"tag" : "NORMAL"
}, {
"value" : 1073741824,
"fileIdPrefix" : "5b6dc1c0-3de1-41d6-b79e-0ed545a2183f",
"tag" : "NORMAL"
}, {
"value" : 1207959552,
"fileIdPrefix" : "987cf218-dd24-4e2a-a6a6-bb93495c7cb9",
"tag" : "NORMAL"
}, {
"value" : 1342177280,
"fileIdPrefix" : "75c710b3-5a30-41b4-b418-3e727ca5f50a",
"tag" : "NORMAL"
}, {
"value" : 1476395008,
"fileIdPrefix" : "380e0201-a2de-4531-a266-b821e8cb3d0c",
"tag" : "NORMAL"
}, {
"value" : 1610612736,
"fileIdPrefix" : "5848fdd4-4da8-49c7-997f-572feaad6b79",
"tag" : "NORMAL"
}, {
"value" : 1744830463,
"fileIdPrefix" : "f5653c14-dfaa-4902-bb17-48471c47730a",
"tag" : "NORMAL"
}, {
"value" : 1879048191,
"fileIdPrefix" : "27d6e0b1-399e-4ab7-a205-a2a1b1f98dc4",
"tag" : "NORMAL"
}, {
"value" : 2013265919,
"fileIdPrefix" : "d7ce786a-d17d-4cac-a9ee-0afb7cdd4b03",
"tag" : "NORMAL"
}, {
"value" : 2147483647,
"fileIdPrefix" : "360d1fb7-8c8f-480c-a510-647e8434f2b4",
"tag" : "NORMAL"
} ]
}
```
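To check the growth without counting entries by hand, the concatenated output of `cat *.hashing_meta` can be parsed and summarized. The following is a minimal sketch, not part of Hudi: the function names `parse_hashing_meta`, `read_partition_meta`, and `summarize` are hypothetical, and it assumes the metadata files sit on a local filesystem and keep the JSON layout shown above.

```python
import json
from pathlib import Path


def parse_hashing_meta(text):
    """Parse one or more JSON objects written back to back, as `cat` prints them."""
    decoder = json.JSONDecoder()
    metas, pos = [], 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so skip it manually.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        metas.append(obj)
    return metas


def read_partition_meta(meta_dir):
    """Read every *.hashing_meta file in a local directory (assumed layout)."""
    text = "".join(
        p.read_text() for p in sorted(Path(meta_dir).glob("*.hashing_meta"))
    )
    return parse_hashing_meta(text)


def summarize(metas):
    """Return (seqNo, instant, numBuckets, node count) tuples ordered by seqNo."""
    ordered = sorted(metas, key=lambda m: m["seqNo"])
    return [
        (m["seqNo"], m["instant"], m["numBuckets"], len(m["nodes"]))
        for m in ordered
    ]
```

Calling `summarize(read_partition_meta(...))` on the partition directory from the listing above should report 4, 8, and 16 nodes for seqNo 0, 1, and 2, with numBuckets at 4 in every version.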
**2. Is Consistent Hashing supported for Non-Partitioned tables?**
Currently, Consistent Hashing is heavily optimized for partitioned tables.
For NON_PARTITIONED tables, Hudi treats the entire table as a single "default"
partition.
```sh
% ls -la /tmp/orders_consistent_hashing/.hoodie/.bucket_index/consistent_hashing_metadata | grep -v .crc
total 64
drwxr-xr-x@ 12 rangareddy wheel 384 16 Mar 19:08 .
drwxr-xr-x@ 3 rangareddy wheel 96 16 Mar 19:08 ..
-rw-r--r--@ 1 rangareddy wheel 0 16 Mar 19:08 00000000000000.commit
-rw-r--r--@ 1 rangareddy wheel 585 16 Mar 19:08 00000000000000.hashing_meta
-rw-r--r--@ 1 rangareddy wheel 0 16 Mar 19:08 20260316190802805.commit
-rw-r--r--@ 1 rangareddy wheel 1046 16 Mar 19:08 20260316190802805.hashing_meta
-rw-r--r--@ 1 rangareddy wheel 1962 16 Mar 19:08 20260316190806037.hashing_meta
```
While it should theoretically work, users have reported
NullPointerException or IndexOutOfBoundsException failures because the index
look-up logic expects a partition path to locate the .hashing_meta file. For
non-partitioned data, it is currently safer to use the Simple Bucket Index.
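As a rough illustration of that fallback, the write options for a Simple Bucket Index could be assembled as below. This is a sketch under assumptions: the helper name `simple_bucket_index_options` and the example field `order_id` are hypothetical, and the bucket count must be chosen up front because the simple engine does not resize it.

```python
def simple_bucket_index_options(num_buckets, hash_field):
    """Build Hudi write options for the Simple Bucket Index (illustrative helper)."""
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    return {
        "hoodie.index.type": "BUCKET",
        "hoodie.index.bucket.engine": "SIMPLE",
        # Fixed for the lifetime of the table with the simple engine.
        "hoodie.bucket.index.num.buckets": str(num_buckets),
        "hoodie.bucket.index.hash.field": hash_field,
    }


# Hypothetical usage with a Spark DataFrame writer:
#   df.write.format("hudi").options(**simple_bucket_index_options(4, "order_id"))
```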
**3. Can we use a CustomRecordMerger with Inline Clustering?**
Yes, but in Spark, you must ensure the class is registered correctly in the
Spark Conf and that the JAR containing your custom merger is available on both
the driver and executors.
The error "Stage cancelled" usually happens because the
HoodieSparkRecordMerger cannot be serialized or the executor fails to
instantiate your custom class. Ensure you set
hoodie.datasource.write.record.merger.impls to the fully qualified class name
of your merger.
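The two requirements above (JAR on driver and executors, merger class registered) can be kept together in one place. A minimal sketch, assuming the JAR is shipped through Spark's `spark.jars` setting; the helper name `custom_merger_options`, the class `com.example.hudi.MyCustomMerger`, and the JAR path are hypothetical placeholders.

```python
def custom_merger_options(merger_class, jar_path):
    """Return (spark_conf, hudi_write_options) for a custom record merger."""
    if not merger_class or "." not in merger_class:
        raise ValueError("merger_class should be a fully qualified class name")
    if not jar_path.endswith(".jar"):
        raise ValueError("jar_path should point to a .jar file")
    # spark.jars distributes the JAR to the driver and the executors.
    spark_conf = {"spark.jars": jar_path}
    hudi_opts = {"hoodie.datasource.write.record.merger.impls": merger_class}
    return spark_conf, hudi_opts


# Hypothetical usage:
#   spark_conf, hudi_opts = custom_merger_options(
#       "com.example.hudi.MyCustomMerger", "/path/to/my-merger.jar")
#   Apply spark_conf when building the SparkSession and pass hudi_opts
#   to df.write.format("hudi").options(**hudi_opts).
```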
**4. Should Metadata Table be disabled for Consistent Hashing?**
No, it is not mandatory to disable the Metadata Table. However, in earlier
versions (0.13 through 1.0), there were synchronization bugs between Consistent
Hashing and Metadata Column Stats. In Hudi 1.1.0+, they are designed to work
together, but if you see inconsistencies in file listing, disabling the
Metadata Table (hoodie.metadata.enable=false) is a common troubleshooting step.
```sh
% ls -la /tmp/orders_consistent_hashing/.hoodie/metadata
total 0
drwxr-xr-x@ 5 rangareddy wheel 160 16 Mar 19:08 .
drwxr-xr-x@ 11 rangareddy wheel 352 16 Mar 19:08 ..
drwxr-xr-x@ 9 rangareddy wheel 288 16 Mar 19:08 .hoodie
drwxr-xr-x@ 24 rangareddy wheel 768 16 Mar 19:08 column_stats
drwxr-xr-x@ 16 rangareddy wheel 512 16 Mar 19:08 files
```
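The same check can be scripted when comparing tables. This is a small sketch that only inspects directory names; the function name `metadata_partitions` is hypothetical and it assumes the table lives on a local filesystem, as in the listing above.

```python
from pathlib import Path


def metadata_partitions(table_base_path):
    """List metadata table partitions found under <base>/.hoodie/metadata."""
    meta_dir = Path(table_base_path) / ".hoodie" / "metadata"
    if not meta_dir.is_dir():
        # No metadata table directory at all (for example, it was never enabled).
        return []
    return sorted(
        p.name
        for p in meta_dir.iterdir()
        if p.is_dir() and not p.name.startswith(".")
    )
```

For the table in this comment, the expected result would be `["column_stats", "files"]`, matching the directories shown by `ls`.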