xiearthur opened a new issue, #13447:
URL: https://github.com/apache/hudi/issues/13447

   ### **Describe the problem you faced**
   We are trying to use the CONSISTENT_HASHING bucket index with COW 
(Copy-on-Write) tables but are encountering runtime failures. The current 
implementation appears to support only MOR (Merge-on-Read) tables, which 
limits our architecture choices for workloads that prefer COW semantics.
   
   ### **To Reproduce**
   Steps to reproduce the behavior:
   1. Create a COW table with bucket index configuration
   2. Set `hoodie.index.bucket.engine=CONSISTENT_HASHING`
   3. Attempt to perform insert/upsert operations
   4. Observe a runtime failure with `HoodieUpsertException`
   
   **Configuration used:**
   ```properties
   hoodie.table.type=COPY_ON_WRITE
   hoodie.index.type=BUCKET
   hoodie.index.bucket.engine=CONSISTENT_HASHING
   hoodie.bucket.index.num.buckets=4
   ```
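
To make the failing combination easy to spot before a job is submitted, here is a minimal sketch of a pre-flight check. The function names are hypothetical and are not part of Hudi's API. The rule it encodes (COW plus CONSISTENT_HASHING fails) is taken only from the behavior described in this report.

```python
# Hypothetical pre-flight check. The helper names and the rule they encode are
# assumptions drawn from this report, not part of Hudi's API.

def parse_properties(text):
    """Parse simple key=value lines into a dict, skipping blanks and comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def find_unsupported_combination(props):
    """Return a warning string if the config matches the failing case, else None."""
    is_cow = props.get("hoodie.table.type") == "COPY_ON_WRITE"
    is_bucket = props.get("hoodie.index.type") == "BUCKET"
    is_consistent = props.get("hoodie.index.bucket.engine") == "CONSISTENT_HASHING"
    if is_cow and is_bucket and is_consistent:
        return ("COPY_ON_WRITE with the CONSISTENT_HASHING bucket engine "
                "is the combination reported to fail at write time")
    return None


CONFIG = """
hoodie.table.type=COPY_ON_WRITE
hoodie.index.type=BUCKET
hoodie.index.bucket.engine=CONSISTENT_HASHING
hoodie.bucket.index.num.buckets=4
"""

warning = find_unsupported_combination(parse_properties(CONFIG))
print(warning)
```

With `hoodie.table.type=MERGE_ON_READ` the same check returns `None`.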
   
   ### **Expected behavior**
   COW tables should support CONSISTENT_HASHING bucket index similar to MOR 
tables, allowing for:
   - Dynamic bucket resizing based on data volume
   - Better data distribution compared to simple bucket index
   - Consistent write performance across varying data sizes
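
To illustrate why dynamic resizing matters, the sketch below compares a fixed modulo bucket assignment with a range-based consistent hashing scheme when one bucket is added. It is a generic illustration, not Hudi's implementation. All names (`simple_bucket`, `make_ring`, `split_bucket`) are hypothetical, and MD5 is chosen only to get a stable hash.

```python
import bisect
import hashlib

HASH_SPACE = 2 ** 32


def key_hash(key):
    """Map a record key to a stable integer in [0, HASH_SPACE)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def simple_bucket(key, num_buckets):
    """Simple bucket index style: the bucket is hash modulo the bucket count."""
    return key_hash(key) % num_buckets


def make_ring(num_buckets):
    """Split the hash space evenly; each entry is a bucket's inclusive upper bound."""
    step = HASH_SPACE // num_buckets
    ring = [step * (i + 1) - 1 for i in range(num_buckets)]
    ring[-1] = HASH_SPACE - 1  # the last bucket always covers the tail
    return ring


def ring_bucket(key, ring):
    """Identify a bucket by the upper bound of the range that holds the key."""
    return ring[bisect.bisect_left(ring, key_hash(key))]


def split_bucket(ring, index):
    """Split one bucket at its midpoint; all other ranges are left untouched."""
    lower = ring[index - 1] + 1 if index > 0 else 0
    mid = (lower + ring[index]) // 2
    return ring[:index] + [mid] + ring[index:]


keys = ["key-%d" % i for i in range(10000)]

# Modulo assignment: growing from 4 to 5 buckets changes the divisor for every key.
moved_simple = sum(1 for k in keys if simple_bucket(k, 4) != simple_bucket(k, 5))

# Consistent hashing: only keys in the lower half of the split bucket get a new id.
ring_before = make_ring(4)
ring_after = split_bucket(ring_before, 0)
moved_ring = sum(
    1 for k in keys if ring_bucket(k, ring_before) != ring_bucket(k, ring_after)
)

print("moved with modulo buckets:", moved_simple)
print("moved with consistent hashing:", moved_ring)
```

For uniformly distributed hashes, the modulo scheme keeps a key in place only when `hash % 20 < 4`, so about four in five keys are reassigned, while the split touches only half of a single bucket.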
   
   ### **Environment Description**
   * **Hudi version**: 0.14.0+ 
   * **Flink version**: 1.13
   * **Storage**: S3/HDFS
   * **Running on Docker**: No
   
   ### **Additional context**
   
   **Business Impact:**
   - Prevents optimal indexing strategy for COW-based workloads
   - Forces choice between table type preference and indexing capabilities
   - Simple bucket index doesn't scale well with varying data volumes
   
   **Questions for the community:**
   1. Are there plans to support CONSISTENT_HASHING for COW tables?
   2. What are the technical barriers preventing this support?
   3. Would the community be open to contributions implementing this feature?
   4. Are there alternative indexing strategies that provide similar benefits 
for COW tables?
   
   ---


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
