prashantwason opened a new issue, #18691:
URL: https://github.com/apache/hudi/issues/18691

   ## Bug Description
   
   **What happened:**
   `CREATE INDEX IF NOT EXISTS record_index ON <table> (<record_key_col>)` throws
   `HoodieMetadataIndexException: Index already exists: record_index` when the
   index already exists. The `IF NOT EXISTS` clause has no effect.
   
   **What you expected:**
   With `IF NOT EXISTS`, the command should be a no-op when the index already
   exists — matching standard SQL semantics and matching the behavior implied by
   the `ignoreIfExists: Boolean` field on `CreateIndexCommand`.
   
   **Steps to reproduce:**
   1. Create a Hudi COW table with a record-key column (e.g. `uuid`).
   2. `CREATE INDEX record_index ON tbl (uuid)` — succeeds.
   3. `CREATE INDEX IF NOT EXISTS record_index ON tbl (uuid)` — throws
      `HoodieMetadataIndexException: Index already exists: record_index`.
   
   ## Root cause
   
   `CreateIndexCommand` in
   `hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/IndexCommands.scala`
   parses an `ignoreIfExists: Boolean` field from the SQL but never propagates it
   to `HoodieSparkIndexClient.create(...)`:
   
   ```scala
   } else if (indexName.equals(HoodieTableMetadataUtil.PARTITION_NAME_RECORD_INDEX)) {
     ValidationUtils.checkArgument(...)
     new HoodieSparkIndexClient(sparkSession).create(metaClient, indexName,
         HoodieTableMetadataUtil.PARTITION_NAME_RECORD_INDEX, columnsMap,
         options.asJava, table.properties.asJava)
     // ignoreIfExists is dropped here
   }
   ```
   
   `HoodieSparkIndexClient.create` in
   `hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/index/HoodieSparkIndexClient.java`
   has no `ignoreIfExists` parameter at all, and `createRecordIndex`
   unconditionally throws when the index exists:
   
   ```java
   String fullIndexName = PARTITION_NAME_RECORD_INDEX;
   if (indexExists(metaClient, fullIndexName)) {
     throw new HoodieMetadataIndexException("Index already exists: " + userIndexName);
   }
   ```
   
   The same gap exists for the column_stats/bloom_filters/secondary-index
   branches: none of the `HoodieSparkIndexClient(...).create(...)` call sites in
   `CreateIndexCommand.run` pass through `ignoreIfExists`.
   
   ## Suggested fix
   
   1. Add an `ignoreIfExists: boolean` parameter to `HoodieSparkIndexClient.create(...)`.
   2. Pass it through from every branch in `CreateIndexCommand.run`.
   3. In `createRecordIndex` and `createExpressionOrSecondaryIndex`, return early
      (instead of throwing) when the index already exists and `ignoreIfExists == true`.
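
   The guard described in step 3 could look roughly like the sketch below. It is
   a self-contained illustration only: `IndexClientSketch`, its in-memory set of
   index names, and the use of `IllegalStateException` are hypothetical stand-ins,
   not the real `HoodieSparkIndexClient` API or `HoodieMetadataIndexException`.

   ```java
   import java.util.HashSet;
   import java.util.Set;

   // Hypothetical stand-in for the index client; not Hudi's real API.
   class IndexClientSketch {
     // Names of indexes that have already been created (assumed in-memory state).
     private final Set<String> existingIndexes = new HashSet<>();

     /**
      * Creates the index, or skips creation when it already exists and
      * ignoreIfExists is set. Returns true if the index was created,
      * false if the call was a no-op.
      */
     boolean create(String indexName, boolean ignoreIfExists) {
       if (existingIndexes.contains(indexName)) {
         if (ignoreIfExists) {
           return false; // IF NOT EXISTS: return early instead of throwing
         }
         // Plain CREATE INDEX still fails on a duplicate.
         throw new IllegalStateException("Index already exists: " + indexName);
       }
       existingIndexes.add(indexName);
       return true;
     }

     public static void main(String[] args) {
       IndexClientSketch client = new IndexClientSketch();
       System.out.println(client.create("record_index", false)); // prints "true"
       System.out.println(client.create("record_index", true));  // prints "false"
       try {
         client.create("record_index", false);
       } catch (IllegalStateException e) {
         System.out.println(e.getMessage()); // prints "Index already exists: record_index"
       }
     }
   }
   ```

   Returning a boolean here is just a convenience for the demo; the actual method
   could equally stay `void` and simply return early.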
   
   ## Notes
   
   - The expression/secondary-index path (`createExpressionOrSecondaryIndex`,
     `HoodieSparkIndexClient.java:155-159`) already silently skips re-registration
     when the index exists, so observable behavior between record_index and
     expression/secondary indexes already differs. The fix is a good opportunity
     to unify behavior across both paths.
   - `DROP INDEX IF EXISTS` works correctly today: the `ignoreIfNotExists: Boolean`
     field on `DropIndexCommand` IS propagated to
     `HoodieSparkIndexClient.drop(metaClient, indexName, ignoreIfNotExists)`.
     `CREATE INDEX IF NOT EXISTS` is the missing symmetric path.
   
   ## Environment
   
   **Hudi version:** 1.x (verified on 1.2; affects all releases where record_index DDL exists)
   **Query engine:** Spark 3.3
   **Relevant configs:** standard MDT-enabled COW table


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
