kfaraz commented on code in PR #18515:
URL: https://github.com/apache/druid/pull/18515#discussion_r2366682138


##########
server/src/main/java/org/apache/druid/metadata/SQLMetadataConnector.java:
##########
@@ -1106,41 +1168,47 @@ public ResultSet getIndexInfo(DatabaseMetaData 
databaseMetaData, String tableNam
   }
 
   /**
-   * create index on the table with retry if not already exist, to be called 
after createTable
+   * Create index on the table with retry if not already exist, to be called 
after createTable
+   * Format of index name is either specified via legacy {@param 
indexTemplate} or {@code generateSHABasedIndexIdentifier}.
    *
    * @param tableName       Name of the table to create index on
-   * @param indexName       case-insensitive string index name, it helps to 
check the existing index on table
+   * @param indexTemplate   Template to create index ID (nullable for forwards 
compatibility with SHA-based indices)
    * @param indexCols       List of columns to be indexed on
    * @param createdIndexSet
    */
   public void createIndex(
       final String tableName,
-      final String indexName,
+      final String indexTemplate,
       final List<String> indexCols,
       final Set<String> createdIndexSet
   )
   {
+    final String shaIndexName = 
StringUtils.toUpperCase(generateSHABasedIndexIdentifier(tableName, indexCols));
+    final String legacyIndexName = indexTemplate != null ? 
StringUtils.toUpperCase(StringUtils.format(
+        indexTemplate,
+        tableName
+    )) : null;
+
+    // Avoid creating duplicate indices if an index with either naming 
convention already exists
+    if (legacyIndexName != null && createdIndexSet.contains(legacyIndexName)) {
+      log.info("Legacy index[%s] on Table[%s] already exists", 
legacyIndexName, tableName);
+      return;
+    } else if (createdIndexSet.contains(shaIndexName)) {
+      log.info("Index[%s] on Table[%s] already exists", shaIndexName, 
tableName);
+      return;
+    }
     try {
       retryWithHandle(
-          new HandleCallback<Void>()
-          {
-            @Override
-            public Void withHandle(Handle handle)
-            {
-              if 
(!createdIndexSet.contains(StringUtils.toUpperCase(indexName))) {
-                String indexSQL = StringUtils.format(
-                    "CREATE INDEX %1$s ON %2$s(%3$s)",
-                    indexName,
-                    tableName,
-                    Joiner.on(",").join(indexCols)
-                );
-                log.info("Creating Index on Table [%s], sql: [%s] ", 
tableName, indexSQL);
-                handle.execute(indexSQL);
-              } else {
-                log.info("Index [%s] on Table [%s] already exists", indexName, 
tableName);
-              }
-              return null;
-            }
+          (HandleCallback<Void>) handle -> {

Review Comment:
   Thanks for checking.



##########
server/src/main/java/org/apache/druid/metadata/SQLMetadataConnector.java:
##########
@@ -1206,4 +1274,27 @@ public static boolean isStatementException(Throwable e)
     return e instanceof StatementException ||
            (e instanceof CallbackFailedException && e.getCause() instanceof 
StatementException);
   }
+
+  /**
+   * Used to ensure full index coverage in the presence of large index names.
+   * Returned indentifiers are of the format: `idx_{tableName}_{SHA of column 
list}`.
+   *
+   * @param tableName the table name
+   * @param columns   the set of columns to create the index on 
(case-insensitive)
+   * @return unique index identifier
+   */
+  @VisibleForTesting
+  protected String generateSHABasedIndexIdentifier(String tableName, 
List<String> columns)
+  {
+    final String prefix = "idx_" + tableName + "_";
+    final String columnDigest = DigestUtils.sha1Hex(
+        columns.stream()
+               .map(StringUtils::toLowerCase)
+               .collect(Collectors.joining("_"))
+    );
+    return prefix + columnDigest.substring(

Review Comment:
   Yeah, I was concerned about the uniqueness of the hash and thus wanted to 
always include the full length.
   But you make a fair point and even a truncated hash should be good enough.
   
   So, I would suggest _always_ using a fixed size substring of the hash (say 
10 or 12 characters), similar to what we do for task IDs.
   This ensures that the size of the hash is fixed across all the index names 
and they all
   adhere to the same format.
   
   The current logic might reduce the size of the hash string to zero (or even 
throw an exception)
   if the table name is too long.
   To handle that, if the `idx_<table-name>_<12-char-hash>` is still longer 
than the limit, we just raise an alert.(Failing service startup might be too 
drastic since we currently don't fail on this scenario.)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to