kfaraz commented on code in PR #18515:
URL: https://github.com/apache/druid/pull/18515#discussion_r2366682138
##########
server/src/main/java/org/apache/druid/metadata/SQLMetadataConnector.java:
##########
@@ -1106,41 +1168,47 @@ public ResultSet getIndexInfo(DatabaseMetaData
databaseMetaData, String tableNam
}
/**
- * create index on the table with retry if not already exist, to be called
after createTable
+ * Create index on the table with retry if not already exist, to be called
after createTable
+ * Format of index name is either specified via legacy {@param
indexTemplate} or {@code generateSHABasedIndexIdentifier}.
*
* @param tableName Name of the table to create index on
- * @param indexName case-insensitive string index name, it helps to
check the existing index on table
+ * @param indexTemplate Template to create index ID (nullable for forwards
compatibility with SHA-based indices)
* @param indexCols List of columns to be indexed on
* @param createdIndexSet
*/
public void createIndex(
final String tableName,
- final String indexName,
+ final String indexTemplate,
final List<String> indexCols,
final Set<String> createdIndexSet
)
{
+ final String shaIndexName =
StringUtils.toUpperCase(generateSHABasedIndexIdentifier(tableName, indexCols));
+ final String legacyIndexName = indexTemplate != null ?
StringUtils.toUpperCase(StringUtils.format(
+ indexTemplate,
+ tableName
+ )) : null;
+
+ // Avoid creating duplicate indices if an index with either naming
convention already exists
+ if (legacyIndexName != null && createdIndexSet.contains(legacyIndexName)) {
+ log.info("Legacy index[%s] on Table[%s] already exists",
legacyIndexName, tableName);
+ return;
+ } else if (createdIndexSet.contains(shaIndexName)) {
+ log.info("Index[%s] on Table[%s] already exists", shaIndexName,
tableName);
+ return;
+ }
try {
retryWithHandle(
- new HandleCallback<Void>()
- {
- @Override
- public Void withHandle(Handle handle)
- {
- if
(!createdIndexSet.contains(StringUtils.toUpperCase(indexName))) {
- String indexSQL = StringUtils.format(
- "CREATE INDEX %1$s ON %2$s(%3$s)",
- indexName,
- tableName,
- Joiner.on(",").join(indexCols)
- );
- log.info("Creating Index on Table [%s], sql: [%s] ",
tableName, indexSQL);
- handle.execute(indexSQL);
- } else {
- log.info("Index [%s] on Table [%s] already exists", indexName,
tableName);
- }
- return null;
- }
+ (HandleCallback<Void>) handle -> {
Review Comment:
Thanks for checking.
##########
server/src/main/java/org/apache/druid/metadata/SQLMetadataConnector.java:
##########
@@ -1206,4 +1274,27 @@ public static boolean isStatementException(Throwable e)
return e instanceof StatementException ||
(e instanceof CallbackFailedException && e.getCause() instanceof
StatementException);
}
+
+ /**
+ * Used to ensure full index coverage in the presence of large index names.
+ * Returned indentifiers are of the format: `idx_{tableName}_{SHA of column
list}`.
+ *
+ * @param tableName the table name
+ * @param columns the set of columns to create the index on
(case-insensitive)
+ * @return unique index identifier
+ */
+ @VisibleForTesting
+ protected String generateSHABasedIndexIdentifier(String tableName,
List<String> columns)
+ {
+ final String prefix = "idx_" + tableName + "_";
+ final String columnDigest = DigestUtils.sha1Hex(
+ columns.stream()
+ .map(StringUtils::toLowerCase)
+ .collect(Collectors.joining("_"))
+ );
+ return prefix + columnDigest.substring(
Review Comment:
Yeah, I was concerned about the uniqueness of the hash and thus wanted to
always include the full length.
But you make a fair point and even a truncated hash should be good enough.
So, I would suggest _always_ using a fixed size substring of the hash (say
10 or 12 characters), similar to what we do for task IDs.
This ensures that the size of the hash is fixed across all the index names
and they all
adhere to the same format.
The current logic might reduce the size of the hash string to zero (or even
throw an exception)
if the table name is too long.
To handle that, if the `idx_<table-name>_<12-char-hash>` is still longer
than the limit, we just raise an alert.(Failing service startup might be too
drastic since we currently don't fail on this scenario.)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]