Re: [PR] feat: clustered segments pt.1 (druid)

via GitHub Wed, 03 Jun 2026 16:40:13 -0700


clintropolis commented on code in PR #19460:
URL: https://github.com/apache/druid/pull/19460#discussion_r3352544087



##########
processing/src/main/java/org/apache/druid/segment/projections/Projections.java:
##########
@@ -530,6 +547,243 @@ public static String getProjectionSegmentInternalFilePrefix(ProjectionSchema pro
     return projectionSchema.getName() + "/";
   }
 
+  /**
+   * Check whether {@code type} is an allowed cluster group clustering-column type. Clustering is restricted to the
+   * primitive scalar types: {@link ValueType#STRING}, {@link ValueType#LONG}, {@link ValueType#DOUBLE},
+   * {@link ValueType#FLOAT}. Complex and array types are rejected.
+   */
+  public static boolean isAllowedClusteringType(@Nullable ColumnType type)
+  {
+    return type != null && type.anyOf(ValueType.STRING, ValueType.LONG, ValueType.DOUBLE, ValueType.FLOAT);
+  }
+
+  /**
+   * Segment internal file prefix + column for a cluster group's per-group column data:
+   * {@code __base$<id0>_<id1>...<idK>/<column>}
+   */
+  public static String getClusterGroupSegmentInternalFileName(List<Integer> clusteringValueIds, String column)
+  {
+    return getClusterGroupSegmentInternalFilePrefix(clusteringValueIds) + column;
+  }
+
+  public static String getClusterGroupSegmentInternalFilePrefix(List<Integer> clusteringValueIds)
+  {
+    if (clusteringValueIds == null || clusteringValueIds.isEmpty()) {
+      throw DruidException.defensive("clusteringValueIds must not be null or empty");
+    }
+    final StringBuilder sb = new StringBuilder(CLUSTER_GROUP_PREFIX);
+    for (int i = 0; i < clusteringValueIds.size(); i++) {
+      if (i > 0) {
+        sb.append('_');
+      }
+      sb.append(clusteringValueIds.get(i));
+    }
+    sb.append('/');
+    return sb.toString();
+  }
+
+  /**
+   * Build the per-query {@link ClusterGroupQueryPlan} for {@code groups} against a {@link CursorBuildSpec}. Walks the
+   * filter tree once per group via {@link #walkClusterGroupFilter}, folding clustering-column leaves to
+   * {@link TrueFilter} / {@link FalseFilter} against each group's constant clustering tuple and propagating those
+   * constants through AND / OR / NOT. Non-clustering filters remain in place so the per-group cursor evaluates them
+   * as expected. Query-VC-equivalent-to-clustering-VC resolution happens per-leaf via {@link #resolveClusteringIndex}.
+   * <p/>
+   * Output shape per group encodes the truth value: top-level {@link FalseFilter} = provably FALSE (group is
+   * pruned from {@link ClusterGroupQueryPlan#survivingGroups()}), top-level {@link TrueFilter} = provably TRUE
+   * (no residual filter needed at the cursor), anything else = UNKNOWN (residual filter passed to the per-group
+   * cursor). The walker's result is stashed on the plan so {@link ClusterGroupQueryPlan#rewriteFor} hands it back
+   * directly without re-walking.
+   */
+  public static ClusterGroupQueryPlan planClusterGroupQuery(
+      List<TableClusterGroupSpec> groups,
+      CursorBuildSpec cursorBuildSpec
+  )
+  {
+    final Filter queryFilter = cursorBuildSpec.getFilter();
+    final VirtualColumns queryVcs = cursorBuildSpec.getVirtualColumns();
+    if (groups.isEmpty() || queryFilter == null) {
+      // No filter (or no groups): every group survives, per-group rewrite is a no-op (null filter).
+      return new ClusterGroupQueryPlan(groups, group -> null);
+    }
+
+    // Every spec in the list shares one summary by construction (set once in the schema constructor), so
+    // clusteringColumns + groupVcs are loop-invariant, only the per-group clustering tuple changes.
+    final ClusteredValueGroupsBaseTableSchema summary = groups.getFirst().getSummary();
+    final RowSignature clusteringColumns = summary.getClusteringColumns();
+    final VirtualColumns groupVcs = summary.getVirtualColumns();
+
+    // Single walk per group: produces the rewritten filter, and a top-level FalseFilter means the group prunes.
+    // Cache the rewrite for every group (including pruned ones, where it's FalseFilter) so rewriteFor doesn't
+    // re-walk for either the cursor factory or callers that want to inspect a pruned group's outcome directly.
+    final List<TableClusterGroupSpec> kept = new ArrayList<>(groups.size());
+    final IdentityHashMap<TableClusterGroupSpec, Filter> rewriteCache = new IdentityHashMap<>();

Review Comment:
   To me it generally seems nicer to have a result object here rather than a map, since it gives us a bit more flexibility and makes it easier to document what the result means.
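
A rough sketch of the kind of result object this could be, with made-up names throughout: `GroupRewrite`, `Outcome`, and `surviving` are not from this PR, `Filter` is a stand-in record rather than Druid's interface, and the group is a plain `String` instead of `TableClusterGroupSpec`. It only assumes the three outcomes described in the javadoc above (pruned, provably true, residual filter).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: every name below is illustrative, not from the PR.
final class ClusterGroupRewriteSketch
{
  // Stand-in for Druid's Filter, reduced to a label so the sketch compiles alone.
  record Filter(String description) {}

  // Stand-ins for the TrueFilter / FalseFilter constants the walker folds to.
  static final Filter TRUE = new Filter("TRUE");
  static final Filter FALSE = new Filter("FALSE");

  enum Outcome { PRUNED, MATCHES_ALL, RESIDUAL }

  /**
   * Outcome of one filter walk for one group. The residual filter is null
   * unless the outcome is RESIDUAL.
   */
  record GroupRewrite(String group, Outcome outcome, Filter residual)
  {
    static GroupRewrite of(String group, Filter rewritten)
    {
      if (FALSE.equals(rewritten)) {
        return new GroupRewrite(group, Outcome.PRUNED, null);
      }
      if (rewritten == null || TRUE.equals(rewritten)) {
        return new GroupRewrite(group, Outcome.MATCHES_ALL, null);
      }
      return new GroupRewrite(group, Outcome.RESIDUAL, rewritten);
    }

    boolean survives()
    {
      return outcome != Outcome.PRUNED;
    }
  }

  // Replaces a separate "kept" list: survivors are derived from the results.
  static List<GroupRewrite> surviving(List<GroupRewrite> results)
  {
    final List<GroupRewrite> kept = new ArrayList<>(results.size());
    for (GroupRewrite result : results) {
      if (result.survives()) {
        kept.add(result);
      }
    }
    return kept;
  }
}
```

With something like this, the meaning of a top-level true or false filter is documented on the type itself instead of in comments next to the map, and extra per-group details can be added later without changing the map's value type.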



