clintropolis commented on code in PR #19460:
URL: https://github.com/apache/druid/pull/19460#discussion_r3352544087
##########
processing/src/main/java/org/apache/druid/segment/projections/Projections.java:
##########
@@ -530,6 +547,243 @@ public static String getProjectionSegmentInternalFilePrefix(ProjectionSchema pro
return projectionSchema.getName() + "/";
}
+ /**
+ * Check whether {@code type} is an allowed cluster group clustering-column type. Clustering is restricted to the
+ * primitive scalar types: {@link ValueType#STRING}, {@link ValueType#LONG}, {@link ValueType#DOUBLE},
+ * {@link ValueType#FLOAT}. Complex and array types are rejected.
+ */
+ public static boolean isAllowedClusteringType(@Nullable ColumnType type)
+ {
+ return type != null && type.anyOf(ValueType.STRING, ValueType.LONG, ValueType.DOUBLE, ValueType.FLOAT);
+ }
+
+ /**
+ * Segment internal file prefix + column for a cluster group's per-group column data:
+ * {@code __base$<id0>_<id1>...<idK>/<column>}
+ */
+ public static String getClusterGroupSegmentInternalFileName(List<Integer> clusteringValueIds, String column)
+ {
+ return getClusterGroupSegmentInternalFilePrefix(clusteringValueIds) + column;
+ }
+
+ public static String getClusterGroupSegmentInternalFilePrefix(List<Integer> clusteringValueIds)
+ {
+ if (clusteringValueIds == null || clusteringValueIds.isEmpty()) {
+ throw DruidException.defensive("clusteringValueIds must not be null or empty");
+ }
+ final StringBuilder sb = new StringBuilder(CLUSTER_GROUP_PREFIX);
+ for (int i = 0; i < clusteringValueIds.size(); i++) {
+ if (i > 0) {
+ sb.append('_');
+ }
+ sb.append(clusteringValueIds.get(i));
+ }
+ sb.append('/');
+ return sb.toString();
+ }
+
+ /**
+ * Build the per-query {@link ClusterGroupQueryPlan} for {@code groups} against a {@link CursorBuildSpec}. Walks the
+ * filter tree once per group via {@link #walkClusterGroupFilter}, folding clustering-column leaves to
+ * {@link TrueFilter} / {@link FalseFilter} against each group's constant clustering tuple and propagating those
+ * constants through AND / OR / NOT. Non-clustering filters remain in place so the per-group cursor evaluates them
+ * as expected. Query-VC-equivalent-to-clustering-VC resolution happens per-leaf via {@link #resolveClusteringIndex}.
+ * <p/>
+ * Output shape per group encodes the truth value: top-level {@link FalseFilter} = provably FALSE (group is
+ * pruned from {@link ClusterGroupQueryPlan#survivingGroups()}), top-level {@link TrueFilter} = provably TRUE
+ * (no residual filter needed at the cursor), anything else = UNKNOWN (residual filter passed to the per-group
+ * cursor). The walker's result is stashed on the plan so {@link ClusterGroupQueryPlan#rewriteFor} hands it back
+ * directly without re-walking.
+ */
+ public static ClusterGroupQueryPlan planClusterGroupQuery(
+ List<TableClusterGroupSpec> groups,
+ CursorBuildSpec cursorBuildSpec
+ )
+ {
+ final Filter queryFilter = cursorBuildSpec.getFilter();
+ final VirtualColumns queryVcs = cursorBuildSpec.getVirtualColumns();
+ if (groups.isEmpty() || queryFilter == null) {
+ // No filter (or no groups): every group survives, per-group rewrite is a no-op (null filter).
+ return new ClusterGroupQueryPlan(groups, group -> null);
+ }
+
+ // Every spec in the list shares one summary by construction (set once in the schema constructor), so
+ // clusteringColumns + groupVcs are loop-invariant, only the per-group clustering tuple changes.
+ final ClusteredValueGroupsBaseTableSchema summary = groups.getFirst().getSummary();
+ final RowSignature clusteringColumns = summary.getClusteringColumns();
+ final VirtualColumns groupVcs = summary.getVirtualColumns();
+
+ // Single walk per group: produces the rewritten filter, and a top-level FalseFilter means the group prunes.
+ // Cache the rewrite for every group (including pruned ones, where it's FalseFilter) so rewriteFor doesn't
+ // re-walk for either the cursor factory or callers that want to inspect a pruned group's outcome directly.
+ final List<TableClusterGroupSpec> kept = new ArrayList<>(groups.size());
+ final IdentityHashMap<TableClusterGroupSpec, Filter> rewriteCache = new IdentityHashMap<>();
Review Comment:
To me it generally seems nicer to return a result object here rather than a
map: it gives us a bit more flexibility and makes it easier to document what
the result means.
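
To illustrate the suggestion, here is a minimal sketch of what a per-group result object could look like in place of the `IdentityHashMap<TableClusterGroupSpec, Filter>`. Every type and method name in it (`GroupSpec`, `ConstantFilter`, `ResidualFilter`, `RewriteOutcome`, `GroupRewriteResult`, `ClusterGroupRewriteSketch`) is a hypothetical stand-in rather than anything from the PR or from Druid, and it only assumes the three outcomes described in the Javadoc above (provably FALSE, provably TRUE, UNKNOWN).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for Druid's Filter, TrueFilter / FalseFilter and
// TableClusterGroupSpec, defined here only so the sketch compiles on its own.
interface Filter {}

enum ConstantFilter implements Filter { TRUE, FALSE }

record ResidualFilter(String description) implements Filter {}

record GroupSpec(String name) {}

// The three outcomes the planner Javadoc describes for a rewritten top-level filter.
enum RewriteOutcome { PRUNED, MATCHES_ALL, RESIDUAL }

// Hypothetical result object: one per cluster group, pairing the group with its
// rewritten filter so the meaning of each case can be documented in one place.
record GroupRewriteResult(GroupSpec group, Filter rewrittenFilter)
{
  RewriteOutcome outcome()
  {
    if (rewrittenFilter == ConstantFilter.FALSE) {
      return RewriteOutcome.PRUNED;
    }
    if (rewrittenFilter == ConstantFilter.TRUE) {
      return RewriteOutcome.MATCHES_ALL;
    }
    return RewriteOutcome.RESIDUAL;
  }

  boolean survives()
  {
    return outcome() != RewriteOutcome.PRUNED;
  }

  // Filter to hand to the per-group cursor; null when the group is pruned
  // or when no residual filter is needed.
  Filter cursorFilter()
  {
    return outcome() == RewriteOutcome.RESIDUAL ? rewrittenFilter : null;
  }
}

final class ClusterGroupRewriteSketch
{
  // Collects the groups that were not pruned, preserving input order.
  static List<GroupSpec> survivingGroups(List<GroupRewriteResult> results)
  {
    final List<GroupSpec> kept = new ArrayList<>(results.size());
    for (GroupRewriteResult result : results) {
      if (result.survives()) {
        kept.add(result.group());
      }
    }
    return kept;
  }
}
```

With this shape, callers that want to inspect a pruned group's outcome read `outcome()` on the result instead of comparing a map value against a sentinel filter, and the surviving-group list is derived from the same results rather than tracked separately.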
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]