gianm commented on code in PR #17774:
URL: https://github.com/apache/druid/pull/17774#discussion_r2053382100


##########
processing/src/main/java/org/apache/druid/segment/SegmentReference.java:
##########
@@ -19,12 +19,31 @@
 
 package org.apache.druid.segment;
 
+import org.apache.druid.query.policy.NoopPolicyEnforcer;
+import org.apache.druid.query.policy.PolicyEnforcer;
+
 /**
- * A {@link Segment} with a associated references, such as {@link 
ReferenceCountingSegment} where the reference is
+ * A {@link Segment} with an associated references, such as {@link 
ReferenceCountingSegment} where the reference is
  * the segment itself, and {@link 
org.apache.druid.segment.join.HashJoinSegment} which wraps a
  * {@link ReferenceCountingSegment} and also includes the associated list of
  * {@link org.apache.druid.segment.join.JoinableClause}
  */
 public interface SegmentReference extends Segment, ReferenceCountedObject
 {
+
+  /**
+   * Validates if the segment complies with the policy restrictions on tables.
+   * <p>
+   * This should be called right before the segment is about to be processed 
by the query stack, and after
+   * {@link 
org.apache.druid.query.planning.ExecutionVertex#createSegmentMapFunction(PolicyEnforcer)}.
+   */
+  default void validateOrElseThrow(PolicyEnforcer policyEnforcer)

Review Comment:
   The changes to `BroadcastJoinSegmentMapFnProcessor` look good to me.
   
   The implementation of `PolicyEnforcer#validateOrElseThrow` for 
`ReferenceCountingSegment` seems a little sketchy still, for a couple reasons:
   
   - The "else skip validation" seems like a hole for stuff to fall into. Prior 
to the latest fix, an inner `ReferenceCountingSegment` did fall into it. I'm 
wondering if other things could potentially fall in, like extension segment 
types, new segment types, etc.
   - The branch doing `instanceof QueryableIndexSegment || instanceof 
IncrementalIndexSegment` isn't strong enough to detect whether a `Segment` 
corresponds to a regular table, because extensions can add other kinds of 
segments backing tables. This is a specific kind of thing that could fall into 
the hole mentioned previously.
   - I don't understand why the `instanceof RestrictedSegment` branch is needed 
-- is it really possible for RestrictedSegment to wrap another 
RestrictedSegment? I would think this is impossible given that 
RestrictedDataSource can only wrap a TableDataSource.
   
   I observe that the `DataSource` version of validation is cleaner and more 
robust, and can be because there are two important things `DataSource` has:
   
   - `DataSource#getChildren` exists, so datasource trees can be walked robustly
   - We know that tables correspond to `TableDataSource`, so it's always 
possible to identify whether a leaf datasource is a regular table or not.
   
   To have equally robust validation for `Segment`, I think we would need to 
add `Segment#getChildren` to address the first point. For the second point I 
think we can use a check like this for leaf segments to see if they correspond 
to regular tables: `as(PhysicalSegmentInspector.class) != null`.
   
   I tried to think for a while of a robust way to write this code without 
adding `Segment#getChildren` and wasn't able to. So I do think we should do 
that. The default implementation should throw an unsupported operation error.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to