gianm commented on code in PR #17774:
URL: https://github.com/apache/druid/pull/17774#discussion_r2046226849


##########
processing/src/main/java/org/apache/druid/segment/SegmentReference.java:
##########
@@ -19,12 +19,31 @@
 
 package org.apache.druid.segment;
 
+import org.apache.druid.query.policy.NoopPolicyEnforcer;
+import org.apache.druid.query.policy.PolicyEnforcer;
+
 /**
- * A {@link Segment} with a associated references, such as {@link 
ReferenceCountingSegment} where the reference is
+ * A {@link Segment} with an associated references, such as {@link 
ReferenceCountingSegment} where the reference is
  * the segment itself, and {@link 
org.apache.druid.segment.join.HashJoinSegment} which wraps a
  * {@link ReferenceCountingSegment} and also includes the associated list of
  * {@link org.apache.druid.segment.join.JoinableClause}
  */
 public interface SegmentReference extends Segment, ReferenceCountedObject
 {
+
+  /**
+   * Validates if the segment complies with the policy restrictions on tables.
+   * <p>
+   * This should be called right before the segment is about to be processed 
by the query stack, and after
+   * {@link 
org.apache.druid.query.planning.ExecutionVertex#createSegmentMapFunction(PolicyEnforcer)}.
+   */
+  default void validateOrElseThrow(PolicyEnforcer policyEnforcer)

Review Comment:
   > Looking at `BroadcastJoinSegmentMapFnProcessor`, it converts 
`InputNumberDataSource` to `InlineDataSource`, but the enforcer won't have 
effects on inline data.
   
   It only inlines if the input number is in the 
`inputNumberToProcessorChannelMap` map. The base input won't be in that map, 
and that base input might be a regular table. I think in the simplest case 
where a regular table is joined with some subquery result, like `select page, 
q.c from wikipedia, (select count(*) c from "kttm-v2-2019-08-25") q`, the 
segments from `wikipedia` won't be checked since they'll be using the map 
function generated here.
   
   I tried setting a breakpoint at 
`validateOrElseThrow(ReferenceCountingSegment segment, Policy policy)` and did 
find that. For MSQ tasks, the breakpoint triggered for `kttm-v2-2019-08-25` but 
not for `wikipedia`.
   
   For Dart the situation was a little worse; the breakpoint for the 
`validateOrElseThrow(ReferenceCountingSegment segment, Policy policy)` method 
triggered also only for `kttm-v2-2019-08-25`, but then `validate` wasn't called 
because Dart has two layers of `ReferenceCountingSegment`, so the check 
`baseSegment instanceof QueryableIndexSegment || baseSegment instanceof 
IncrementalIndexSegment` is `false`.
   
   IMO, the best approach to the first issue (segment map fn created by 
`BroadcastJoinSegmentMapFnProcessor` not having enforcement) is to move the 
`PolicyEnforcer` into `DataSource#createSegmentMapFunction`, just like 
`withPolicies`. That way we never need to think about whether a particular bare 
call to `DataSource#createSegmentMapFunction` is safe or not, because all calls 
would need the enforcer present.
   
   For the second issue (Dart not validating properly because of double-wrapped 
segments) -- perhaps it would make sense to move `validateOrElseThrow` from 
`ReferenceCountingSegment` to `Segment`, and have the impl in 
`ReferenceCountingSegment` call 
`getBaseSegment().validateOrElseThrow(policyEnforcer)` rather than 
`validateOrElseThrow(this, policyEnforcer)`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to