gianm commented on code in PR #17774:
URL: https://github.com/apache/druid/pull/17774#discussion_r2046226849
##########
processing/src/main/java/org/apache/druid/segment/SegmentReference.java:
##########
@@ -19,12 +19,31 @@
package org.apache.druid.segment;
+import org.apache.druid.query.policy.NoopPolicyEnforcer;
+import org.apache.druid.query.policy.PolicyEnforcer;
+
/**
- * A {@link Segment} with a associated references, such as {@link
ReferenceCountingSegment} where the reference is
+ * A {@link Segment} with an associated references, such as {@link
ReferenceCountingSegment} where the reference is
* the segment itself, and {@link
org.apache.druid.segment.join.HashJoinSegment} which wraps a
* {@link ReferenceCountingSegment} and also includes the associated list of
* {@link org.apache.druid.segment.join.JoinableClause}
*/
public interface SegmentReference extends Segment, ReferenceCountedObject
{
+
+ /**
+ * Validates if the segment complies with the policy restrictions on tables.
+ * <p>
+ * This should be called right before the segment is about to be processed
by the query stack, and after
+ * {@link
org.apache.druid.query.planning.ExecutionVertex#createSegmentMapFunction(PolicyEnforcer)}.
+ */
+ default void validateOrElseThrow(PolicyEnforcer policyEnforcer)
Review Comment:
> Looking at `BroadcastJoinSegmentMapFnProcessor`, it converts
`InputNumberDataSource` to `InlineDataSource`, but the enforcer won't have
effects on inline data.
It only inlines if the input number is in the
`inputNumberToProcessorChannelMap` map. The base input won't be in that map,
and that base input might be a regular table. I think in the simplest case
where a regular table is joined with some subquery result, like `select page,
q.c from wikipedia, (select count(*) c from "kttm-v2-2019-08-25") q`, the
segments from `wikipedia` won't be checked since they'll be using the map
function generated here.
I tried setting a breakpoint at
`validateOrElseThrow(ReferenceCountingSegment segment, Policy policy)` and did
find that. For MSQ tasks, the breakpoint triggered for `kttm-v2-2019-08-25` but
not for `wikipedia`.
For Dart the situation was a little worse; the breakpoint for the
`validateOrElseThrow(ReferenceCountingSegment segment, Policy policy)` method
triggered also only for `kttm-v2-2019-08-25`, but then `validate` wasn't called
because Dart has two layers of `ReferenceCountingSegment`, so the check
`baseSegment instanceof QueryableIndexSegment || baseSegment instanceof
IncrementalIndexSegment` is `false`.
IMO, the best approach to the first issue (segment map fn created by
`BroadcastJoinSegmentMapFnProcessor` not having enforcement) is to move the
`PolicyEnforcer` into `DataSource#createSegmentMapFunction`, just like
`withPolicies`. That way we never need to think about whether a particular bare
call to `DataSource#createSegmentMapFunction` is safe or not, because all calls
would need the enforcer present.
For the second issue (Dart not validating properly because of double-wrapped
segments) -- perhaps it would make sense to move `validateOrElseThrow` from
`ReferenceCountingSegment` to `Segment`, and have the impl in
`ReferenceCountingSegment` call
`getBaseSegment().validateOrElseThrow(policyEnforcer)` rather than
`validateOrElseThrow(this, policyEnforcer)`?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]