gianm commented on code in PR #17774:
URL: https://github.com/apache/druid/pull/17774#discussion_r2053382100
##########
processing/src/main/java/org/apache/druid/segment/SegmentReference.java:
##########
@@ -19,12 +19,31 @@
package org.apache.druid.segment;
+import org.apache.druid.query.policy.NoopPolicyEnforcer;
+import org.apache.druid.query.policy.PolicyEnforcer;
+
/**
- * A {@link Segment} with a associated references, such as {@link
ReferenceCountingSegment} where the reference is
+ * A {@link Segment} with an associated references, such as {@link
ReferenceCountingSegment} where the reference is
* the segment itself, and {@link
org.apache.druid.segment.join.HashJoinSegment} which wraps a
* {@link ReferenceCountingSegment} and also includes the associated list of
* {@link org.apache.druid.segment.join.JoinableClause}
*/
public interface SegmentReference extends Segment, ReferenceCountedObject
{
+
+ /**
+ * Validates if the segment complies with the policy restrictions on tables.
+ * <p>
+ * This should be called right before the segment is about to be processed
by the query stack, and after
+ * {@link
org.apache.druid.query.planning.ExecutionVertex#createSegmentMapFunction(PolicyEnforcer)}.
+ */
+ default void validateOrElseThrow(PolicyEnforcer policyEnforcer)
Review Comment:
The changes to `BroadcastJoinSegmentMapFnProcessor` look good to me.
The implementation and javadoc of `PolicyEnforcer#validateOrElseThrow` for
`ReferenceCountingSegment` seems a little sketchy still, for a couple reasons:
- The "else skip validation" seems like a hole for stuff to fall into. Prior
to the latest fix, an inner `ReferenceCountingSegment` did fall into it. I'm
wondering if other things could potentially fall in, like extension segment
types, new segment types, etc.
- The branch doing `instanceof QueryableIndexSegment || instanceof
IncrementalIndexSegment` isn't strong enough to detect whether a `Segment`
corresponds to a regular table, because extensions can add other kinds of
segments backing tables. This is a specific kind of thing that could fall into
the hole mentioned previously.
- I don't understand why the `instanceof RestrictedSegment` branch is needed
-- is it really possible for RestrictedSegment to wrap another
RestrictedSegment? I would think this is impossible given that
RestrictedDataSource can only wrap a TableDataSource.
I observe that the `DataSource` version of validation is cleaner and more
robust, and can be because there are two important things `DataSource` has:
- `DataSource#getChildren` exists, so datasource trees can be walked robustly
- We know that tables correspond to `TableDataSource`, so it's always
possible to identify whether a leaf datasource is a regular table or not.
To have equally robust validation for `Segment`, I think we would need to
add `Segment#getChildren` to address the first point. For the second point I
think we can use a check like this for leaf segments to see if they correspond
to regular tables: `as(PhysicalSegmentInspector.class) != null`.
I tried to think for a while of a robust way to write this code without
adding `Segment#getChildren` and wasn't able to. So I do think we should do
that. The default implementation should throw an unsupported operation error.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]