gianm commented on code in PR #17774:
URL: https://github.com/apache/druid/pull/17774#discussion_r2056445526
##########
processing/src/main/java/org/apache/druid/segment/SegmentReference.java:
##########
@@ -19,12 +19,31 @@
package org.apache.druid.segment;
+import org.apache.druid.query.policy.NoopPolicyEnforcer;
+import org.apache.druid.query.policy.PolicyEnforcer;
+
/**
- * A {@link Segment} with a associated references, such as {@link
ReferenceCountingSegment} where the reference is
+ * A {@link Segment} with an associated references, such as {@link
ReferenceCountingSegment} where the reference is
* the segment itself, and {@link
org.apache.druid.segment.join.HashJoinSegment} which wraps a
* {@link ReferenceCountingSegment} and also includes the associated list of
* {@link org.apache.druid.segment.join.JoinableClause}
*/
public interface SegmentReference extends Segment, ReferenceCountedObject
{
+
+ /**
+ * Validates if the segment complies with the policy restrictions on tables.
+ * <p>
+ * This should be called right before the segment is about to be processed
by the query stack, and after
+ * {@link
org.apache.druid.query.planning.ExecutionVertex#createSegmentMapFunction(PolicyEnforcer)}.
+ */
+ default void validateOrElseThrow(PolicyEnforcer policyEnforcer)
Review Comment:
> To solidify, I'm thinking, can we eliminate the use cases when
`ReferenceCountingSegment` is wrapped with another `SegmentReference`? I think
it would be simpler if we know the structure is `RestrictedSegment` wraps a
`ReferenceCountingSegment`, which wraps a basic `segment`
(`QueryableIndexSegment`, `LookupSegment`, etc).
I did see double-wrapping (`ReferenceCountingSegment` on top of
`ReferenceCountingSegment`) with Dart with the query `select page, q.c from
wikipedia, (select count(*) c from "kttm-v2-2019-08-25") q`. However, I doubt
this is "necessary". The Dart code could likely be adjusted to not do this. So,
if it's helpful, go ahead and restrict things so `ReferenceCountingSegment`
cannot wrap another `SegmentReference`.
> To have equally robust validation for `Segment`, I think we would need to
add `Segment#getChildren` to address the first point. For the second point I
think we can use a check like this for leaf segments to see if they correspond
to regular tables: `as(PhysicalSegmentInspector.class) != null`.
About this (using `PhysicalSegmentInspector` as a proxy for "Segment
represents a regular table"), I thought about it some more and IMO this
approach is still not ideal. Nothing in the interface says that it can't be
implemented for a non-table, and nothing requires that it is implemented for a
table.
I think a nicer idea would be to make `Segment#getId` nullable, and spec it
so that it should return nonnull for regular tables backed by actual segments,
null for anything else. This to me would make more sense than the current way
`getId()` works, because currently dummy IDs are needed for `Segment` that
aren't backed by actual segments, which seems odd. I skimmed usages of
`getId()`; most seem to either be running in scenarios where it would always be
nonnull anyway, or else are using it to interpolate into log messages (could
use `toString` or `asString` instead).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]