gianm commented on code in PR #17774:
URL: https://github.com/apache/druid/pull/17774#discussion_r2033843384


##########
processing/src/main/java/org/apache/druid/query/policy/PolicyEnforcer.java:
##########
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.policy;
+
+import com.fasterxml.jackson.annotation.JsonSubTypes;
+import com.fasterxml.jackson.annotation.JsonTypeInfo;
+import org.apache.druid.guice.annotations.UnstableApi;
+import org.apache.druid.query.DataSource;
+import org.apache.druid.segment.SegmentReference;
+
+/**
+ * Interface for enforcing policies on data sources and segments in Druid 
queries.
+ * <p>
+ * Note: The {@code PolicyEnforcer} is intended to serve as a sanity checker 
and not as a primary authorization mechanism.
+ * It should not be used to implement security rules. Instead, it acts as a 
last line of defense to verify that
+ * security policies have been implemented correctly and to prevent incorrect 
policy usage.
+ * </p>
+ */
+@JsonTypeInfo(use = JsonTypeInfo.Id.NAME, property = "type")
+@JsonSubTypes({
+    @JsonSubTypes.Type(value = NoopPolicyEnforcer.class, name = "none"),
+    @JsonSubTypes.Type(value = RestrictAllTablesPolicyEnforcer.class, name = 
"restrictAllTables"),
+})
+@UnstableApi
+public interface PolicyEnforcer
+{
+
+  /**
+   * Validates a {@link DataSource} against the policy enforcer. This method 
is invoked before query execution

Review Comment:
   To me this javadoc isn't quite clear on the most important things. IMO those 
important things are:
   
   - the caller walks the datasource tree, picks out all `restricted` and 
`table`, and calls `validate` on those
   - the `ds` will always be of type `restricted` or `table`
   - `validate` will _not_ be called for a `table` that is nested within 
`restricted`
   
   Suggested wording:
   
   ```
     /**
      * Validates a {@link DataSource} against the policy enforcer. Prior to 
query execution, the
      * {@link Query#getDataSource()} tree is walked. This method is invoked 
once for each {@link RestrictedDataSource}
      * and once for each {@link TableDataSource} that is not wrapped inside a 
{@link RestrictedDataSource}, no matter
      * where they appear within the tree.
      *
      * This method is invoked by {@link DataSource#validate(PolicyEnforcer)}. 
Direct invocation of this method
      * is discouraged; use {@link DataSource#validate(PolicyEnforcer)} instead.
      *
      * @param ds     the data source to validate. Always {@link 
RestrictedDataSource} or {@link TableDataSource}
      * @param policy the policy on the data source; either {@link 
RestrictedDataSource#policy} or null for {@link TableDataSource}
      *
      * @return {@code true} if the data source complies with the policy, 
{@code false} otherwise
      */
   ```



##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/sql/MSQTaskQueryMaker.java:
##########
@@ -124,6 +128,14 @@ public QueryResponse<Object[]> runQuery(final DruidQuery 
druidQuery)
   {
     Hook.QUERY_PLAN.run(druidQuery.getQuery());
     plannerContext.dispatchHook(DruidHook.NATIVE_PLAN, druidQuery.getQuery());
+    if (!druidQuery.getQuery().getDataSource().validate(policyEnforcer)) {

Review Comment:
   We're missing a call for Dart in this PR. If you make the suggested change 
where validation is tied to `withPolicies`, that should fix it.



##########
processing/src/main/java/org/apache/druid/query/policy/RestrictAllTablesPolicyEnforcer.java:
##########
@@ -0,0 +1,136 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.policy;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.google.common.base.Preconditions;
+import com.google.common.collect.ImmutableList;
+import org.apache.druid.query.DataSource;
+import org.apache.druid.query.RestrictedDataSource;
+import org.apache.druid.segment.RestrictedSegment;
+import org.apache.druid.segment.SegmentReference;
+import org.reflections.Reflections;
+
+import javax.annotation.Nullable;
+import java.util.List;
+import java.util.Objects;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+/**
+ * All tables must be restricted by a policy.
+ */
+public class RestrictAllTablesPolicyEnforcer implements PolicyEnforcer
+{
+  private final ImmutableList<String> allowedPolicies;
+
+  @JsonCreator
+  public RestrictAllTablesPolicyEnforcer(
+      @Nullable @JsonProperty("allowedPolicies") List<String> allowedPolicies
+  )
+  {
+    if (allowedPolicies == null) {
+      this.allowedPolicies = ImmutableList.of();
+    } else {
+      Set<String> policyClazz = new 
Reflections("org.apache.druid.query.policy").getSubTypesOf(Policy.class)
+                                                                               
 .stream()
+                                                                               
 .map(Class::getSimpleName)
+                                                                               
 .collect(Collectors.toSet());
+      ClassLoader classLoader = Thread.currentThread().getContextClassLoader();
+      List<String> unrecognizedClazz = allowedPolicies.stream()
+                                                      .filter(p -> {
+                                                        if 
(policyClazz.contains(p)) {
+                                                          return false;
+                                                        }
+                                                        try {
+                                                          // try load the 
class from its full path
+                                                          Class<?> clazz = 
classLoader.loadClass(p);
+                                                          return 
clazz.isAssignableFrom(Policy.class);
+                                                        }
+                                                        catch 
(ClassNotFoundException e) {
+                                                          return true;
+                                                        }
+                                                      })
+                                                      
.collect(Collectors.toList());
+      Preconditions.checkArgument(
+          unrecognizedClazz.isEmpty(),
+          "Unrecognized class[%s], ensure that the class is loaded and is a 
subclass of Policy",
+          unrecognizedClazz
+      );
+      this.allowedPolicies = ImmutableList.copyOf(allowedPolicies);
+    }
+  }
+
+  @Override
+  public boolean validate(DataSource ds, Policy policy)
+  {
+    if (ds instanceof RestrictedDataSource) {
+      return allowedPolicies.isEmpty()
+             || allowedPolicies.contains(policy.getClass().getSimpleName())

Review Comment:
   Please just use `getName`. The simple name is potentially ambiguous since it 
doesn't include the package.



##########
processing/src/main/java/org/apache/druid/query/policy/PolicyEnforcer.java:
##########
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.policy;
+
+import com.fasterxml.jackson.annotation.JsonSubTypes;
+import com.fasterxml.jackson.annotation.JsonTypeInfo;
+import org.apache.druid.guice.annotations.UnstableApi;
+import org.apache.druid.query.DataSource;
+import org.apache.druid.segment.SegmentReference;
+
+/**
+ * Interface for enforcing policies on data sources and segments in Druid 
queries.
+ * <p>
+ * Note: The {@code PolicyEnforcer} is intended to serve as a sanity checker 
and not as a primary authorization mechanism.
+ * It should not be used to implement security rules. Instead, it acts as a 
last line of defense to verify that
+ * security policies have been implemented correctly and to prevent incorrect 
policy usage.
+ * </p>
+ */
+@JsonTypeInfo(use = JsonTypeInfo.Id.NAME, property = "type")
+@JsonSubTypes({
+    @JsonSubTypes.Type(value = NoopPolicyEnforcer.class, name = "none"),
+    @JsonSubTypes.Type(value = RestrictAllTablesPolicyEnforcer.class, name = 
"restrictAllTables"),
+})
+@UnstableApi
+public interface PolicyEnforcer
+{
+
+  /**
+   * Validates a {@link DataSource} against the policy enforcer. This method 
is invoked before query execution
+   * via {@link DataSource#validate(PolicyEnforcer)}. The callsite must ensure 
that the data source is not a composite
+   * tree, and it must be a druid table. Specifically, it must be one of 
TableDataSource and RestrictedTableDataSource.
+   * <p>
+   * Direct invocation of this method is discouraged; use {@link 
DataSource#validate(PolicyEnforcer)} instead.
+   *
+   * @param ds     the data source to validate
+   * @param policy the policy on the data source, e.x. {@link 
org.apache.druid.query.RestrictedDataSource#policy} or null for {@link 
org.apache.druid.query.TableDataSource}
+   * @return {@code true} if the data source complies with the policy, {@code 
false} otherwise
+   */
+  boolean validate(DataSource ds, Policy policy);
+
+  /**
+   * Validates a {@link SegmentReference} against the policy enforcer. This 
method is invoked before query execution
+   * via {@link SegmentReference#validate(PolicyEnforcer)}. The callsite must 
ensure that the segment reference is not a
+   * deeply linked segment reference, and it must be referenced to a druid 
table.
+   * <p>
+   * Direct invocation of this method is discouraged; use {@link 
SegmentReference#validate(PolicyEnforcer)} instead.
+   *
+   * @param segment the segment to validate
+   * @param policy  the policy on the segment, {@link 
org.apache.druid.segment.RestrictedSegment#policy} or null for other
+   * @return {@code true} if the segment complies with the policy, {@code 
false} otherwise
+   */
+  boolean validate(SegmentReference segment, Policy policy);

Review Comment:
   Similar comments on the javadoc as the other `validate` method.



##########
processing/src/main/java/org/apache/druid/query/policy/RestrictAllTablesPolicyEnforcer.java:
##########
@@ -0,0 +1,136 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.policy;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.google.common.base.Preconditions;
+import com.google.common.collect.ImmutableList;
+import org.apache.druid.query.DataSource;
+import org.apache.druid.query.RestrictedDataSource;
+import org.apache.druid.segment.RestrictedSegment;
+import org.apache.druid.segment.SegmentReference;
+import org.reflections.Reflections;
+
+import javax.annotation.Nullable;
+import java.util.List;
+import java.util.Objects;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+/**
+ * All tables must be restricted by a policy.
+ */
+public class RestrictAllTablesPolicyEnforcer implements PolicyEnforcer
+{
+  private final ImmutableList<String> allowedPolicies;
+
+  @JsonCreator
+  public RestrictAllTablesPolicyEnforcer(
+      @Nullable @JsonProperty("allowedPolicies") List<String> allowedPolicies
+  )
+  {
+    if (allowedPolicies == null) {
+      this.allowedPolicies = ImmutableList.of();
+    } else {
+      Set<String> policyClazz = new 
Reflections("org.apache.druid.query.policy").getSubTypesOf(Policy.class)
+                                                                               
 .stream()
+                                                                               
 .map(Class::getSimpleName)
+                                                                               
 .collect(Collectors.toSet());
+      ClassLoader classLoader = Thread.currentThread().getContextClassLoader();
+      List<String> unrecognizedClazz = allowedPolicies.stream()
+                                                      .filter(p -> {
+                                                        if 
(policyClazz.contains(p)) {
+                                                          return false;
+                                                        }
+                                                        try {
+                                                          // try load the 
class from its full path
+                                                          Class<?> clazz = 
classLoader.loadClass(p);
+                                                          return 
clazz.isAssignableFrom(Policy.class);
+                                                        }
+                                                        catch 
(ClassNotFoundException e) {
+                                                          return true;
+                                                        }
+                                                      })
+                                                      
.collect(Collectors.toList());
+      Preconditions.checkArgument(

Review Comment:
   I do think this validation piece should be removed. Removing it will make 
this enforcer more flexible, since it would be able to work with extension 
policies. I recognize it's mostly just here for tests, but it is in `main` not 
`test` and it would be nice for it to be more useful.



##########
processing/src/main/java/org/apache/druid/segment/SegmentReference.java:
##########
@@ -27,4 +29,15 @@
  */
 public interface SegmentReference extends Segment, ReferenceCountedObject
 {
+
+  /**
+   * Returns true if the segment complies with the policy restrictions on 
tables.
+   * <p>
+   * This should be called right before the segment is about to be processed 
by the query stack.
+   */
+  default boolean validate(PolicyEnforcer policyEnforcer)

Review Comment:
   A call is missing on the realtime path; there's one in 
`ReferenceCountingSegmentQueryRunner` but that's only used by Historicals.
   
   Similar to the idea about tying together `DataSource.validate` with 
`withPolicies`, we could simplify this by tying it together with the 
application of functions created by `createSegmentMapFunction`.
   
   i.e., create a utility method that applies a segment map function 
`Function<SegmentReference, SegmentReference>` and also calls `validate` on the 
result, then use that utility method everywhere we apply segment map functions.
   
   or this might be easier to integrate in the current code: create a utility 
method that decorates a segment map function with a validator. i.e. it takes 
`Function<SegmentReference, SegmentReference>` + `PolicyEnforcer`, and it 
returns `Function<SegmentReference, SegmentReference>` which embeds the 
validation call.



##########
server/src/main/java/org/apache/druid/client/CachingClusteredClient.java:
##########
@@ -199,6 +204,13 @@ private <T> Sequence<T> run(
       final boolean specificSegments
   )
   {
+    Query<T> query = queryPlus.getQuery();
+    if (!query.getDataSource().validate(policyEnforcer)) {

Review Comment:
   This doesn't cover everything; there should be another check on the local 
side, since it's possible for the local runner on the Broker to query a regular 
table, if it's a broadcast table.
   
   Although, thinking more about the big picture, I actually think this check 
would be better placed next to `withPolicies`, i.e., in `QueryLifecycle` and 
`DruidQuery`. It should be called right after `withPolicies`. That way we know 
it's always validating the policies that were just put in place. To make them 
even more tied-together, let's have a utility method that does `withPolicies` 
and also does the validation.



##########
server/src/main/java/org/apache/druid/server/coordination/ServerManager.java:
##########
@@ -189,6 +194,12 @@ public <T> QueryRunner<T> 
getQueryRunnerForSegments(Query<T> theQuery, Iterable<
       throw new ISE("Cannot handle subquery: %s", dataSourceFromQuery);
     }
 
+    if (!dataSourceFromQuery.validate(policyEnforcer)) {

Review Comment:
   This check + exception is copy/pasted four places currently. Please put it 
into a utility method so it is easier to maintain. Same thing for the `Segment` 
version, please.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to