gianm commented on code in PR #17774: URL: https://github.com/apache/druid/pull/17774#discussion_r2022772602
########## processing/src/main/java/org/apache/druid/query/policy/PolicyEnforcer.java: ########## @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.druid.query.policy; + +import com.fasterxml.jackson.annotation.JsonSubTypes; +import com.fasterxml.jackson.annotation.JsonTypeInfo; +import org.apache.druid.guice.annotations.UnstableApi; +import org.apache.druid.query.DataSource; + +/** + * A mechanism for enforcing policies on data sources in Druid queries. + */ +@JsonTypeInfo(use = JsonTypeInfo.Id.NAME, property = "type", defaultImpl = NoopPolicyEnforcer.class) +@JsonSubTypes({ + @JsonSubTypes.Type(value = NoopPolicyEnforcer.class, name = "none"), + @JsonSubTypes.Type(value = RestrictAllTablesPolicyEnforcer.class, name = "restrictAllTables"), +}) +@UnstableApi +public interface PolicyEnforcer +{ + + /** + * Validate a data source against the policy enforcer. Review Comment: This should specify exactly where the data source is taken from and how the enforcer method will be called. ########## processing/src/main/java/org/apache/druid/segment/RestrictedSegment.java: ########## @@ -42,13 +43,11 @@ */ public class RestrictedSegment implements SegmentReference { - protected final SegmentReference delegate; - protected final Policy policy; + @VisibleForTesting + public final SegmentReference delegate; Review Comment: I don't see where this is being used in tests. Is it meant to be? ########## server/src/main/java/org/apache/druid/server/coordination/ServerManager.java: ########## @@ -190,6 +195,10 @@ public <T> QueryRunner<T> getQueryRunnerForSegments(Query<T> theQuery, Iterable< throw new ISE("Cannot handle subquery: %s", dataSourceFromQuery); } + if (!(theQuery instanceof SegmentMetadataQuery) && !policyEnforcer.validate(dataSourceFromQuery)) { Review Comment: `SegmentMetadataQuery` should not be exempt from policy enforcement. Why is this special case needed here, and in `CachingClusteredClient`? Perhaps we can get rid of it by fixing something in the model. ########## processing/src/main/java/org/apache/druid/query/DataSource.java: ########## @@ -140,6 +141,16 @@ default DataSource withPolicies(Map<String, Optional<Policy>> policyMap) return this.withChildren(children); } + /** + * Returns true if the datasource complies with the policy restrictions on tables. + */ + default boolean validate(PolicyEnforcer policyEnforcer) Review Comment: It looks like this method is unused. We won't need it anyway, since the enforcer is able to inspect the query's datasource directly. Please remove it. ########## processing/src/main/java/org/apache/druid/query/policy/RestrictAllTablesPolicyEnforcer.java: ########## @@ -0,0 +1,109 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.druid.query.policy; + +import com.fasterxml.jackson.annotation.JsonCreator; +import com.fasterxml.jackson.annotation.JsonProperty; +import com.google.common.base.Preconditions; +import com.google.common.collect.ImmutableList; +import org.apache.druid.query.DataSource; +import org.apache.druid.query.RestrictedDataSource; +import org.reflections.Reflections; + +import javax.annotation.Nullable; +import java.util.List; +import java.util.Objects; +import java.util.Set; +import java.util.stream.Collectors; + +/** + * All tables must be restricted by a policy. + */ +public class RestrictAllTablesPolicyEnforcer implements PolicyEnforcer +{ + private final ImmutableList<String> allowedPolicies; + + @JsonCreator + public RestrictAllTablesPolicyEnforcer( + @Nullable @JsonProperty("allowedPolicies") List<String> allowedPolicies + ) + { + if (allowedPolicies == null) { + this.allowedPolicies = ImmutableList.of(); + } else { + Set<String> policyClazz = new Reflections("org.apache.druid.query.policy").getSubTypesOf( Review Comment: I'm not totally sure this will work for extensions. At any rate, I think it isn't really necessary, since its purpose appears to be just to verify that `allowedPolicies` only contains valid classes. It's ok if it contains invalid classes. ########## processing/src/main/java/org/apache/druid/query/policy/PolicyEnforcer.java: ########## @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.druid.query.policy; + +import com.fasterxml.jackson.annotation.JsonSubTypes; +import com.fasterxml.jackson.annotation.JsonTypeInfo; +import org.apache.druid.guice.annotations.UnstableApi; +import org.apache.druid.query.DataSource; + +/** + * A mechanism for enforcing policies on data sources in Druid queries. + */ +@JsonTypeInfo(use = JsonTypeInfo.Id.NAME, property = "type", defaultImpl = NoopPolicyEnforcer.class) Review Comment: Never include a `defaultImpl` in a `@JsonTypeInfo` info line, unless absolutely needed for backwards compatibility. The reason is that when there is a `defaultImpl`, it catches all invalid types (not just missing types), which is weird and confusing behavior. In this case it would mean that if someone misspells the enforcer `type`, or forgets to load an extension, we would silently use the `none` enforcer. ########## processing/src/main/java/org/apache/druid/query/policy/RestrictAllTablesPolicyEnforcer.java: ########## @@ -0,0 +1,109 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.druid.query.policy; + +import com.fasterxml.jackson.annotation.JsonCreator; +import com.fasterxml.jackson.annotation.JsonProperty; +import com.google.common.base.Preconditions; +import com.google.common.collect.ImmutableList; +import org.apache.druid.query.DataSource; +import org.apache.druid.query.RestrictedDataSource; +import org.reflections.Reflections; + +import javax.annotation.Nullable; +import java.util.List; +import java.util.Objects; +import java.util.Set; +import java.util.stream.Collectors; + +/** + * All tables must be restricted by a policy. Review Comment: The enforcer is actually more strict than this, I think. It appears to forbid any compound datasources, so for example, it wouldn't allow `unnest` -> `restricted` -> `table`. That's fine, since it's just here for enabling unit tests. But this should be called out in the javadocs and should be reflected in the name of the class. IMO, `RestrictedTablesOnlyEnforcer` is more clear than `RestrictAllTablesPolicyEnforcer`. ########## processing/src/test/java/org/apache/druid/query/policy/RestrictAllTablesPolicyEnforcerTest.java: ########## @@ -0,0 +1,79 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.druid.query.policy; + +import com.fasterxml.jackson.databind.ObjectMapper; +import com.google.common.collect.ImmutableList; +import org.apache.druid.query.RestrictedDataSource; +import org.apache.druid.query.TableDataSource; +import org.apache.druid.query.filter.NullFilter; +import org.apache.druid.segment.TestHelper; +import org.junit.Assert; +import org.junit.Test; + +public class RestrictAllTablesPolicyEnforcerTest +{ + @Test + public void test_serialize() throws Exception + { + RestrictAllTablesPolicyEnforcer policyEnforcer = new RestrictAllTablesPolicyEnforcer(null); + ObjectMapper jsonMapper = TestHelper.makeJsonMapper(); + + String expected = "{\"type\":\"restrictAllTables\",\"allowedPolicies\":[]}"; + Assert.assertEquals(expected, jsonMapper.writeValueAsString(policyEnforcer)); + } + + @Test + public void test_serde_roundTrip() throws Exception + { + RestrictAllTablesPolicyEnforcer policyEnforcer = new RestrictAllTablesPolicyEnforcer(null); + ObjectMapper jsonMapper = TestHelper.makeJsonMapper(); + PolicyEnforcer deserialized = jsonMapper.readValue( + jsonMapper.writeValueAsString(policyEnforcer), + PolicyEnforcer.class + ); + Assert.assertEquals(policyEnforcer, deserialized); + } + + @Test + public void test_validate() throws Exception + { + RestrictAllTablesPolicyEnforcer policyEnforcer = new RestrictAllTablesPolicyEnforcer(null); + RowFilterPolicy policy = RowFilterPolicy.from(new NullFilter("some-col", null)); + TableDataSource table = TableDataSource.create("table"); + RestrictedDataSource restricted = RestrictedDataSource.create(table, policy); + + Assert.assertFalse(policyEnforcer.validate(table)); + Assert.assertTrue(policyEnforcer.validate(restricted)); Review Comment: Please include tests for other kinds of datasources, such as `query`, `join`, `unnest`, `inline`, etc. ########## processing/src/main/java/org/apache/druid/query/policy/PolicyEnforcer.java: ########## @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.druid.query.policy; + +import com.fasterxml.jackson.annotation.JsonSubTypes; +import com.fasterxml.jackson.annotation.JsonTypeInfo; +import org.apache.druid.guice.annotations.UnstableApi; +import org.apache.druid.query.DataSource; + +/** + * A mechanism for enforcing policies on data sources in Druid queries. + */ +@JsonTypeInfo(use = JsonTypeInfo.Id.NAME, property = "type", defaultImpl = NoopPolicyEnforcer.class) +@JsonSubTypes({ + @JsonSubTypes.Type(value = NoopPolicyEnforcer.class, name = "none"), + @JsonSubTypes.Type(value = RestrictAllTablesPolicyEnforcer.class, name = "restrictAllTables"), +}) +@UnstableApi +public interface PolicyEnforcer Review Comment: IMO it should also be possible for an enforcer to validate the `Segment` that is about to be processed by the query stack. This provides an additional layer of sanity checking after the `DataSource` has been used to decorate the physical `Segment` and produce a mapped `Segment`. Suggested method: ```java boolean validate(Segment segment); ``` It should be called: - for native queries, right before passing a `Segment` to `QueryRunnerFactory#createRunner`. - for MSQ, in `BaseLeafFrameProcessor#runIncrementally` before calling `runWithSegment`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
