cgivre commented on code in PR #3052:
URL: https://github.com/apache/drill/pull/3052#discussion_r3453234919


##########
THREAT_MODEL.md:
##########
@@ -0,0 +1,198 @@
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+# Apache Drill — Threat Model (v0 draft)
+
+## §1 Header
+
+- **Project:** Apache Drill (`apache/drill`), `master` @ HEAD (2026-06). 
Scope: `apache/drill` only.
+- **Date:** 2026-06-18. **Author:** ASF Security team, drafted via the 
threat-model-producer (Scovetta) rubric at the Drill PMC's request (path 3 — 
chosen by Charles Givre, 2026-06-17).
+- **Status:** DRAFT — not yet reviewed by the Drill PMC. Every load-bearing 
claim is *(inferred)* pending PMC confirmation (see §14).
+- **Version binding:** versioned with the project; a report against version 
*N* is triaged against the model as it stood at *N*.
+- **Reporting cross-reference:** §8-property violations → report privately per 
the ASF process (`[email protected]` → `[email protected]`); §3/§9 
findings are closed citing this document.
+- **Provenance legend:** *(documented)* = Drill's own docs/repo; 
*(maintainer)* = confirmed by a Drill PMC member through this process; 
*(inferred)* = reasoned from architecture/docs, not yet PMC-ratified — each has 
a matching §14 open question.
+- **Draft confidence:** ~10 documented / 0 maintainer / ~28 inferred — a first 
draft to react to, not a ratified model.
+
+**What Drill is.** Apache Drill is a **schema-free, distributed SQL query 
engine** for large-scale datasets. A cluster of **Drillbit** daemons 
(coordinated via Apache ZooKeeper) accepts ANSI-SQL queries over JDBC, ODBC, a 
REST API, and a Web UI (default port **8047**), and executes them directly 
against data in configurable **storage plugins** (local/HDFS/S3 files, Hive, 
HBase, Kafka, MongoDB, RDBMS via JDBC, HTTP, …) — discovering schema at query 
time rather than requiring a pre-defined schema. *(documented — 
drill.apache.org)*
+
+## §2 Scope and intended use
+
+Intended deployment: a **clustered set of Drillbits** inside an 
operator-controlled, network-isolated cluster, queried by analytics clients. 
Drill also runs in an **embedded** single-JVM mode (developer/laptop) where the 
caller controls everything. *(documented — install docs; inferred that 
clustered behind a perimeter is the security-relevant shape)*
+
+**Caller roles** (Drill is a network service — the role splits):
+
+- **SQL client** — submits queries over JDBC/ODBC/REST. May be authenticated 
or, if auth is disabled, anonymous. The primary untrusted boundary. *(inferred 
— §14 Q1)*
+- **Operator / admin** — configures storage plugins, system/session options, 
security settings, and uploads dynamic UDFs via the Web UI/REST. **Trusted.** 
*(documented — storage-plugin + option config is an admin function)*
+- **Embedding / submitting user under impersonation** — when impersonation is 
enabled, the Drillbit accesses the underlying data source *as* the submitting 
user, delegating authorization to the data source. *(documented — impersonation 
docs)*
+- **Peer Drillbit** — another node in the cluster, authenticated over the 
inter-Drillbit RPC. Trusted once authenticated (§7 Byzantine note). *(inferred 
— §14 Q3)*
+
+**Component families.**
+
+| Family | Entry point | Touches OS/network | In model? |
+| --- | --- | --- | --- |
+| Client RPC / SQL front door | Drillbit user RPC (JDBC/ODBC), the "foreman" | 
network (listens) | **In — primary boundary** *(inferred — §14 Q1)* |
+| Web UI + REST API | `:8047` — query submit, storage-plugin config, profiles, 
options | network (listens) | **In — high value (admin surface)** *(documented 
— Web UI)* |
+| SQL engine | parse → plan (Calcite) → distributed execute | depends on 
plugins | **In** *(inferred)* |
+| Storage plugins | file/HDFS/S3, Hive, HBase, Kafka, Mongo, **JDBC (RDBMS)**, 
**HTTP** | filesystem + network out | **In — file-read / SSRF / connector 
surface** *(documented — storage plugins)* |
+| UDF layer | built-in functions + **dynamic UDFs** (runtime jar upload) | 
in-JVM code execution | **In as code-execution-by-design** *(documented — 
dynamic UDFs)* |
+| Inter-Drillbit RPC | control + data RPC between Drillbits, SASL | network 
(intra-cluster) | **In (boundary is authenticated)** *(inferred — §14 Q3)* |
+| Client connectors | JDBC driver, ODBC driver, C++ client | client trust 
domain | client-side — out (§3) |
+| Examples / tests / contrib | `contrib/`-bundled formats, `exec/.../test`, 
samples | n/a | **Out** *(see §3)* |
+
+## §3 Out of scope (explicit non-goals)
+
+- **The security of the underlying data sources.** HDFS/S3 permissions, the 
RDBMS behind the JDBC plugin, Hive authorization, the Kafka cluster — each 
enforces its own access control. Drill assumes them; it is not responsible for 
their misconfiguration. *(inferred — §14 Q2)*
+- **Dynamic UDFs and storage-plugin configuration as code/connectivity 
execution.** Uploading a UDF jar or configuring a storage plugin is an 
**authorized admin operation**; the code/connection runs with the Drillbit's 
privileges. This is a feature, not a sandbox escape — `BY-DESIGN`. A finding 
that requires admin rights to register a UDF or add a plugin is out of model. 
*(documented — admin function; §14 Q4 confirms the trust line)*
+- **An operator with cluster/root access, the Drillbit service account, or 
direct ZooKeeper write access.** Anyone who controls the cluster processes or 
the coordination store has already won. `OUT-OF-MODEL: adversary-not-in-scope`. 
*(inferred — §14 Q5)*
+- **Embedded / single-user mode**, where the caller owns the JVM, the 
filesystem, and all input — equivalent to an in-process library call. 
*(inferred — §14 Q1)*
+- **SQL injection in an embedding application** that string-concatenates 
untrusted input into Drill SQL — that is the embedding app's bug, not Drill's. 
*(inferred — §11)*
+- **Bundled connectors / formats under `contrib/`, tests, and examples** — 
separately authored; threat-model per connector if promoted. *(inferred)*

Review Comment:
   I don't think the components in the `contrib/` folder should be out of scope.
These ship with Drill and are maintained as part of the Drill project.


