cgivre commented on code in PR #3052:
URL: https://github.com/apache/drill/pull/3052#discussion_r3453234919

##########
THREAT_MODEL.md:
##########
@@ -0,0 +1,198 @@
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+-->
+# Apache Drill — Threat Model (v0 draft)
+
+## §1 Header
+
+- **Project:** Apache Drill (`apache/drill`), `master` @ HEAD (2026-06). Scope: `apache/drill` only.
+- **Date:** 2026-06-18. **Author:** ASF Security team, drafted via the threat-model-producer (Scovetta) rubric at the Drill PMC's request (path 3 — chosen by Charles Givre, 2026-06-17).
+- **Status:** DRAFT — not yet reviewed by the Drill PMC. Every load-bearing claim is *(inferred)* pending PMC confirmation (see §14).
+- **Version binding:** versioned with the project; a report against version *N* is triaged against the model as it stood at *N*.
+- **Reporting cross-reference:** §8-property violations → report privately per the ASF process (`[email protected]` → `[email protected]`); §3/§9 findings are closed citing this document.
+- **Provenance legend:** *(documented)* = Drill's own docs/repo; *(maintainer)* = confirmed by a Drill PMC member through this process; *(inferred)* = reasoned from architecture/docs, not yet PMC-ratified — each has a matching §14 open question.
+- **Draft confidence:** ~10 documented / 0 maintainer / ~28 inferred — a first draft to react to, not a ratified model.
+
+**What Drill is.** Apache Drill is a **schema-free, distributed SQL query engine** for large-scale datasets. A cluster of **Drillbit** daemons (coordinated via Apache ZooKeeper) accepts ANSI-SQL queries over JDBC, ODBC, a REST API, and a Web UI (default port **8047**), and executes them directly against data in configurable **storage plugins** (local/HDFS/S3 files, Hive, HBase, Kafka, MongoDB, RDBMS via JDBC, HTTP, …) — discovering schema at query time rather than requiring a pre-defined schema. *(documented — drill.apache.org)*
+
+## §2 Scope and intended use
+
+Intended deployment: a **clustered set of Drillbits** inside an operator-controlled, network-isolated cluster, queried by analytics clients. Drill also runs in an **embedded** single-JVM mode (developer/laptop) where the caller controls everything. *(documented — install docs; inferred that clustered behind a perimeter is the security-relevant shape)*
+
+**Caller roles** (Drill is a network service — the role splits):
+
+- **SQL client** — submits queries over JDBC/ODBC/REST. May be authenticated or, if auth is disabled, anonymous. The primary untrusted boundary. *(inferred — §14 Q1)*
+- **Operator / admin** — configures storage plugins, system/session options, security settings, and uploads dynamic UDFs via the Web UI/REST. **Trusted.** *(documented — storage-plugin + option config is an admin function)*
+- **Embedding / submitting user under impersonation** — when impersonation is enabled, the Drillbit accesses the underlying data source *as* the submitting user, delegating authorization to the data source. *(documented — impersonation docs)*
+- **Peer Drillbit** — another node in the cluster, authenticated over the inter-Drillbit RPC. Trusted once authenticated (§7 Byzantine note). *(inferred — §14 Q3)*
+
+**Component families.**
+
+| Family | Entry point | Touches OS/network | In model? |
+| --- | --- | --- | --- |
+| Client RPC / SQL front door | Drillbit user RPC (JDBC/ODBC), the "foreman" | network (listens) | **In — primary boundary** *(inferred — §14 Q1)* |
+| Web UI + REST API | `:8047` — query submit, storage-plugin config, profiles, options | network (listens) | **In — high value (admin surface)** *(documented — Web UI)* |
+| SQL engine | parse → plan (Calcite) → distributed execute | depends on plugins | **In** *(inferred)* |
+| Storage plugins | file/HDFS/S3, Hive, HBase, Kafka, Mongo, **JDBC (RDBMS)**, **HTTP** | filesystem + network out | **In — file-read / SSRF / connector surface** *(documented — storage plugins)* |
+| UDF layer | built-in functions + **dynamic UDFs** (runtime jar upload) | in-JVM code execution | **In as code-execution-by-design** *(documented — dynamic UDFs)* |
+| Inter-Drillbit RPC | control + data RPC between Drillbits, SASL | network (intra-cluster) | **In (boundary is authenticated)** *(inferred — §14 Q3)* |
+| Client connectors | JDBC driver, ODBC driver, C++ client | client trust domain | client-side — out (§3) |
+| Examples / tests / contrib | `contrib/`-bundled formats, `exec/.../test`, samples | n/a | **Out** *(see §3)* |
+
+## §3 Out of scope (explicit non-goals)
+
+- **The security of the underlying data sources.** HDFS/S3 permissions, the RDBMS behind the JDBC plugin, Hive authorization, the Kafka cluster — each enforces its own access control. Drill assumes them; it is not responsible for their misconfiguration. *(inferred — §14 Q2)*
+- **Dynamic UDFs and storage-plugin configuration as code/connectivity execution.** Uploading a UDF jar or configuring a storage plugin is an **authorized admin operation**; the code/connection runs with the Drillbit's privileges. This is a feature, not a sandbox escape — `BY-DESIGN`. A finding that requires admin rights to register a UDF or add a plugin is out of model. *(documented — admin function; §14 Q4 confirms the trust line)*
+- **An operator with cluster/root access, the Drillbit service account, or direct ZooKeeper write access.** Anyone who controls the cluster processes or the coordination store has already won. `OUT-OF-MODEL: adversary-not-in-scope`. *(inferred — §14 Q5)*
+- **Embedded / single-user mode**, where the caller owns the JVM, the filesystem, and all input — equivalent to an in-process library call. *(inferred — §14 Q1)*
+- **SQL injection in an embedding application** that string-concatenates untrusted input into Drill SQL — that is the embedding app's bug, not Drill's. *(inferred — §11)*
+- **Bundled connectors / formats under `contrib/`, tests, and examples** — separately authored; threat-model per connector if promoted. *(inferred)*

Review Comment:
I don't think the components in the `contrib/` folder should be out of scope. These ship with Drill and are maintained as part of the Drill project.
