[GitHub] [druid] gianm commented on issue #13837: Input source security model for MSQ table functions and more

via GitHub Mon, 27 Feb 2023 14:04:46 -0800


gianm commented on issue #13837:
URL: https://github.com/apache/druid/issues/13837#issuecomment-1447182330

With regard to backwards-compatibility with `(EXTERNAL, EXTERNAL, READ)`
stuff, the feature flag approach sounds good to me. The documentation on this
page would need to be updated as well:
https://druid.apache.org/docs/latest/multi-stage-query/security.html.

Some notes on the other stuff we'll need to sort out as part of this:

### non-http protocols via `http` input source

The `http` input source is implemented using
[java.net.URLConnection](https://docs.oracle.com/javase/8/docs/api/java/net/URLConnection.html),
which can handle various protocols other than http (including local
`file://`). Currently the config `druid.ingestion.http.allowedProtocols`
(default: `http, https`) is used to control which protocols are permitted via
this input source.

We should consider how this all fits together. Perhaps something like this:

- `(EXTERNAL, http, READ)` refers to the `http` input source.
- The `http` input source may, if `druid.ingestion.http.allowedProtocols` is
set, handle non-http protocols. This isn't the concern of the authorization
layer.
- To ensure that people who use either of the above features (`EXTERNAL`
authorization, or `druid.ingestion.http.allowedProtocols`) understand their
interaction, we should include notes about this in the docs for both features
(with examples).
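
To illustrate the interaction described above, here is a minimal sketch of the two independent checks. Everything in it is hypothetical: the class name, the method names, and the use of a plain set of granted `EXTERNAL` resource names are illustrative assumptions, not Druid's actual authorizer or input source code.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch; none of these names are real Druid classes.
final class HttpInputSourceCheckSketch {

  // Layer 1 (authorization): only the input source type is checked,
  // i.e. whether the user holds (EXTERNAL, <inputSourceType>, READ).
  // The URI and its protocol are never inspected here.
  static boolean isAuthorized(Set<String> grantedExternalReads, String inputSourceType) {
    return grantedExternalReads.contains(inputSourceType);
  }

  // Layer 2 (input source config): stands in for the allowlist configured
  // through druid.ingestion.http.allowedProtocols.
  static boolean isProtocolAllowed(Set<String> allowedProtocols, URI uri) {
    String scheme = uri.getScheme();
    return scheme != null && allowedProtocols.contains(scheme.toLowerCase(Locale.ROOT));
  }

  // A read succeeds only if both layers agree.
  static boolean canRead(Set<String> grantedExternalReads, Set<String> allowedProtocols, URI uri) {
    return isAuthorized(grantedExternalReads, "http") && isProtocolAllowed(allowedProtocols, uri);
  }
}
```

Under this sketch, a user holding `(EXTERNAL, http, READ)` cannot read a `file://` URI with the default allowlist, but can as soon as an operator adds `file` to the allowlist. That is the kind of interaction the docs for both features would need to call out.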

### non-hdfs protocols via `hdfs` input source

The `hdfs` input source has a similar behavior to the `http` input source.
Like `http`, it supports various non-hdfs protocols. Like `http`, there is a
`druid.ingestion.hdfs.allowedProtocols` that controls which protocols are
allowed. Like `http`, the default set is limited to only the obvious one
(`hdfs`).

So, we should be able to take the same approach here that we take with
`http`.

### firehose factories

[Firehoses](https://druid.apache.org/docs/latest/ingestion/native-batch-firehose.html)
are a deprecated predecessor to the current "input source" concept. They have
been deprecated since 0.17 (late 2019). If we're going with a feature flag for
the overall input-source-security feature, IMO it makes sense for that feature
flag to also disable firehose factories completely. This absolves us of the
responsibility to figure out how to fit them into the new security framework.

### Hadoop ingest

Hadoop ingest doesn't use our input source concept: instead, it uses Hadoop
filesystems and path globs. One approach that comes to mind here is to
special-case it to piggyback on the native `hdfs` input source. The idea being:

- If a user has `(EXTERNAL, hdfs, READ)` permissions then they can submit
Hadoop ingest jobs.
- If a user does _not_ have those permissions, then they _cannot_ submit
Hadoop ingest jobs.

It would be excellent to, in addition, introduce a permission (or
cluster-wide setting) specifically for whether it is possible to submit Hadoop
jobs. People that do not use Hadoop integration would appreciate the
opportunity to switch it off completely, thereby minimizing their potential
attack surface.
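
A minimal sketch of how the two gates for Hadoop ingest could combine. The class, the method, and the `hadoopIngestEnabled` flag are hypothetical names chosen for illustration; no such setting exists today.

```java
import java.util.Set;

// Hypothetical sketch; not Druid code.
final class HadoopIngestGateSketch {

  // hadoopIngestEnabled models the proposed cluster-wide switch (an assumption).
  // grantedExternalReads holds the resource names for which the user has
  // (EXTERNAL, <name>, READ); Hadoop ingest piggybacks on "hdfs".
  static boolean canSubmitHadoopJob(boolean hadoopIngestEnabled, Set<String> grantedExternalReads) {
    return hadoopIngestEnabled && grantedExternalReads.contains("hdfs");
  }
}
```

With the switch off, no permission is enough to submit a Hadoop job, which is what makes it useful for clusters that do not use the Hadoop integration.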

### Realtime ingest

Realtime ingest doesn't use our input source concept: instead, it uses Kafka
and Kinesis supervisors with system-specific `ioConfig` APIs.

One approach that comes to mind is something similar to the proposal for
Hadoop above: special-case these to use `(EXTERNAL, kafka, READ)` and
`(EXTERNAL, kinesis, READ)` respectively. This doesn't make quite as much sense
as the Hadoop case, since while there _is_ an `hdfs` input source, there are no
`kafka` and `kinesis` input sources.
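
A minimal sketch of that special-casing, assuming a simple lookup from supervisor type to the `EXTERNAL` resource name it would require. The class, the map, and the deny-by-default handling of unknown types are illustrative assumptions, not existing Druid behavior.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; not Druid code.
final class RealtimeIngestPermissionSketch {

  // Supervisor type -> resource name required as (EXTERNAL, <name>, READ).
  static final Map<String, String> REQUIRED_EXTERNAL_RESOURCE = Map.of(
      "kafka", "kafka",
      "kinesis", "kinesis");

  // Unknown supervisor types are denied; that default is an assumption of this sketch.
  static boolean canSubmitSupervisor(String supervisorType, Set<String> grantedExternalReads) {
    String required = REQUIRED_EXTERNAL_RESOURCE.get(supervisorType);
    return required != null && grantedExternalReads.contains(required);
  }
}
```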

I'm open to other ideas.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
