gianm commented on issue #13837:
URL: https://github.com/apache/druid/issues/13837#issuecomment-1447182330

   With regard to backwards-compatibility with `(EXTERNAL, EXTERNAL, READ)` 
stuff, the feature flag approach sounds good to me. The documentation on this 
page would need to be updated as well: 
https://druid.apache.org/docs/latest/multi-stage-query/security.html.
   
   Some notes on the other stuff we'll need to sort out as part of this:
   
   ### non-http protocols via `http` input source
   
   The `http` input source is implemented using 
[java.net.URLConnection](https://docs.oracle.com/javase/8/docs/api/java/net/URLConnection.html),
 which can handle various protocols other than http (including local 
`file://`). Currently the config `druid.ingestion.http.allowedProtocols` 
(default: `http, https`) is used to control which protocols are permitted via 
this input source.
   
   We should consider how this all fits together. Perhaps something like this:
   
   - `(EXTERNAL, http, READ)` refers to the `http` input source.
   - The `http` input source may, if `druid.ingestion.http.allowedProtocols` is 
set, handle non-http protocols. This isn't the concern of the authorization 
layer.
   - To ensure that people who use either of the above features (`EXTERNAL` 
authorization, or `druid.ingestion.http.allowedProtocols`) understand their 
interaction, we should include notes about this in the docs for both features 
(with examples).
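
   A minimal Python sketch of the layering described above, not Druid's actual 
implementation: the authorization check looks only at the 
`(EXTERNAL, http, READ)` tuple, and the protocol allow-list is applied as a 
separate step. The function names and the set-of-tuples permission model are 
hypothetical; only the config name and its default come from the text above.

```python
from urllib.parse import urlsplit

# Mirrors the default of druid.ingestion.http.allowedProtocols noted above.
DEFAULT_ALLOWED_PROTOCOLS = frozenset({"http", "https"})


def is_authorized(granted, input_source_type):
    """Authorization layer (hypothetical model): checks the permission tuple only."""
    return ("EXTERNAL", input_source_type, "READ") in granted


def is_protocol_allowed(uri, allowed=DEFAULT_ALLOWED_PROTOCOLS):
    """Input-source layer: checks the URI scheme against the allow-list."""
    return urlsplit(uri).scheme.lower() in allowed


def can_read_http_source(granted, uri, allowed=DEFAULT_ALLOWED_PROTOCOLS):
    """Both layers must pass, but neither layer knows about the other."""
    return is_authorized(granted, "http") and is_protocol_allowed(uri, allowed)


granted = {("EXTERNAL", "http", "READ")}
print(can_read_http_source(granted, "https://example.com/data.json"))
print(can_read_http_source(granted, "file:///tmp/data.json"))
```

   In this sketch, a user holding `(EXTERNAL, http, READ)` can read `file://` 
URIs only if an operator has widened the allow-list, which is the interaction 
the docs would need to call out.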
   
   ### non-hdfs protocols via `hdfs` input source
   
   The `hdfs` input source has a similar behavior to the `http` input source. 
Like `http`, it supports various non-hdfs protocols. Like `http`, there is a 
`druid.ingestion.hdfs.allowedProtocols` that controls which protocols are 
allowed. Like `http`, the default set is limited to only the obvious one 
(`hdfs`).
   
   So, we should be able to take the same approach here that we take with 
`http`.
   
   ### firehose factories
   
   
[Firehoses](https://druid.apache.org/docs/latest/ingestion/native-batch-firehose.html)
 are a deprecated predecessor to the current "input source" concept. They have 
been deprecated since 0.17 (late 2019). If we're going with a feature flag for 
the overall input-source-security feature, IMO it makes sense for that feature 
flag to also disable firehose factories completely. This absolves us of the 
responsibility to figure out how to fit them into the new security framework.
   
   ### Hadoop ingest
   
   Hadoop ingest doesn't use our input source concept: instead, it uses Hadoop 
filesystems and path globs. One approach that comes to mind here is to 
special-case it to piggyback on the native `hdfs` input source. The idea being:
   
   - If a user has `(EXTERNAL, hdfs, READ)` permissions then they can submit 
Hadoop ingest jobs.
   - If a user does _not_ have those permissions, then they _cannot_ submit 
Hadoop ingest jobs.
   
   It would be excellent to, in addition, introduce a permission (or 
cluster-wide setting) specifically for whether it is possible to submit Hadoop 
jobs. People that do not use Hadoop integration would appreciate the 
opportunity to switch it off completely, thereby minimizing their potential 
attack surface.
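
   The two ideas above could combine roughly as follows. This is a Python 
sketch under assumptions: `hadoop_ingest_enabled` stands in for the suggested 
cluster-wide setting (no such setting is named here), and permissions are 
modeled as a plain set of tuples.

```python
def can_submit_hadoop_job(granted, hadoop_ingest_enabled=True):
    """Decide whether a user may submit a Hadoop ingest job (hypothetical helper).

    hadoop_ingest_enabled is an assumed cluster-wide switch; when it is off,
    nobody can submit Hadoop jobs regardless of their permissions.
    """
    if not hadoop_ingest_enabled:
        return False
    # Piggyback on the native hdfs input source permission.
    return ("EXTERNAL", "hdfs", "READ") in granted


hdfs_user = {("EXTERNAL", "hdfs", "READ")}
print(can_submit_hadoop_job(hdfs_user))
print(can_submit_hadoop_job(hdfs_user, hadoop_ingest_enabled=False))
```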
   
   ### Realtime ingest
   
   Realtime ingest doesn't use our input source concept: instead, it uses Kafka 
and Kinesis supervisors with system-specific `ioConfig` APIs.
   
   One approach that comes to mind is something similar to the proposal for 
Hadoop above: special-case these to use `(EXTERNAL, kafka, READ)` and 
`(EXTERNAL, kinesis, READ)` respectively. This doesn't make quite as much sense 
as the Hadoop case, since while there _is_ an `hdfs` input source, there are no 
`kafka` and `kinesis` input sources.
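
   As a rough Python illustration of that special-casing (the mapping and the 
function name are hypothetical, not an existing Druid API), each supervisor 
type would resolve to the permission tuple it requires:

```python
# Supervisor type -> EXTERNAL resource name, per the proposal above.
SUPERVISOR_RESOURCE_NAMES = {"kafka": "kafka", "kinesis": "kinesis"}


def required_permission(supervisor_type):
    """Return the (type, name, action) tuple a supervisor spec would need."""
    try:
        name = SUPERVISOR_RESOURCE_NAMES[supervisor_type]
    except KeyError:
        raise ValueError(f"no EXTERNAL resource defined for {supervisor_type!r}")
    return ("EXTERNAL", name, "READ")


print(required_permission("kafka"))
```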
   
   I'm open to other ideas.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

