[ https://issues.apache.org/jira/browse/SPARK-11326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved SPARK-11326.
-------------------------------
    Resolution: Won't Fix

> Support for authentication and encryption in standalone mode
> ------------------------------------------------------------
>
>                 Key: SPARK-11326
>                 URL: https://issues.apache.org/jira/browse/SPARK-11326
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>            Reporter: Jacek Lewandowski
>
> h3.The idea
> Currently, in standalone mode, all components need to use the same secure
> token for all network connections if they want any security ensured.
> This ticket is intended to split the communication in standalone mode to
> make it more like YARN mode: application-internal communication and
> scheduler communication.
> Such refactoring will allow the scheduler (master, workers) to use a
> distinct secret, which will remain unknown to the users. Similarly, it will
> allow for better security in applications, because each application will be
> able to use a distinct secret as well.
> By providing SASL authentication/encryption for connections between a
> client (Client or AppClient) and the Spark Master, it becomes possible to
> introduce pluggable authentication for standalone deployment mode.
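The shared-secret ("secure token") authentication referred to above can be illustrated with a simplified challenge-response sketch. This is a hypothetical illustration, not Spark's actual code (Spark uses SASL, typically DIGEST-MD5, over its network layer), but the principle is the same: both ends prove knowledge of the shared secret without ever sending it over the wire.

```python
import hashlib
import hmac
import os


def make_challenge() -> bytes:
    # The server side sends a random nonce to the connecting peer.
    return os.urandom(16)


def respond(secret: bytes, challenge: bytes) -> bytes:
    # The client proves knowledge of the secret without transmitting it,
    # by returning a keyed MAC over the challenge.
    return hmac.new(secret, challenge, hashlib.sha256).digest()


def verify(secret: bytes, challenge: bytes, response: bytes) -> bool:
    # The server recomputes the MAC and compares in constant time.
    expected = hmac.new(secret, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)


# Pre-patch standalone situation: every component shares one cluster-wide secret.
secret = b"cluster-wide-secret"
challenge = make_challenge()
assert verify(secret, challenge, respond(secret, challenge))
assert not verify(secret, challenge, respond(b"wrong-secret", challenge))
```

Because a single secret authenticates everything, any user who can submit an application can also impersonate the scheduler components, which is the weakness this ticket set out to remove.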
> h3.Improvements introduced by this patch
> This patch introduces the following changes:
> * The Spark driver or submission client does not have to use the same
> secret as the workers use to communicate with the Master
> * The Master is able to authenticate individual clients with the following
> rules:
> ** When connecting to the master, the client needs to specify
> {{spark.authenticate.secret}}, which is an authentication token for the
> user specified by {{spark.authenticate.user}} ({{sparkSaslUser}} by
> default)
> ** The Master configuration may include additional
> {{spark.authenticate.secrets.<username>}} entries specifying authentication
> tokens for particular users, or {{spark.authenticate.authenticatorClass}},
> which specifies an implementation of an external credentials provider
> (able to retrieve the authentication token for a given user)
> ** Workers authenticate with the Master as the default user
> {{sparkSaslUser}}.
> * The authorization rules are as follows:
> ** A regular user is able to manage only their own application (the
> application they submitted)
> ** A regular user is not able to register or manage workers
> ** The Spark default user {{sparkSaslUser}} can manage all applications
> h3.User facing changes when running an application
> h4.General principles:
> - conf: {{spark.authenticate.secret}} is *never sent* over the wire
> - env: {{SPARK_AUTH_SECRET}} is *never sent* over the wire
> - In all situations the env variable overrides the conf variable if present.
> - Whenever a user has to pass a secret, it is better (safer) to do this
> through an env variable
> - In work modes with multiple secrets we assume encrypted communication
> between client and master, between driver and master, and between master
> and workers
> ----
> h4.Work modes and descriptions
> h5.Client mode, single secret
> h6.Configuration
> - env: {{SPARK_AUTH_SECRET=secret}} or conf:
> {{spark.authenticate.secret=secret}}
> h6.Description
> - The driver is running locally
> - The driver will neither send env: {{SPARK_AUTH_SECRET}} nor conf:
> {{spark.authenticate.secret}}
> - The driver will use either env: {{SPARK_AUTH_SECRET}} or conf:
> {{spark.authenticate.secret}} for the connection to the master
> - _ExecutorRunner_ will not find any secret in _ApplicationDescription_,
> so it will look for it in the worker configuration and find it there (its
> presence is implied).
> ----
> h5.Client mode, multiple secrets
> h6.Configuration
> - env: {{SPARK_APP_AUTH_SECRET=app_secret}} or conf:
> {{spark.app.authenticate.secret=app_secret}}
> - env: {{SPARK_SUBMISSION_AUTH_SECRET=scheduler_secret}} or conf:
> {{spark.submission.authenticate.secret=scheduler_secret}}
> h6.Description
> - The driver is running locally
> - The driver will use either env: {{SPARK_SUBMISSION_AUTH_SECRET}} or
> conf: {{spark.submission.authenticate.secret}} to connect to the master
> - The driver will neither send env: {{SPARK_SUBMISSION_AUTH_SECRET}} nor
> conf: {{spark.submission.authenticate.secret}}
> - The driver will use either env: {{SPARK_APP_AUTH_SECRET}} or conf:
> {{spark.app.authenticate.secret}} for communication with the executors
> - The driver will send {{spark.executorEnv.SPARK_AUTH_SECRET=app_secret}}
> so that the executors can use it to communicate with the driver
> - _ExecutorRunner_ will find that secret in _ApplicationDescription_ and
> will set it in env: {{SPARK_AUTH_SECRET}}, which will be read by
> _ExecutorBackend_ afterwards and used for all the connections (with the
> driver, other executors and the external shuffle service).
> ----
> h5.Cluster mode, single secret
> h6.Configuration
> - env: {{SPARK_AUTH_SECRET=secret}} or conf:
> {{spark.authenticate.secret=secret}}
> h6.Description
> - The driver is run by _DriverRunner_, which is a part of the worker
> - The client will neither send env: {{SPARK_AUTH_SECRET}} nor conf:
> {{spark.authenticate.secret}}
> - The client will use either env: {{SPARK_AUTH_SECRET}} or conf:
> {{spark.authenticate.secret}} to connect to the master and submit the
> driver
> - _DriverRunner_ will not find any secret in _DriverDescription_, so it
> will look for it in the worker configuration and find it there (its
> presence is implied)
> - _DriverRunner_ will set the secret it found in env:
> {{SPARK_AUTH_SECRET}} so that the driver will find it and use it for all
> its connections
> - The driver will use either env: {{SPARK_AUTH_SECRET}} or conf:
> {{spark.authenticate.secret}} for the connection to the master
> - _ExecutorRunner_ will not find any secret in _ApplicationDescription_,
> so it will look for it in the worker configuration and find it there (its
> presence is implied).
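The resolution order running through the single-secret modes above (the env variable wins over the conf entry, and _ExecutorRunner_ / _DriverRunner_ fall back to the worker's own configuration when the description carries no secret) can be sketched as follows. The function and parameter names are illustrative only, not Spark API:

```python
def resolve_secret(env, conf, worker_conf=None):
    """Resolve the auth secret per the proposal's precedence:
    env:SPARK_AUTH_SECRET > conf:spark.authenticate.secret > worker config."""
    secret = env.get("SPARK_AUTH_SECRET") or conf.get("spark.authenticate.secret")
    if secret is None and worker_conf is not None:
        # ExecutorRunner/DriverRunner case: no secret in the received
        # description, so fall back to the worker's own configuration
        # (its presence there is implied).
        secret = worker_conf.get("spark.authenticate.secret")
    return secret


# The env variable overrides the conf entry when both are present:
assert resolve_secret({"SPARK_AUTH_SECRET": "e"},
                      {"spark.authenticate.secret": "c"}) == "e"
# Runner fallback: nothing in the description, use the worker config:
assert resolve_secret({}, {}, {"spark.authenticate.secret": "w"}) == "w"
```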
> ----
> h5.Cluster mode, multiple secrets
> h6.Configuration
> - env: {{SPARK_APP_AUTH_SECRET=app_secret}} or conf:
> {{spark.app.authenticate.secret=app_secret}}
> - env: {{SPARK_SUBMISSION_AUTH_SECRET=scheduler_secret}} or conf:
> {{spark.submission.authenticate.secret=scheduler_secret}}
> h6.Description
> - The driver is run by _DriverRunner_, which is a part of the worker
> - The client will use either env: {{SPARK_SUBMISSION_AUTH_SECRET}} or
> conf: {{spark.submission.authenticate.secret}} to connect to the master
> - The client will send either env: {{SPARK_SUBMISSION_AUTH_SECRET}} or
> conf: {{spark.submission.authenticate.secret}} as env:
> {{SPARK_SUBMISSION_AUTH_SECRET}} (to avoid passing the secret as a Java
> command line option)
> - The client will send either env: {{SPARK_APP_AUTH_SECRET}} or conf:
> {{spark.app.authenticate.secret}} as env: {{SPARK_APP_AUTH_SECRET}} (to
> avoid passing the secret as a Java command line option)
> - _DriverRunner_ will find env: {{SPARK_SUBMISSION_AUTH_SECRET}} and env:
> {{SPARK_APP_AUTH_SECRET}} and will pass them both to the driver
> - The driver will use env: {{SPARK_SUBMISSION_AUTH_SECRET}}
> - The driver will not send env: {{SPARK_SUBMISSION_AUTH_SECRET}}
> - The driver will use {{SPARK_APP_AUTH_SECRET}} for communication with the
> executors
> - The driver will send {{spark.executorEnv.SPARK_AUTH_SECRET=app_secret}}
> so that the executors can use it to communicate with the driver
> - _ExecutorRunner_ will find that secret in _ApplicationDescription_ and
> will set it in env: {{SPARK_AUTH_SECRET}}, which will be read by
> _ExecutorBackend_ afterwards and used for all the connections (with the
> driver, other executors and the external shuffle service).
> ----
> h4.Lifecycles
> - env: {{SPARK_AUTH_SECRET}} and conf: {{spark.authenticate.secret}} are
> always lost; they are never transferred to other entities. They are used
> only within the entity that has them defined, and die there.
> - env: {{SPARK_SUBMISSION_AUTH_SECRET}} is used by _Client_ to connect to
> the master. It is sent as an env variable of the same name with
> _DriverDescription_ so that it is also present in the environment of the
> driver. The driver uses it to connect to the master and will not send it
> to any other entity.
> - conf: {{spark.submission.authenticate.secret}} is used by _Client_ to
> connect to the master unless env: {{SPARK_SUBMISSION_AUTH_SECRET}} is
> defined. If env: {{SPARK_SUBMISSION_AUTH_SECRET}} is not defined, conf:
> {{spark.submission.authenticate.secret}} is copied to the env in
> _DriverDescription_ as {{SPARK_SUBMISSION_AUTH_SECRET}} and removed from
> conf to avoid passing it as a Java command line argument when running the
> driver.
> - env: {{SPARK_APP_AUTH_SECRET}} is sent as an env variable of the same
> name with _DriverDescription_ so that it is also present in the
> environment of the driver. The driver uses it to connect to the executors
> and will send it with _ApplicationDescription_ as env:
> {{SPARK_AUTH_SECRET}} so that _ExecutorRunner_ can put it into the
> executor environment. Then _ExecutorBackend_ can use it to communicate
> with the driver, other executors and the external shuffle service.
> - conf: {{spark.app.authenticate.secret}} - if env:
> {{SPARK_APP_AUTH_SECRET}} is not defined, conf:
> {{spark.app.authenticate.secret}} is copied to the env in
> _DriverDescription_ as {{SPARK_APP_AUTH_SECRET}} and removed from conf to
> avoid passing it as a Java command line argument when running the driver.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
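The lifecycle rules in the quoted proposal amount to a single invariant: a secret that arrives via conf is relocated into the child process environment and stripped from conf before the JVM command line is built, so it never shows up in {{ps}} output or logs as a {{-Dspark...}} option. A minimal sketch of that relocation, with illustrative names that are not Spark internals:

```python
def relocate_secret(conf, env, conf_key, env_key):
    """Move a secret from conf into env so it never appears as a Java
    command-line option of the launched driver or executor. The env
    variable, if already set, takes precedence and conf is left alone."""
    if env_key not in env and conf_key in conf:
        env[env_key] = conf.pop(conf_key)


conf = {"spark.submission.authenticate.secret": "s1",
        "spark.app.authenticate.secret": "s2"}
env = {}
relocate_secret(conf, env,
                "spark.submission.authenticate.secret",
                "SPARK_SUBMISSION_AUTH_SECRET")
relocate_secret(conf, env,
                "spark.app.authenticate.secret",
                "SPARK_APP_AUTH_SECRET")
assert env == {"SPARK_SUBMISSION_AUTH_SECRET": "s1",
               "SPARK_APP_AUTH_SECRET": "s2"}
assert conf == {}  # secrets no longer present in conf
```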