[
https://issues.apache.org/jira/browse/SPARK-11326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15021848#comment-15021848
]
Jacek Lewandowski edited comment on SPARK-11326 at 11/23/15 9:41 AM:
---------------------------------------------------------------------
[~pwendell] - are you (DB) interested in reviewing this patch at all?
was (Author: jlewandowski):
[~pwendell] - are you interested in reviewing this patch at all?
> Support for authentication and encryption in standalone mode
> ------------------------------------------------------------
>
> Key: SPARK-11326
> URL: https://issues.apache.org/jira/browse/SPARK-11326
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Reporter: Jacek Lewandowski
>
> h3.The idea
> Currently, in standalone mode, all components need to use the same secret
> token for all network connections if any security is to be ensured. This
> ticket is intended to split the communication in standalone mode to make it
> more like in Yarn mode - separating application-internal communication from
> scheduler communication.
> Such a refactoring will allow the scheduler (master, workers) to use a
> distinct secret, which will remain unknown to users. Similarly, it will
> allow for better security in applications, because each application will be
> able to use a distinct secret as well.
> By providing SASL authentication/encryption for connections between a client
> (Client or AppClient) and the Spark Master, it becomes possible to introduce
> pluggable authentication for the standalone deployment mode.
> h3.Improvements introduced by this patch
> This patch introduces the following changes:
> * The Spark driver or submission client does not have to use the same secret
> that workers use to communicate with the Master
> * Master is able to authenticate individual clients with the following rules:
> ** When connecting to the master, the client needs to specify
> {{spark.authenticate.secret}} which is an authentication token for the user
> specified by {{spark.authenticate.user}} ({{sparkSaslUser}} by default)
> ** The Master configuration may include additional
> {{spark.authenticate.secrets.<username>}} entries specifying the
> authentication token for particular users, or
> {{spark.authenticate.authenticatorClass}}, which specifies an implementation
> of an external credentials provider (able to retrieve the authentication
> token for a given user).
> ** Workers authenticate with Master as default user {{sparkSaslUser}}.
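> The token lookup described above can be sketched as follows. This is an
> illustrative Python sketch, not code from the patch; the {{resolve_secret}}
> helper and the dict-based configuration are hypothetical, while the config
> keys ({{spark.authenticate.secrets.<username>}}, {{spark.authenticate.secret}})
> and the default user name come from the ticket text:

```python
# Hypothetical sketch of Master-side token resolution (not the actual patch).
DEFAULT_USER = "sparkSaslUser"

def resolve_secret(conf, user):
    # A per-user entry in the Master configuration takes precedence.
    explicit = conf.get("spark.authenticate.secrets.%s" % user)
    if explicit is not None:
        return explicit
    # The default user (workers, scheduler) falls back to the shared secret.
    if user == DEFAULT_USER:
        return conf.get("spark.authenticate.secret")
    # Unknown regular user with no configured token: authentication fails.
    return None

conf = {
    "spark.authenticate.secret": "cluster-secret",
    "spark.authenticate.secrets.alice": "alice-token",
}
```

> An external {{authenticatorClass}} would simply replace the dictionary
> lookup with a call into the pluggable credentials provider.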
> * The authorization rules are as follows:
> ** A regular user is able to manage only their own applications (the
> applications they submitted)
> ** A regular user is not able to register or manage workers
> ** Spark default user {{sparkSaslUser}} can manage all the applications
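> The authorization rules above amount to two small predicates. The following
> is a hypothetical sketch of that logic (the helper names are invented for
> illustration), assuming the default user identity from the ticket:

```python
# Hypothetical sketch of the authorization rules (not code from the patch).
DEFAULT_USER = "sparkSaslUser"

def may_manage_app(user, app_owner):
    # The default user may manage all applications; a regular user may
    # manage only the applications they submitted themselves.
    return user == DEFAULT_USER or user == app_owner

def may_manage_workers(user):
    # Only the default user (i.e. the scheduler itself) may register
    # or manage workers.
    return user == DEFAULT_USER
```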
> h3.User facing changes when running application
> h4.General principles:
> - conf: {{spark.authenticate.secret}} is *never sent* over the wire
> - env: {{SPARK_AUTH_SECRET}} is *never sent* over the wire
> - In all situations, the env variable overrides the conf variable if both
> are present.
> - In all situations when a user has to pass a secret, it is better (safer)
> to do this through the env variable
> - In work modes with multiple secrets we assume encrypted communication
> between client and master, between driver and master, and between master and
> workers
> ----
> h4.Work modes and descriptions
> h5.Client mode, single secret
> h6.Configuration
> - env: {{SPARK_AUTH_SECRET=secret}} or conf:
> {{spark.authenticate.secret=secret}}
> h6.Description
> - The driver is running locally
> - The driver will neither send env: {{SPARK_AUTH_SECRET}} nor conf:
> {{spark.authenticate.secret}}
> - The driver will use either env: {{SPARK_AUTH_SECRET}} or conf:
> {{spark.authenticate.secret}} for connection to the master
> - _ExecutorRunner_ will not find any secret in _ApplicationDescription_ so it
> will look for it in the worker configuration and it will find it there (its
> presence is implied).
> ----
> h5.Client mode, multiple secrets
> h6.Configuration
> - env: {{SPARK_APP_AUTH_SECRET=app_secret}} or conf:
> {{spark.app.authenticate.secret=app_secret}}
> - env: {{SPARK_SUBMISSION_AUTH_SECRET=scheduler_secret}} or conf:
> {{spark.submission.authenticate.secret=scheduler_secret}}
> h6.Description
> - The driver is running locally
> - The driver will use either env: {{SPARK_SUBMISSION_AUTH_SECRET}} or conf:
> {{spark.submission.authenticate.secret}} to connect to the master
> - The driver will neither send env: {{SPARK_SUBMISSION_AUTH_SECRET}} nor
> conf: {{spark.submission.authenticate.secret}}
> - The driver will use either {{SPARK_APP_AUTH_SECRET}} or conf:
> {{spark.app.authenticate.secret}} for communication with the executors
> - The driver will send {{spark.executorEnv.SPARK_AUTH_SECRET=app_secret}} so
> that the executors can use it to communicate with the driver
> - _ExecutorRunner_ will find that secret in _ApplicationDescription_ and it
> will set it in env: {{SPARK_AUTH_SECRET}} which will be read by
> _ExecutorBackend_ afterwards and used for all the connections (with driver,
> other executors and external shuffle service).
> ----
> h5.Cluster mode, single secret
> h6.Configuration
> - env: {{SPARK_AUTH_SECRET=secret}} or conf:
> {{spark.authenticate.secret=secret}}
> h6.Description
> - The driver is run by _DriverRunner_ which is a part of the worker
> - The client will neither send env: {{SPARK_AUTH_SECRET}} nor conf:
> {{spark.authenticate.secret}}
> - The client will use either env: {{SPARK_AUTH_SECRET}} or conf:
> {{spark.authenticate.secret}} for connection to the master and submit the
> driver
> - _DriverRunner_ will not find any secret in _DriverDescription_ so it will
> look for it in the worker configuration and it will find it there (its
> presence is implied)
> - _DriverRunner_ will set the secret it found in env: {{SPARK_AUTH_SECRET}}
> so that the driver will find it and use it for all the connections
> - The driver will use either env: {{SPARK_AUTH_SECRET}} or conf:
> {{spark.authenticate.secret}} for connection to the master
> - _ExecutorRunner_ will not find any secret in _ApplicationDescription_ so it
> will look for it in the worker configuration and it will find it there (its
> presence is implied).
> ----
> h5.Cluster mode, multiple secrets
> h6.Configuration
> - env: {{SPARK_APP_AUTH_SECRET=app_secret}} or conf:
> {{spark.app.authenticate.secret=app_secret}}
> - env: {{SPARK_SUBMISSION_AUTH_SECRET=scheduler_secret}} or conf:
> {{spark.submission.authenticate.secret=scheduler_secret}}
> h6.Description
> - The driver is run by _DriverRunner_ which is a part of the worker
> - The client will use either env: {{SPARK_SUBMISSION_AUTH_SECRET}} or conf:
> {{spark.submission.authenticate.secret}} to connect to the master
> - The client will send either env: {{SPARK_SUBMISSION_AUTH_SECRET}} or conf:
> {{spark.submission.authenticate.secret}} as env:
> {{SPARK_SUBMISSION_AUTH_SECRET}} (to avoid passing secret as Java command
> line option)
> - The client will send either env: {{SPARK_APP_AUTH_SECRET}} or conf:
> {{spark.app.authenticate.secret}} as env: {{SPARK_APP_AUTH_SECRET}} (to avoid
> passing secret as Java command line option)
> - _DriverRunner_ will find env: {{SPARK_SUBMISSION_AUTH_SECRET}} and env:
> {{SPARK_APP_AUTH_SECRET}} and will pass them both to the driver
> - The driver will use env: {{SPARK_SUBMISSION_AUTH_SECRET}}
> - The driver will not send env: {{SPARK_SUBMISSION_AUTH_SECRET}}
> - The driver will use {{SPARK_APP_AUTH_SECRET}} for communication with the
> executors
> - The driver will send {{spark.executorEnv.SPARK_AUTH_SECRET=app_secret}} so
> that the executors can use it to communicate with the driver
> - _ExecutorRunner_ will find that secret in _ApplicationDescription_ and it
> will set it in env: {{SPARK_AUTH_SECRET}} which will be read by
> _ExecutorBackend_ afterwards and used for all the connections (with driver,
> other executors and external shuffle service).
> ----
> h4.Lifecycles
> - env: {{SPARK_AUTH_SECRET}} and conf: {{spark.authenticate.secret}} are
> never transferred to other entities. They are used only in the entity in
> which they are defined, and then discarded.
> - env: {{SPARK_SUBMISSION_AUTH_SECRET}} is used by _Client_ to connect to the
> master. It is sent as env variable of the same name with _DriverDescription_
> so that it is also present in the environment of the driver. Driver uses it
> to connect to the master and it will not send it to any other entity.
> - conf: {{spark.submission.authenticate.secret}} is used by _Client_ to
> connect to the master unless env: {{SPARK_SUBMISSION_AUTH_SECRET}} is
> defined. If env: {{SPARK_SUBMISSION_AUTH_SECRET}} is not defined, conf:
> {{spark.submission.authenticate.secret}} is copied to env in
> _DriverDescription_ as {{SPARK_SUBMISSION_AUTH_SECRET}} and removed from conf
> to avoid passing it as Java command line argument when running the driver.
> - env: {{SPARK_APP_AUTH_SECRET}} is sent as env variable of the same name
> with _DriverDescription_ so that it is also present in the environment of the
> driver. Driver uses it to connect to the executors and it will send it with
> _ApplicationDescription_ as env: {{SPARK_AUTH_SECRET}} so that
> _ExecutorRunner_ can put it into the executor environment. Then
> _ExecutorBackend_ can use it to communicate with the driver, other executors
> and external shuffle service.
> - conf: {{spark.app.authenticate.secret}} - if env: {{SPARK_APP_AUTH_SECRET}}
> is not defined, conf: {{spark.app.authenticate.secret}} is copied to env in
> _DriverDescription_ as {{SPARK_APP_AUTH_SECRET}} and removed from conf to
> avoid passing it as Java command line argument when running the driver.
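> The copy-and-remove step shared by both conf secrets can be sketched as
> follows. This is an illustrative, hypothetical helper (not code from the
> patch); the key names and the motivation (never pass the secret as a Java
> command-line option) follow the lifecycle rules above:

```python
# Hypothetical sketch: move a secret from conf into the DriverDescription
# env so it is never exposed on the Java command line (not the actual patch).
def move_secret_to_env(conf, env, conf_key, env_key):
    # Only copy from conf when the env variable is not already defined;
    # the env variable always takes precedence.
    if env_key not in env and conf_key in conf:
        env[env_key] = conf.pop(conf_key)
    return conf, env
```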
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]