[
https://issues.apache.org/jira/browse/SPARK-11326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jacek Lewandowski updated SPARK-11326:
--------------------------------------
Summary: Support for authentication and encryption in standalone mode
(was: Split networking in standalone mode)
> Support for authentication and encryption in standalone mode
> ------------------------------------------------------------
>
> Key: SPARK-11326
> URL: https://issues.apache.org/jira/browse/SPARK-11326
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Reporter: Jacek Lewandowski
>
> h3.The idea
> Currently, in standalone mode, all components must use the same secret token
> for all network connections if they want any security at all.
> This ticket is intended to split the communication in standalone mode into
> three channels, similar to YARN mode: application internal communication,
> scheduler internal communication, and communication between the client and
> the scheduler.
> Such refactoring will allow the scheduler (master, workers) to use a
> distinct secret, which will remain unknown to the users. Similarly, it will
> allow for better security in applications, because each application will be
> able to use a distinct secret as well.
> By providing Kerberos-based SASL authentication/encryption for connections
> between a client (Client or AppClient) and the Spark Master, it will be
> possible to introduce authentication, automatic generation of digest tokens,
> and their safe sharing among the application processes.
> h3.User facing changes when running application
> h4.General principles:
> - conf: {{spark.authenticate.secret}} is *never sent* over the wire
> - env: {{SPARK_AUTH_SECRET}} is *never sent* over the wire
> - In all situations, the env variable, if present, overrides the conf
> variable.
> - Whenever a user has to pass a secret, it is better (safer) to do so through
> the env variable
> - In work modes with multiple secrets we assume encrypted communication
> between client and master, between driver and master, between master and
> workers
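
The precedence rule in the general principles above can be sketched as a small lookup. This is only an illustration, not Spark code: `resolve_secret` is a hypothetical helper, and the environment and configuration are modeled as plain dictionaries.

```python
from typing import Mapping, Optional


def resolve_secret(env: Mapping[str, str], conf: Mapping[str, str],
                   env_key: str, conf_key: str) -> Optional[str]:
    """Return the secret for one channel; the env variable overrides conf."""
    if env_key in env:
        return env[env_key]
    # Fall back to the conf entry, or None when no secret is configured.
    return conf.get(conf_key)


env = {"SPARK_AUTH_SECRET": "from_env"}
conf = {"spark.authenticate.secret": "from_conf"}

# Both are set: the env variable wins.
chosen = resolve_secret(env, conf, "SPARK_AUTH_SECRET", "spark.authenticate.secret")
# Only conf is set: the conf value is used.
fallback = resolve_secret({}, conf, "SPARK_AUTH_SECRET", "spark.authenticate.secret")
```

The same lookup would apply to the `SPARK_APP_AUTH_SECRET` and `SPARK_SUBMISSION_AUTH_SECRET` pairs described below.
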
> ----
> h4.Work modes and descriptions
> h5.Client mode, single secret
> h6.Configuration
> - env: {{SPARK_AUTH_SECRET=secret}} or conf:
> {{spark.authenticate.secret=secret}}
> h6.Description
> - The driver is running locally
> - The driver will neither send env: {{SPARK_AUTH_SECRET}} nor conf:
> {{spark.authenticate.secret}}
> - The driver will use either env: {{SPARK_AUTH_SECRET}} or conf:
> {{spark.authenticate.secret}} for connection to the master
> - _ExecutorRunner_ will not find any secret in _ApplicationDescription_ so it
> will look for it in the worker configuration and it will find it there (its
> presence is implied).
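
A minimal sketch of the _ExecutorRunner_ lookup described above, assuming _ApplicationDescription_ carries a plain map of env variables and the worker configuration is a plain map keyed by conf name. `executor_secret` is a hypothetical name, not an existing Spark method.

```python
from typing import Mapping

AUTH_ENV = "SPARK_AUTH_SECRET"
AUTH_CONF = "spark.authenticate.secret"


def executor_secret(app_desc_env: Mapping[str, str],
                    worker_conf: Mapping[str, str]) -> str:
    """Pick the secret an executor should be started with."""
    # Multiple-secret modes: the driver shipped an application secret.
    if AUTH_ENV in app_desc_env:
        return app_desc_env[AUTH_ENV]
    # Single-secret mode: nothing was shipped, so use the worker's own secret.
    if AUTH_CONF in worker_conf:
        return worker_conf[AUTH_CONF]
    # The proposal treats the worker secret as always present in this mode.
    raise ValueError("no secret in ApplicationDescription or worker configuration")


worker_conf = {"spark.authenticate.secret": "secret"}
single_secret_mode = executor_secret({}, worker_conf)
multi_secret_mode = executor_secret({"SPARK_AUTH_SECRET": "app_secret"}, worker_conf)
```

In single-secret mode the first branch is never taken, because the driver sends neither the env variable nor the conf entry.
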
> ----
> h5.Client mode, multiple secrets
> h6.Configuration
> - env: {{SPARK_APP_AUTH_SECRET=app_secret}} or conf:
> {{spark.app.authenticate.secret=app_secret}}
> - env: {{SPARK_SUBMISSION_AUTH_SECRET=scheduler_secret}} or conf:
> {{spark.submission.authenticate.secret=scheduler_secret}}
> h6.Description
> - The driver is running locally
> - The driver will use either env: {{SPARK_SUBMISSION_AUTH_SECRET}} or conf:
> {{spark.submission.authenticate.secret}} to connect to the master
> - The driver will neither send env: {{SPARK_SUBMISSION_AUTH_SECRET}} nor
> conf: {{spark.submission.authenticate.secret}}
> - The driver will use either {{SPARK_APP_AUTH_SECRET}} or conf:
> {{spark.app.authenticate.secret}} for communication with the executors
> - The driver will send {{spark.executorEnv.SPARK_AUTH_SECRET=app_secret}} so
> that the executors can use it to communicate with the driver
> - _ExecutorRunner_ will find that secret in _ApplicationDescription_ and it
> will set it in env: {{SPARK_AUTH_SECRET}} which will be read by
> _ExecutorBackend_ afterwards and used for all the connections (with driver,
> other executors and external shuffle service).
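
The split between the two secrets in this mode can be sketched as follows. The function names and the dictionary-based modeling are assumptions made for illustration; only the variable and property names come from the description above.

```python
from typing import Dict, Mapping, Optional, Tuple

EXECUTOR_ENV_PREFIX = "spark.executorEnv."


def pick(env: Mapping[str, str], conf: Mapping[str, str],
         env_key: str, conf_key: str) -> Optional[str]:
    """The env variable overrides the conf entry."""
    return env[env_key] if env_key in env else conf.get(conf_key)


def driver_secrets(env: Mapping[str, str],
                   conf: Mapping[str, str]) -> Tuple[Optional[str], Dict[str, Optional[str]]]:
    """Return (secret kept for the master connection, settings sent to executors)."""
    submission = pick(env, conf, "SPARK_SUBMISSION_AUTH_SECRET",
                      "spark.submission.authenticate.secret")
    app = pick(env, conf, "SPARK_APP_AUTH_SECRET", "spark.app.authenticate.secret")
    # Only the application secret travels, renamed to SPARK_AUTH_SECRET.
    forwarded = {EXECUTOR_ENV_PREFIX + "SPARK_AUTH_SECRET": app}
    return submission, forwarded


def executor_env(forwarded: Mapping[str, Optional[str]]) -> Dict[str, Optional[str]]:
    """ExecutorRunner side: turn spark.executorEnv.* entries into env variables."""
    return {key[len(EXECUTOR_ENV_PREFIX):]: value
            for key, value in forwarded.items()
            if key.startswith(EXECUTOR_ENV_PREFIX)}


master_secret, forwarded = driver_secrets(
    {"SPARK_SUBMISSION_AUTH_SECRET": "scheduler_secret"},
    {"spark.app.authenticate.secret": "app_secret"},
)
env_for_executor = executor_env(forwarded)
```

The scheduler secret stays in `master_secret` and never appears in what the executors receive.
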
> ----
> h5.Cluster mode, single secret
> h6.Configuration
> - env: {{SPARK_AUTH_SECRET=secret}} or conf:
> {{spark.authenticate.secret=secret}}
> h6.Description
> - The driver is run by _DriverRunner_, which is part of the worker
> - The client will neither send env: {{SPARK_AUTH_SECRET}} nor conf:
> {{spark.authenticate.secret}}
> - The client will use either env: {{SPARK_AUTH_SECRET}} or conf:
> {{spark.authenticate.secret}} to connect to the master and submit the
> driver
> - _DriverRunner_ will not find any secret in _DriverDescription_ so it will
> look for it in the worker configuration and it will find it there (its
> presence is implied)
> - _DriverRunner_ will set the secret it found in env: {{SPARK_AUTH_SECRET}}
> so that the driver will find it and use it for all the connections
> - The driver will use either env: {{SPARK_AUTH_SECRET}} or conf:
> {{spark.authenticate.secret}} for connection to the master
> - _ExecutorRunner_ will not find any secret in _ApplicationDescription_ so it
> will look for it in the worker configuration and it will find it there (its
> presence is implied).
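
The _DriverRunner_ step in this mode could look roughly like the sketch below. It assumes _DriverDescription_ exposes a map of env variables and that the worker configuration is a plain map; `driver_environment` is a hypothetical helper.

```python
from typing import Dict, Mapping


def driver_environment(driver_desc_env: Mapping[str, str],
                       worker_conf: Mapping[str, str]) -> Dict[str, str]:
    """Build the environment DriverRunner launches the driver with."""
    env = dict(driver_desc_env)
    if "SPARK_AUTH_SECRET" not in env:
        # The client sent no secret, so the worker's own secret is injected.
        # A missing worker secret raises KeyError; the proposal assumes it exists.
        env["SPARK_AUTH_SECRET"] = worker_conf["spark.authenticate.secret"]
    return env


launched_env = driver_environment(
    {"SPARK_HOME": "/opt/spark"},  # illustrative value only
    {"spark.authenticate.secret": "secret"},
)
```

The driver then reads `SPARK_AUTH_SECRET` from its environment and uses it for every connection, so the client never has to transmit the secret.
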
> ----
> h5.Cluster mode, multiple secrets
> h6.Configuration
> - env: {{SPARK_APP_AUTH_SECRET=app_secret}} or conf:
> {{spark.app.authenticate.secret=app_secret}}
> - env: {{SPARK_SUBMISSION_AUTH_SECRET=scheduler_secret}} or conf:
> {{spark.submission.authenticate.secret=scheduler_secret}}
> h6.Description
> - The driver is run by _DriverRunner_, which is part of the worker
> - The client will use either env: {{SPARK_SUBMISSION_AUTH_SECRET}} or conf:
> {{spark.submission.authenticate.secret}} to connect to the master
> - The client will send either env: {{SPARK_SUBMISSION_AUTH_SECRET}} or conf:
> {{spark.submission.authenticate.secret}} as env:
> {{SPARK_SUBMISSION_AUTH_SECRET}} (to avoid passing secret as Java command
> line option)
> - The client will send either env: {{SPARK_APP_AUTH_SECRET}} or conf:
> {{spark.app.authenticate.secret}} as env: {{SPARK_APP_AUTH_SECRET}} (to avoid
> passing secret as Java command line option)
> - _DriverRunner_ will find env: {{SPARK_SUBMISSION_AUTH_SECRET}} and env:
> {{SPARK_APP_AUTH_SECRET}} and will pass them both to the driver
> - The driver will use env: {{SPARK_SUBMISSION_AUTH_SECRET}}
> - The driver will not send env: {{SPARK_SUBMISSION_AUTH_SECRET}}
> - The driver will use {{SPARK_APP_AUTH_SECRET}} for communication with the
> executors
> - The driver will send {{spark.executorEnv.SPARK_AUTH_SECRET=app_secret}} so
> that the executors can use it to communicate with the driver
> - _ExecutorRunner_ will find that secret in _ApplicationDescription_ and it
> will set it in env: {{SPARK_AUTH_SECRET}} which will be read by
> _ExecutorBackend_ afterwards and used for all the connections (with driver,
> other executors and external shuffle service).
> ----
> h4.Lifecycles
> - env: {{SPARK_AUTH_SECRET}} and conf: {{spark.authenticate.secret}} are
> never transferred to other entities. They are used only by the entity that
> defines them and are then discarded.
> - env: {{SPARK_SUBMISSION_AUTH_SECRET}} is used by _Client_ to connect to the
> master. It is sent as an env variable of the same name with
> _DriverDescription_ so that it is also present in the environment of the
> driver. The driver uses it to connect to the master and will not send it to
> any other entity.
> - conf: {{spark.submission.authenticate.secret}} is used by _Client_ to
> connect to the master unless env: {{SPARK_SUBMISSION_AUTH_SECRET}} is
> defined. If env: {{SPARK_SUBMISSION_AUTH_SECRET}} is not defined, conf:
> {{spark.submission.authenticate.secret}} is copied to env in
> _DriverDescription_ as {{SPARK_SUBMISSION_AUTH_SECRET}} and removed from conf
> to avoid passing it as Java command line argument when running the driver.
> - env: {{SPARK_APP_AUTH_SECRET}} is sent as an env variable of the same name
> with _DriverDescription_ so that it is also present in the environment of the
> driver. The driver uses it to connect to the executors and sends it with
> _ApplicationDescription_ as env: {{SPARK_AUTH_SECRET}} so that
> _ExecutorRunner_ can put it into the executor environment. Then
> _ExecutorBackend_ can use it to communicate with the driver, other executors
> and the external shuffle service.
> - conf: {{spark.app.authenticate.secret}} - if env: {{SPARK_APP_AUTH_SECRET}}
> is not defined, conf: {{spark.app.authenticate.secret}} is copied to env in
> _DriverDescription_ as {{SPARK_APP_AUTH_SECRET}} and removed from conf to
> avoid passing it as Java command line argument when running the driver.
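
The client-side handling described in the lifecycles above can be sketched as one preparation step. The names are hypothetical and both maps are plain dictionaries. One detail is my own assumption: the conf entry is removed even when the env variable is also defined, which the description does not state explicitly.

```python
from typing import Dict, Mapping, Tuple

# (env variable, conf property) pairs that travel with DriverDescription.
SECRET_PAIRS = (
    ("SPARK_SUBMISSION_AUTH_SECRET", "spark.submission.authenticate.secret"),
    ("SPARK_APP_AUTH_SECRET", "spark.app.authenticate.secret"),
)


def prepare_driver_description(client_env: Mapping[str, str],
                               client_conf: Mapping[str, str]
                               ) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (secret env entries, conf) to ship with DriverDescription."""
    driver_env: Dict[str, str] = {}
    driver_conf = dict(client_conf)
    for env_key, conf_key in SECRET_PAIRS:
        # Assumption: always drop the conf entry, so the secret can never
        # end up as a Java command line argument of the driver.
        conf_value = driver_conf.pop(conf_key, None)
        if env_key in client_env:
            driver_env[env_key] = client_env[env_key]
        elif conf_value is not None:
            driver_env[env_key] = conf_value
    # SPARK_AUTH_SECRET is deliberately not copied: it never leaves its owner.
    return driver_env, driver_conf


driver_env, driver_conf = prepare_driver_description(
    {"SPARK_APP_AUTH_SECRET": "app_secret", "SPARK_AUTH_SECRET": "local_only"},
    {"spark.submission.authenticate.secret": "scheduler_secret",
     "spark.app.name": "demo"},
)
```

After this step the secrets exist only as env entries, and the remaining conf can be rendered as command line options without exposing them.
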