Parth Chandra created SPARK-38954:
-------------------------------------
Summary: Implement sharing of cloud credentials among driver and
executors
Key: SPARK-38954
URL: https://issues.apache.org/jira/browse/SPARK-38954
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 3.2.1
Reporter: Parth Chandra
Currently Spark uses external implementations (e.g. hadoop-aws) to access cloud
services like S3. In order to access the actual service, these implementations
use credentials provider implementations that obtain credentials to allow
access to the cloud service.
These credentials are typically session credentials, which means that they
expire after a fixed time. The lifetime can be as short as an hour, so for a
Spark job that runs for many hours (or a Spark Streaming job that runs
continuously), the credentials have to be renewed periodically.
In many organizations, the process of getting credentials may be multi-step. The
organization has an identity provider service that provides authentication for
the user, while the cloud service provider provides authorization for the roles
the user has access to. Once the user is authenticated and her role verified,
the credentials are generated for a new session.
In a large setup with hundreds of Spark jobs and thousands of executors, each
executor then spends time obtaining its own credentials, and this may put
unnecessary load on the backend authentication services.
To alleviate this, we can use Spark's architecture to obtain the credentials
once in the driver and push the credentials to the executors. In addition, the
driver can check the expiry of the credentials and push updated credentials to
the executors. This is relatively easy to do since the RPC mechanism to
implement this is already in place and is used similarly for Kerberos
delegation tokens.
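
The flow proposed above could be sketched roughly as follows. This is an
illustrative model only, not Spark code: the names SessionCredentials,
Executor, DriverCredentialManager, and the 300-second renewal margin are all
hypothetical, and the direct method call stands in for whatever RPC message
an actual implementation would send. The driver fetches credentials once,
pushes them to every executor, and fetches again only when expiry is near.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class SessionCredentials:
    """Hypothetical container for short-lived cloud credentials."""
    access_key: str
    secret_key: str
    session_token: str
    expires_at: float  # expiry as epoch seconds


class Executor:
    """Stand-in for an executor: it only stores what the driver pushes."""

    def __init__(self) -> None:
        self.credentials: Optional[SessionCredentials] = None

    def receive(self, creds: SessionCredentials) -> None:
        # In a real system this would be the handler for an RPC message.
        self.credentials = creds


class DriverCredentialManager:
    """Fetches credentials once on the driver and pushes them to executors."""

    def __init__(
        self,
        fetch: Callable[[], SessionCredentials],
        executors: List[Executor],
        renew_margin_s: float = 300.0,  # assumed margin, not from the issue
        clock: Callable[[], float] = time.time,
    ) -> None:
        self.fetch = fetch
        self.executors = executors
        self.renew_margin_s = renew_margin_s
        self.clock = clock
        self.current: Optional[SessionCredentials] = None

    def needs_renewal(self) -> bool:
        # Renew if nothing has been fetched yet or expiry is within the margin.
        if self.current is None:
            return True
        return self.clock() >= self.current.expires_at - self.renew_margin_s

    def refresh_if_needed(self) -> bool:
        """Fetch and push new credentials if required; return True if pushed."""
        if not self.needs_renewal():
            return False
        self.current = self.fetch()  # the only call to the backend service
        for executor in self.executors:
            executor.receive(self.current)
        return True


# Demo with a fake clock and a fake one-hour credential source.
now = [0.0]
fetch_calls = [0]


def fake_fetch() -> SessionCredentials:
    fetch_calls[0] += 1
    return SessionCredentials(
        "key", "secret", f"token-{fetch_calls[0]}", now[0] + 3600.0
    )


executors = [Executor() for _ in range(3)]
manager = DriverCredentialManager(fake_fetch, executors, clock=lambda: now[0])

manager.refresh_if_needed()  # first call: fetch once, push to all executors
now[0] = 1800.0
manager.refresh_if_needed()  # 30 minutes in: still valid, nothing fetched
now[0] = 3400.0
manager.refresh_if_needed()  # within the margin of expiry: renew and push
```

In this sketch the backend is contacted twice in total regardless of how many
executors there are, which is the load reduction the proposal is after. A
real implementation would call refresh_if_needed from a periodic task on the
driver and would also need to push to executors that register later.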
--
This message was sent by Atlassian Jira
(v8.20.7#820007)