Parth Chandra created SPARK-38954:
-------------------------------------

             Summary: Implement sharing of cloud credentials among driver and executors
                 Key: SPARK-38954
                 URL: https://issues.apache.org/jira/browse/SPARK-38954
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 3.2.1
            Reporter: Parth Chandra


Currently, Spark relies on external libraries (e.g. hadoop-aws) to access cloud
services such as S3. To reach the actual service, these libraries use
credentials provider implementations that obtain the credentials required to
access the cloud service.
These are typically session credentials, which expire after a fixed time,
sometimes as little as an hour. A Spark job that runs for many hours (or a Spark
Streaming job that runs continuously) must therefore renew its credentials
periodically.
In many organizations, obtaining credentials is a multi-step process. An
identity provider service authenticates the user, while the cloud service
provider authorizes the roles the user has access to. Once the user is
authenticated and her role is verified, credentials are generated for a new
session.
In a large deployment with hundreds of Spark jobs and thousands of executors,
each executor obtains credentials on its own. This costs every executor time
and can put unnecessary load on the backend authentication services.
To alleviate this, we can use Spark's architecture to obtain the credentials
once in the driver and push them to the executors. The driver can also track
the expiry of the credentials and push renewed credentials to the executors.
This is relatively easy to do, since the RPC mechanism needed is already in
place: Spark uses it in a similar way to distribute Kerberos delegation tokens.
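
The driver-side flow proposed here can be sketched as follows. This is a
minimal, language-agnostic illustration in Python, not Spark's implementation:
SessionCredentials, Executor, DriverCredentialManager, the fetch callable and
the 10-minute refresh margin are all hypothetical, and the direct
update_credentials call stands in for the RPC message Spark would send. It
shows the identity provider being contacted once per renewal, regardless of
how many executors are registered.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, List, Optional, Set, Tuple


@dataclass(frozen=True)
class SessionCredentials:
    # Assumed shape of temporary cloud credentials (key pair, token, expiry).
    access_key: str
    secret_key: str
    session_token: str
    expiration: datetime


class Executor:
    """Hypothetical executor stand-in; Spark would deliver updates over RPC."""

    def __init__(self, executor_id: str) -> None:
        self.executor_id = executor_id
        self.credentials: Optional[SessionCredentials] = None

    def update_credentials(self, creds: SessionCredentials) -> None:
        self.credentials = creds


class DriverCredentialManager:
    """Hypothetical driver-side manager: fetch once, push to all executors."""

    def __init__(self, fetch: Callable[[], SessionCredentials],
                 refresh_margin: timedelta = timedelta(minutes=10)) -> None:
        self._fetch = fetch            # the only path that reaches the identity provider
        self._margin = refresh_margin  # renew this long before expiry
        self._executors: List[Executor] = []
        self.current: Optional[SessionCredentials] = None

    def register_executor(self, executor: Executor) -> None:
        # A newly registered executor immediately receives the current credentials.
        self._executors.append(executor)
        if self.current is not None:
            executor.update_credentials(self.current)

    def refresh_if_needed(self, now: datetime) -> bool:
        # Fetch only when there are no credentials or expiry is within the margin.
        if self.current is not None and now < self.current.expiration - self._margin:
            return False
        self.current = self._fetch()
        for executor in self._executors:
            executor.update_credentials(self.current)
        return True


def simulate(num_executors: int) -> Tuple[int, Set[str]]:
    """Runs a toy timeline; returns backend calls and the tokens executors hold."""
    start = datetime(2022, 4, 20, 12, 0)
    now = start
    calls = 0

    def fake_fetch() -> SessionCredentials:
        # Simulates the identity provider issuing one-hour session credentials.
        nonlocal calls
        calls += 1
        return SessionCredentials(f"key-{calls}", "secret", f"token-{calls}",
                                  now + timedelta(hours=1))

    manager = DriverCredentialManager(fake_fetch)
    executors = [Executor(f"exec-{i}") for i in range(num_executors)]
    for executor in executors:
        manager.register_executor(executor)
    for minutes in (0, 30, 55):  # driver checks at 12:00, 12:30 and 12:55
        now = start + timedelta(minutes=minutes)
        manager.refresh_if_needed(now)
    return calls, {e.credentials.session_token for e in executors}
```

For example, simulate(1000) returns (2, {'token-2'}): two renewals over the
timeline, each a single call to the identity provider, with all 1,000
executors holding the same token. If every executor renewed independently, the
same timeline would cost one call per executor per renewal. In a real driver,
the check would run on a scheduled timer and the push would go over Spark's
existing RPC channel rather than a direct method call.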
  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
