[
https://issues.apache.org/jira/browse/CASSSIDECAR-415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jon Haddad updated CASSSIDECAR-415:
-----------------------------------
Status: Ready to Commit (was: Review In Progress)
> Support IAM instance profile credentials for S3 restore job downloads
> ---------------------------------------------------------------------
>
> Key: CASSSIDECAR-415
> URL: https://issues.apache.org/jira/browse/CASSSIDECAR-415
> Project: Sidecar for Apache Cassandra
> Issue Type: Improvement
> Components: Bulk Analytics
> Reporter: Jon Haddad
> Assignee: Jon Haddad
> Priority: Major
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> The restore job feature downloads SSTables from S3 using static AWS
> credentials that the caller must supply via POST
> /api/v1/{keyspace}/{table}/restore-jobs. The request body must include a
> secrets object (RestoreJobSecrets) containing separate read and write
> StorageCredentials, each requiring accessKeyId, secretAccessKey,
> sessionToken, and region — all enforced as non-null in
> StorageCredentials.java (lines 52–55) and RestoreJobSecrets.java (lines
> 41–42).
> On job creation, RestoreJobDatabaseAccessor.create() (line 90) serializes the
> secrets to JSON via Jackson and writes them as a raw blob to the blob_secrets
> column of the restore_jobs table, defined in RestoreJobsSchema.java (line
> 91). There is no encryption applied at the column, table, or application
> level — the credentials are stored as plaintext JSON bytes. This leaks the
> credentials to anyone with access to this table.
> Because multiple sidecar nodes process different slices of the same restore
> job in parallel, each node reads the job back from Cassandra — including the
> secrets — via
> RestoreJobDatabaseAccessor.find() (line 191), which deserializes them from
> row.getBytes("blob_secrets") in {{RestoreJob.java}} (line 94). Each node then
> passes the job to StorageClientPool.storageClient() (line 86), which extracts
> the region from {{restoreJob.secrets.readCredentials().region()}} (line 88)
> and calls StorageClient.authenticate(). Inside
> {{StorageClient.Credentials.init()}} (lines 341–344), the credentials are
> unconditionally converted to AwsSessionCredentials and wrapped in a
> StaticCredentialsProvider, which is then injected into each S3 request via
> overrideConfiguration(b -> b.credentialsProvider(...)) in both objectExists()
> (line 145) and rangeGetObject() (line 237).
> This design contradicts AWS best practices. AWS explicitly recommends using
> IAM roles over static credentials wherever possible. IAM roles — via EC2
> instance profiles, ECS task roles, or EKS IRSA — eliminate the need to
> create, distribute, store, rotate, or revoke long-lived credentials. The
> current design forces users running in AWS to work against this guidance:
> even if their nodes already have IAM-granted S3 access, they must still
> obtain and manage static credentials to satisfy the mandatory
> {{Objects.requireNonNull(secrets, ...)}} check in
> {{CreateRestoreJobRequestPayload.java}} (line 101).
> Passing static credentials in the request and storing them in Cassandra
> creates risks that IAM roles avoid entirely.
> Credentials stored in plaintext. RestoreJobDatabaseAccessor.create() (line
> 90) writes the secrets as a plain JSON blob into blob_secrets. The
> restore_jobs table schema
> (RestoreJobsSchema.java line 91) has no encryption configuration — no
> column-level encryption, no transparent data encryption, no application-level
> crypto. The credentials sit as plaintext, replicated across every Cassandra
> node holding that partition, and included in any Cassandra backups taken
> during the job's lifetime.
> Credentials visible in logs on failure. StorageClient logs
> credentials.readCredentials on S3 request failures in both
> logCredentialOnRequestFailure() (line 298) and the failure mapper in
> rangeGetObject() (line 256). Although StorageCredentials.toString() redacts
> the secret key and session token (line 94 of StorageCredentials.java), the
> access key ID is logged in plaintext. This gives an adversary a known
> identifier to search for and potentially pair with a leaked secret key.
> *Proposed Solution*
> Make secrets optional throughout the restore job pipeline. When secrets are
> absent, StorageClient should fall back to DefaultCredentialsProvider, which
> implements the standard AWS credential chain. In the AWS SDK for Java 2.x
> that chain checks Java system properties, then environment variables, then
> a web identity token, the shared credentials and config files, ECS container
> credentials, and finally the EC2 instance profile. This aligns the sidecar
> with AWS best practices and lets operators running in AWS use the credential
> model AWS recommends.
> StorageCredentials, RestoreJobSecrets, and CreateRestoreJobRequestPayload
> need to permit null/absent credentials. The region must still be provided —
> either inside the secrets object or as
> a new top-level field on the request — since it is required by
> StorageClientPool.storageClient() (line 88) to construct the regional S3
> endpoint.
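
The region requirement above could be enforced in one place. The following is a minimal sketch only: `RegionResolutionSketch`, `resolveRegion`, and the idea of a top-level region parameter are hypothetical names for illustration, and the precedence shown (region inside secrets wins, so existing clients behave as before) is an assumption, not something the ticket specifies.

```java
/**
 * Sketch of region resolution when secrets may be absent.
 * All names here are hypothetical; this is not Sidecar code.
 */
final class RegionResolutionSketch {

    /**
     * Returns the region to use for the S3 client.
     *
     * @param topLevelRegion region from a hypothetical new top-level request field, may be null
     * @param secretsRegion  region carried inside the secrets object, may be null
     */
    static String resolveRegion(String topLevelRegion, String secretsRegion) {
        // Assumption: prefer the region inside secrets so existing callers are unaffected.
        if (secretsRegion != null && !secretsRegion.trim().isEmpty()) {
            return secretsRegion;
        }
        if (topLevelRegion != null && !topLevelRegion.trim().isEmpty()) {
            return topLevelRegion;
        }
        // Region stays mandatory whether or not secrets are provided.
        throw new IllegalArgumentException("region is required");
    }
}
```

Failing at request validation time, rather than when a node first builds its S3 client, would surface a missing region to the caller instead of to each sidecar node.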
> StorageClient.Credentials.init() (lines 339–345) should branch: use
> StaticCredentialsProvider with AwsSessionCredentials when credentials are
> present, use
> DefaultCredentialsProvider.create() when they are not. The
> RestoreJobFatalException thrown when secrets are null (lines 331–334) should
> be removed.
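
The branch described above might look roughly like the sketch below. To keep it runnable without the AWS SDK on the classpath, every type is a stand-in written for this example: `StaticProvider` plays the role of StaticCredentialsProvider wrapping AwsSessionCredentials, and `DefaultChainProvider` plays the role of DefaultCredentialsProvider.create(). The class and method names are hypothetical and do not mirror the actual StorageClient.Credentials code.

```java
import java.util.Objects;

/**
 * Sketch of the proposed branch. Every type here is a stand-in written for
 * this example; none is a Sidecar or AWS SDK class.
 */
final class CredentialSelectionSketch {

    /** Stand-in for a credentials provider. */
    interface Provider {
        String describe();
    }

    /** Stand-in for the three key fields the ticket lists on StorageCredentials. */
    static final class StaticKeys {
        final String accessKeyId;
        final String secretAccessKey;
        final String sessionToken;

        StaticKeys(String accessKeyId, String secretAccessKey, String sessionToken) {
            // When keys are supplied at all, they must be complete.
            this.accessKeyId = Objects.requireNonNull(accessKeyId, "accessKeyId");
            this.secretAccessKey = Objects.requireNonNull(secretAccessKey, "secretAccessKey");
            this.sessionToken = Objects.requireNonNull(sessionToken, "sessionToken");
        }
    }

    /** Plays the role of StaticCredentialsProvider wrapping AwsSessionCredentials. */
    static final class StaticProvider implements Provider {
        final StaticKeys keys;

        StaticProvider(StaticKeys keys) {
            this.keys = keys;
        }

        @Override
        public String describe() {
            return "static";
        }
    }

    /** Plays the role of DefaultCredentialsProvider.create(). */
    static final class DefaultChainProvider implements Provider {
        @Override
        public String describe() {
            return "default-chain";
        }
    }

    /** Absent keys select the default chain instead of raising a fatal error. */
    static Provider select(StaticKeys keysOrNull) {
        return keysOrNull == null ? new DefaultChainProvider() : new StaticProvider(keysOrNull);
    }

    public static void main(String[] args) {
        System.out.println(select(null).describe());
        System.out.println(select(new StaticKeys("AKIAEXAMPLE", "secret", "token")).describe());
    }
}
```

The same two cases are what the unit tests in the acceptance criteria would need to cover: one with explicit keys, one with none.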
> RestoreJobDatabaseAccessor.create() (line 90) should skip writing
> blob_secrets when secrets are null. RestoreJob.from() (line 94 of
> RestoreJob.java) already handles a null blob_secrets
> column gracefully.
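
One way to picture the skipped write: build the set of column values to bind, and add blob_secrets only when secrets exist. This is a sketch under assumptions, not the RestoreJobDatabaseAccessor implementation. `SecretsColumnSketch`, `rowValues`, and the `job_id` column name are hypothetical, and the secrets are passed in as an already serialized JSON string rather than going through Jackson.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Sketch of omitting the secrets column when no secrets were supplied.
 * All names are hypothetical; this is not Sidecar code.
 */
final class SecretsColumnSketch {

    /**
     * Collects the column values to bind for a new restore job row.
     *
     * @param jobId             identifier of the job (the column name is an assumption)
     * @param secretsJsonOrNull serialized secrets, or null when the job relies on IAM
     */
    static Map<String, Object> rowValues(String jobId, String secretsJsonOrNull) {
        Map<String, Object> values = new LinkedHashMap<>();
        values.put("job_id", jobId);
        if (secretsJsonOrNull != null) {
            // Only jobs with explicit credentials write the secrets blob.
            values.put("blob_secrets", secretsJsonOrNull.getBytes(StandardCharsets.UTF_8));
        }
        return values;
    }
}
```

Leaving the column out of the write, instead of binding an explicit null, also avoids creating a tombstone for it in Cassandra.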
> API backward compatibility: Fully backward-compatible. Callers that currently
> pass credentials continue to work unchanged.
> *Acceptance Criteria*
> * secrets is optional in POST /api/v1/{keyspace}/{table}/restore-jobs;
> existing clients with credentials continue to work unchanged
> * When secrets is absent, StorageClient uses DefaultCredentialsProvider
> * region is still required whether or not secrets are provided
> * When using IAM mode, nothing is written to the blob_secrets column
> * Integration test covering a restore job completing successfully without
> explicit credentials
> * Unit tests for StorageClient.Credentials covering both the static and IAM
> credential paths
> *Key Files to Modify*
> * client-common/.../common/data/StorageCredentials.java — make credential
> fields optional
> * client-common/.../common/data/RestoreJobSecrets.java — allow null
> read/write credentials
> * client-common/.../common/request/data/CreateRestoreJobRequestPayload.java
> — remove null check on secrets; handle region when secrets are absent
> * server/.../restore/StorageClient.java — branch on null credentials in
> Credentials.init(); use DefaultCredentialsProvider as fallback; remove fatal
> exception on null secrets
> * server/.../restore/StorageClientPool.java — handle null secrets when
> extracting region in storageClient()
> * server/.../db/RestoreJob.java — handle null secrets throughout
> * server/.../db/RestoreJobDatabaseAccessor.java — skip blob_secrets write
> when secrets are null
> * server/.../handlers/restore/CreateRestoreJobHandler.java — relax secrets
> validation