maytasm3 opened a new issue #9305: Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion
URL: https://github.com/apache/druid/issues/9305

### Description

A cloud InputSource (such as S3) should be able to optionally accept credentials so that it can read from multiple buckets/locations. This allows using credentials other than the default (i.e. the ones provided in the common runtime properties), and also using different credentials for each ingestion.

### Motivation

The source of the raw input data can be stored in a bucket/location with different credentials than the deep storage location. For example, we might want to consume from S3 and also use S3 as deep storage, but the raw-data bucket and the deep-storage bucket have different credentials. We may also want to ingest from multiple buckets/locations, e.g. from several S3 buckets each with its own credentials. We should not have to change `druid.s3.accessKey` for every ingestion just to read raw data from a different bucket.

### Proposed changes

Add two new optional fields, `overrideAccessKeyId` and `overrideSecretAccessKey`, to the `inputSource` spec:

```json
...
"ioConfig": {
  "type": "index_parallel",
  "inputSource": {
    "type": "s3",
    "prefixes": ["s3://foo/bar", "s3://bar/foo"],
    "overrideAccessKeyId": "123455",
    "overrideSecretAccessKey": "masdjlaksjdlakjsd"
  },
  "inputFormat": {
    "type": "json"
  },
  ...
},
...
```

If neither field is present, the normal cloud client/configuration is used (current behavior). If one field is present but not the other, the task fails. If both fields are given, a new client is constructed for this ingestion task, using the given access key and secret key together with the rest of the current configuration (region, etc.).
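The both-or-neither validation rule above could be sketched roughly as follows; the class and method names here are hypothetical, not taken from the Druid codebase:

```java
// Hypothetical sketch of the proposed validation: either both override
// fields are set (build a dedicated client) or neither is (use the default
// client); setting exactly one of them fails the task early.
class S3CredentialOverrideValidator {
  static boolean hasOverrideCredentials(String overrideAccessKeyId, String overrideSecretAccessKey) {
    boolean hasAccessKey = overrideAccessKeyId != null && !overrideAccessKeyId.isEmpty();
    boolean hasSecretKey = overrideSecretAccessKey != null && !overrideSecretAccessKey.isEmpty();
    if (hasAccessKey != hasSecretKey) {
      // Exactly one field was supplied: reject the spec with a clear message.
      throw new IllegalArgumentException(
          "overrideAccessKeyId and overrideSecretAccessKey must be provided together");
    }
    // true -> construct a new client with these credentials;
    // false -> fall back to the default client/configuration.
    return hasAccessKey;
  }
}
```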
For example, for an `inputSource` of type `s3` with `overrideAccessKeyId` and `overrideSecretAccessKey` given, we would create a new S3 client with something similar to:

```java
BasicAWSCredentials awsCreds = new BasicAWSCredentials("access_key_id", "secret_key_id");
AmazonS3 s3Client = AmazonS3ClientBuilder.standard()
    .withCredentials(new AWSStaticCredentialsProvider(awsCreds))
    .build();
```

This new client will only be used for reading the input data. Deep storage will still be handled by the default client/configuration.

The fields `overrideAccessKeyId` and `overrideSecretAccessKey` can also be encrypted. Encryption/decryption can be done using a pre-set environment configuration: the user encrypts the keys before putting them in the ingestion spec, and Druid uses the pre-set environment configuration to decrypt them at ingestion time. Hence, the fields in the ingestion spec would contain the encrypted keys. (We could also use the `PasswordProvider` for this.)

### Rationale

Makes ingestion from cloud input sources more flexible to use.

### Operational impact

- A new cloud client is created for every ingestion that uses these new fields.
- No concern with backward compatibility, since these new fields are optional.

### Test plan

- New integration tests will be added.
- Will test in a real Druid cluster.

### Future work

- A "Multi" input source, to support ingestion with multiple different cloud credentials (such as multiple S3 buckets, or a mixture of S3 buckets and GCS buckets, all with different credentials) in the same ingestion spec.
- An extra optional field in the ingestion spec that takes a path to a file containing the credentials, instead of using the `overrideAccessKeyId` and `overrideSecretAccessKey` fields.
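The encrypted-fields idea described under "Proposed changes" could be sketched as below. This is only an illustration under stated assumptions: the AES cipher choice and the idea of sourcing a symmetric key from a pre-set environment variable are not specified in the proposal, and none of the names here come from the Druid codebase.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;

// Hypothetical sketch: the user encrypts the override credentials with a
// shared key before writing the ingestion spec, and Druid decrypts them at
// ingestion time using the same key taken from a pre-set environment
// variable (the variable name and AES scheme are illustrative assumptions).
class SpecFieldCipher {
  private static SecretKeySpec keyFromEnv(String envValue) {
    // Expect a 16-byte (128-bit) AES key, e.g. set once per cluster.
    return new SecretKeySpec(envValue.getBytes(StandardCharsets.UTF_8), "AES");
  }

  static String encrypt(String plaintext, String key) {
    try {
      Cipher cipher = Cipher.getInstance("AES");
      cipher.init(Cipher.ENCRYPT_MODE, keyFromEnv(key));
      return Base64.getEncoder()
          .encodeToString(cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8)));
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException(e);
    }
  }

  static String decrypt(String ciphertext, String key) {
    try {
      Cipher cipher = Cipher.getInstance("AES");
      cipher.init(Cipher.DECRYPT_MODE, keyFromEnv(key));
      return new String(cipher.doFinal(Base64.getDecoder().decode(ciphertext)),
          StandardCharsets.UTF_8);
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

In this sketch the spec fields hold only the Base64 ciphertext, so the plaintext credentials never appear in the ingestion spec; a `PasswordProvider`-based approach, as mentioned above, would be the more idiomatic alternative inside Druid.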
