maytasm3 opened a new issue #9305: Add support for optional cloud (aws, gcs, 
etc.) credentials for s3 for ingestion
URL: https://github.com/apache/druid/issues/9305
 
 
   ### Description
   Cloud input sources (such as S3) should be able to optionally take in credentials, so that they can read from multiple buckets/locations. This allows using credentials different from the default (i.e. the ones provided in the common runtime properties), and also using different credentials for each ingestion.
   
   ### Motivation
   The source of raw data (input data) can be stored in a bucket/location with different credentials than the deep storage location. For example, we might want to consume from S3 and also use S3 as deep storage, where the raw-data bucket and the deep-storage bucket are different buckets with different credentials.
   We may also want to ingest from multiple buckets/locations. For example, we might want to consume from multiple S3 buckets, each with its own credentials. We should not have to change `druid.s3.accessKey` for every ingestion just to read raw data from a different bucket.
   
   ### Proposed changes
   Add two new fields, `overrideAccessKeyId` and `overrideSecretAccessKey`, to the inputSource spec.
   
   These fields are optional.
   ```
   ...
       "ioConfig": {
         "type": "index_parallel",
         "inputSource": {
           "type": "s3",
        "prefixes": ["s3://foo/bar", "s3://bar/foo"],
        "overrideAccessKeyId": "123455",
        "overrideSecretAccessKey": "masdjlaksjdlakjsd"
         },
         "inputFormat": {
           "type": "json"
         },
         ...
       },
   ...
   ```
   If neither field is present, the normal cloud client/configuration will be used (current behavior). If one field is present but not the other, the task will fail. If both fields are given, a new client will be constructed for this ingestion task. The new client will use the given access key and secret key, with the rest of the configuration (region, etc.) taken from the client currently in use.
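   
   The "one field present but not both" rule could be validated with something like the sketch below. The class and method names are hypothetical, not from the Druid codebase:
   ```
   // Hypothetical helper: overrideAccessKeyId and overrideSecretAccessKey must be
   // supplied together, or not at all.
   public class OverrideCredentialsCheck
   {
     public static boolean useOverrideClient(String accessKeyId, String secretAccessKey)
     {
       boolean hasAccess = accessKeyId != null && !accessKeyId.isEmpty();
       boolean hasSecret = secretAccessKey != null && !secretAccessKey.isEmpty();
       if (hasAccess != hasSecret) {
         // Only one of the two fields was supplied: fail the task early.
         throw new IllegalArgumentException(
             "overrideAccessKeyId and overrideSecretAccessKey must be provided together"
         );
       }
       // true: build a dedicated client; false: fall back to the default client.
       return hasAccess;
     }
   }
   ```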
   
   For example, for an S3 inputSource with `overrideAccessKeyId` and `overrideSecretAccessKey` given, we would create a new S3 client with something similar to the following:
   ```
   import com.amazonaws.auth.AWSStaticCredentialsProvider;
   import com.amazonaws.auth.BasicAWSCredentials;
   import com.amazonaws.services.s3.AmazonS3;
   import com.amazonaws.services.s3.AmazonS3ClientBuilder;

   // Build a dedicated client from the override credentials supplied in the spec.
   BasicAWSCredentials awsCreds = new BasicAWSCredentials("access_key_id", "secret_access_key");
   AmazonS3 s3Client = AmazonS3ClientBuilder.standard()
                           .withCredentials(new AWSStaticCredentialsProvider(awsCreds))
                           .build();
   ```
   This new client will only be used for reading the input data. Deep storage will still be handled with the default client/configuration.
   
   The `overrideAccessKeyId` and `overrideSecretAccessKey` fields can also be encrypted. Encryption/decryption can be done using a pre-set environment configuration: the user encrypts the values before putting them in the ingestion spec, and Druid uses the pre-set environment configuration to decrypt them on ingestion. Hence, the fields in the ingestion spec will contain the encrypted keys. (We could also use the PasswordProvider for this.)
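   
   As a rough sketch of the env-based encrypt/decrypt idea (the class name, the idea of a key-carrying environment variable, and the choice of AES are illustrative assumptions, not a proposed implementation; a real implementation should use an authenticated mode such as AES-GCM with a random IV):
   ```
   import java.util.Base64;
   import javax.crypto.Cipher;
   import javax.crypto.spec.SecretKeySpec;

   // Hypothetical sketch: the operator pre-sets an AES key in the environment,
   // encrypts the credential fields offline, and Druid decrypts them on
   // ingestion using the same key. AES/ECB is used only to keep the sketch
   // short; a real implementation should use AES-GCM with a random IV.
   public class SpecFieldCrypto
   {
     public static String encrypt(String plaintext, byte[] key) throws Exception
     {
       Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
       cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"));
       return Base64.getEncoder().encodeToString(cipher.doFinal(plaintext.getBytes("UTF-8")));
     }

     public static String decrypt(String ciphertextB64, byte[] key) throws Exception
     {
       Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
       cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"));
       return new String(cipher.doFinal(Base64.getDecoder().decode(ciphertextB64)), "UTF-8");
     }
   }
   ```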
   
   ### Rationale
   Makes ingestion from cloud input sources more flexible to use.
   
   ### Operational impact
   - A new cloud client is created for every ingestion that uses these new fields.
   - No backward-compatibility concerns, since the new fields are optional.
   
   ### Test plan
   - New integration tests will be added.
   - Will test in a real Druid cluster.
   
   ### Future work
   - A "multi" input source, to support ingestion with multiple different cloud credentials (such as multiple S3 buckets, or a mixture of S3 buckets and GCS buckets, where all of them have different credentials) in the same ingestionSpec.
   - An extra optional field in the ingestionSpec that takes a path to a file containing the credentials, instead of using the `overrideAccessKeyId` and `overrideSecretAccessKey` fields.
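   
   The credentials-file idea in the last bullet could look something like the sketch below, assuming a simple Java properties file; the class name and property keys are hypothetical:
   ```
   import java.io.FileInputStream;
   import java.util.Properties;

   // Hypothetical sketch: the ingestionSpec would reference a file path, and the
   // task would load credentials from that file instead of the inline fields.
   public class CredentialsFileReader
   {
     public static String[] read(String path) throws Exception
     {
       Properties props = new Properties();
       try (FileInputStream in = new FileInputStream(path)) {
         props.load(in);
       }
       // [0] = access key id, [1] = secret access key
       return new String[]{
           props.getProperty("accessKeyId"),
           props.getProperty("secretAccessKey")
       };
     }
   }
   ```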
   
