ferruzzi commented on issue #38822:
URL: https://github.com/apache/airflow/issues/38822#issuecomment-2133996216

   Looking at the API docs for `create_processing_job` 
([here](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker/client/create_processing_job.html)),
 there isn't any way to set "distributed mode".
   
   But it does have this:
   
   ```
   S3DataDistributionType (string) –
   
   Whether to distribute the data from Amazon S3 to all processing instances 
with FullyReplicated, or whether the data from Amazon S3 is shared by Amazon S3 
key, downloading one shard of data to each processing instance.
   ```
   
   So setting either `ProcessingInputs[i]["S3Input"]["S3DataDistributionType"]` or 
`ProcessingInputs[i]["DatasetDefinition"]["DataDistributionType"]` to 
`"ShardedByS3Key"` in the config may get the result you are looking for 
(`ProcessingInputs` is a list, so the value is set per input). But you 
aren't using any `ProcessingInputs` in the config at all, so I'm not sure how 
this would apply to your job.
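
   For illustration, here is a minimal sketch of what such an entry might look like in the config dict. The bucket, prefix, input name, and local path are hypothetical placeholders, and the rest of the processing job config (`ProcessingJobName`, `ProcessingResources`, `AppSpecification`, `RoleArn`, etc.) is omitted, so treat this as a fragment rather than a working job definition.

```python
# Hypothetical values: the bucket, prefix, input name, and local path are
# placeholders, not taken from the issue.
processing_input = {
    "InputName": "input-1",
    "S3Input": {
        "S3Uri": "s3://my-example-bucket/input/",
        "LocalPath": "/opt/ml/processing/input",
        "S3DataType": "S3Prefix",
        "S3InputMode": "File",
        # Each instance downloads one shard of the objects under the prefix
        # instead of a full copy ("FullyReplicated").
        "S3DataDistributionType": "ShardedByS3Key",
    },
}

# Fragment of the config passed to create_processing_job; the other
# required keys of the request are left out here.
config_fragment = {"ProcessingInputs": [processing_input]}
```

   Whether this gives the behaviour you want depends on your job actually reading its data through `ProcessingInputs`, which the config in the issue does not currently do.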


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
