ferruzzi commented on issue #38822: URL: https://github.com/apache/airflow/issues/38822#issuecomment-2133996216
Looking at the API docs [[here](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker/client/create_processing_job.html)], there isn't any way to set a "distributed mode". But it does have this:

```
S3DataDistributionType (string) – Whether to distribute the data from Amazon S3 to all processing instances with FullyReplicated, or whether the data from Amazon S3 is shared by Amazon S3 key, downloading one shard of data to each processing instance.
```

So setting either `S3Input["S3DataDistributionType"]` or `DatasetDefinition["DataDistributionType"]` to `"ShardedByS3Key"` on each entry of the `ProcessingInputs` list in the config may get the result you are looking for. But you aren't using any `ProcessingInputs` in the config at all, so I'm not sure how this works.
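To make the suggestion above concrete, here is a minimal sketch of what the config change could look like. It assumes the config is the plain dict that ends up being passed to boto3's `create_processing_job`. The helper `with_sharded_inputs`, the job name, bucket, paths, and instance settings are all hypothetical placeholders, not values from the issue.

```python
from copy import deepcopy

# The two values the boto3 docs list for S3DataDistributionType / DataDistributionType.
VALID_DISTRIBUTION_TYPES = {"FullyReplicated", "ShardedByS3Key"}


def with_sharded_inputs(config, distribution_type="ShardedByS3Key"):
    """Return a copy of a processing-job config with the distribution type
    set on every entry of ProcessingInputs (hypothetical helper)."""
    if distribution_type not in VALID_DISTRIBUTION_TYPES:
        raise ValueError(f"Unsupported distribution type: {distribution_type!r}")

    updated = deepcopy(config)  # leave the caller's dict untouched
    for processing_input in updated.get("ProcessingInputs", []):
        if "S3Input" in processing_input:
            processing_input["S3Input"]["S3DataDistributionType"] = distribution_type
        if "DatasetDefinition" in processing_input:
            processing_input["DatasetDefinition"]["DataDistributionType"] = distribution_type
    return updated


# Hypothetical config fragment; every value here is a placeholder.
example_config = {
    "ProcessingJobName": "example-processing-job",
    "ProcessingInputs": [
        {
            "InputName": "input-1",
            "S3Input": {
                "S3Uri": "s3://example-bucket/input/",
                "LocalPath": "/opt/ml/processing/input",
                "S3DataType": "S3Prefix",
                "S3InputMode": "File",
            },
        }
    ],
    "ProcessingResources": {
        "ClusterConfig": {
            "InstanceCount": 2,
            "InstanceType": "ml.m5.xlarge",
            "VolumeSizeInGB": 30,
        }
    },
}

sharded_config = with_sharded_inputs(example_config)
```

Note that a config with no `ProcessingInputs` comes back unchanged, which matches the concern above: without any inputs declared, there is nothing for the distribution type to apply to.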
