jimmycfa edited a comment on issue #16763:
URL: https://github.com/apache/airflow/issues/16763#issuecomment-873010485
For posterity's sake, we are using boto3==1.17.99. This actually appears to be
an issue with the way the NameContains filter gets applied:
NameContains is passed into list_processing_jobs, but it doesn't appear to
filter across the entire set of ProcessingJobs. It appears to filter per
batch of 100, so you still end up calling list_processing_jobs in that
SagemakerOperator 30+ times back to back. Put another way: if I specify
NameContains in list_processing_jobs with a job name that doesn't exist and I
have over 3500 processing jobs, it returns an empty set of
ProcessingJobSummaries BUT still includes a NextToken. It will do this 35 more
times, since max results = 100 for that call, and you will likely run into
Throttling issues.
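To make the call count concrete, here is a toy model of the behavior described above. It does not call boto3 or AWS: `fake_list_processing_jobs` and `count_calls` are hypothetical names, and the per-page filtering is an assumption based on what we observed, not documented SageMaker behavior.

```python
PAGE_SIZE = 100  # the max results per call mentioned above


def fake_list_processing_jobs(all_names, name_contains, next_token=None):
    """Toy stand-in (NOT a real boto3 call) that filters each page of 100 separately."""
    start = int(next_token) if next_token else 0
    page = all_names[start:start + PAGE_SIZE]
    response = {
        "ProcessingJobSummaries": [
            {"ProcessingJobName": name} for name in page if name_contains in name
        ]
    }
    # A NextToken is returned whenever more unfiltered jobs remain,
    # even if this page produced no matches.
    if start + PAGE_SIZE < len(all_names):
        response["NextToken"] = str(start + PAGE_SIZE)
    return response


def count_calls(all_names, name_contains):
    """Follow NextToken until it disappears; return (number of calls, matches)."""
    calls, matches, token = 0, [], None
    while True:
        response = fake_list_processing_jobs(all_names, name_contains, token)
        calls += 1
        matches.extend(response["ProcessingJobSummaries"])
        token = response.get("NextToken")
        if token is None:
            break
    return calls, matches


jobs = [f"job-{i}" for i in range(3500)]
calls, matches = count_calls(jobs, "does-not-exist")
print(calls, len(matches))
```

With exactly 3500 names in this model, the loop makes 35 calls and finds no matches, which is the kind of back-to-back calling that can trigger throttling.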
I believe the expected behavior of that boto3 call is that the NameContains
filter is applied to the entire set of jobs before results are returned,
rather than per batch, so that the first call returns an empty set for
ProcessingJobSummaries and NO NextToken.
I'm going to reopen, but this does appear to be a boto3 issue.
Our current workaround was to update the `aws_default` connection in
Admin->Connections and add the following to Extra:
```json
{
  "config_kwargs": {
    "retries": {
      "max_attempts": 10,
      "mode": "standard"
    }
  }
}
```
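If you would rather generate that Extra value than hand-edit it, a minimal sketch follows. `build_aws_extra` is a hypothetical helper, not part of Airflow or boto3. It only produces the JSON shown above, and it assumes the retry mode should be one of the botocore modes `legacy`, `standard`, or `adaptive`.

```python
import json

# Retry modes accepted by botocore's retry configuration.
VALID_RETRY_MODES = {"legacy", "standard", "adaptive"}


def build_aws_extra(max_attempts=10, mode="standard"):
    """Return a JSON string suitable for pasting into the connection's Extra field."""
    if mode not in VALID_RETRY_MODES:
        raise ValueError(f"unknown retry mode: {mode!r}")
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    extra = {
        "config_kwargs": {
            "retries": {"max_attempts": max_attempts, "mode": mode}
        }
    }
    return json.dumps(extra, indent=2)


extra_json = build_aws_extra()
print(extra_json)
```

The printed JSON has the same keys and values as the block above, so it can be pasted into Extra as-is.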