[I] amazon provider converts values to int when the tuning operator expect it as string [airflow]

via GitHub Thu, 31 Oct 2024 09:42:05 -0700


francesco-camussoni-ueno opened a new issue, #43552:
URL: https://github.com/apache/airflow/issues/43552


   ### Apache Airflow Provider(s)
   
   amazon
   
   ### Versions of Apache Airflow Providers
   
   8.2.0
   
   ### Apache Airflow version
   
   2.6.3
   
   ### Operating System
   
   mw1.small
   
   ### Deployment
   
   Official Apache Airflow Helm Chart
   
   ### Deployment details
   
   _No response_
   
   ### What happened
   
   I have this task related to a tuning job on a dag:
   
   `tuning_dict = {"task_id": "tuning", "config": 
{"HyperParameterTuningJobConfig": {"ParameterRanges": 
{"CategoricalParameterRanges": [{"Name": "max_features", "Values": ["sqrt", 
"log2"]}, {"Name": "criterion", "Values": ["gini", "entropy", "log_loss"]}], 
"ContinuousParameterRanges": [{"Name": "ccp_alpha", "MinValue": "0.0", 
"MaxValue": "0.02"}], "IntegerParameterRanges": [{"Name": "min_samples_leaf", 
"MinValue": "2", "MaxValue": "15"}, {"Name": "n_estimators", "MinValue": "50", 
"MaxValue": "500"}]}, "HyperParameterTuningJobObjective": {"Name": 
"validation:accuracy", "Type": "Maximize"}, "Strategy": "Bayesian", 
"RandomSeed": 123}, ...`
       
   The key ContinuousParameterRanges contains some hyperparameters for my 
tuning job whose values are passed as strings. This is a must, based on the 
TuningOperator example: 
https://github.com/apache/airflow/blob/providers-amazon/3.4.0/airflow/providers/amazon/aws/example_dags/example_sagemaker.py
 (line 202).
   
   But I'm seeing that they are converted to float in the case of 
ContinuousParameterRanges, or to int in the case of IntegerParameterRanges, 
because of this code: 
https://github.com/apache/airflow/blob/providers-amazon/8.20.0/airflow/providers/amazon/aws/operators/sagemaker.py
 (line 99, or the functions parse_config_integers/parse_integers)
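
To make the suspected behaviour concrete, here is a minimal sketch of a blanket conversion that would produce the symptoms described in this report. It is not the provider's code: `to_number` and `coerce_numeric_strings` are hypothetical names, and the assumption is only that numeric-looking strings end up as int or float somewhere before the request reaches the API.

```python
def to_number(value):
    """Turn numeric-looking strings into int or float; leave everything else alone."""
    if isinstance(value, str):
        for cast in (int, float):
            try:
                return cast(value)
            except ValueError:
                continue
    return value


def coerce_numeric_strings(node):
    """Hypothetical helper: walk nested dicts/lists and convert every leaf value."""
    if isinstance(node, dict):
        return {key: coerce_numeric_strings(val) for key, val in node.items()}
    if isinstance(node, list):
        return [coerce_numeric_strings(val) for val in node]
    return to_number(node)


# Same ranges as in the task above, with every bound given as a string.
ranges = {
    "ContinuousParameterRanges": [
        {"Name": "ccp_alpha", "MinValue": "0.0", "MaxValue": "0.02"}
    ],
    "IntegerParameterRanges": [
        {"Name": "min_samples_leaf", "MinValue": "2", "MaxValue": "15"}
    ],
}

coerced = coerce_numeric_strings(ranges)
for group, entries in coerced.items():
    for entry in entries:
        print(group, entry["Name"], type(entry["MinValue"]).__name__)
```

After a conversion like this, the bounds are no longer `str`, which is the type the error messages in this report list as the only valid one.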
   
   So when I execute the dag I get this kind of errors: 
   
   ```
   Invalid type for parameter 
HyperParameterTuningJobConfig.ParameterRanges.ContinuousParameterRanges[0].MinValue,
 value: 0.0, type: <class 'float'>, valid types: <class 'str'>
   Invalid type for parameter 
HyperParameterTuningJobConfig.ParameterRanges.ContinuousParameterRanges[0].MaxValue,
 value: 0.02, type: <class 'float'>, valid types: <class 'str'>
   Invalid type for parameter 
HyperParameterTuningJobConfig.ParameterRanges.IntegerParameterRanges[0].MinValue,
 value: 2, type: <class 'int'>, valid types: <class 'str'>
   Invalid type for parameter 
HyperParameterTuningJobConfig.ParameterRanges.IntegerParameterRanges[0].MaxValue,
 value: 15, type: <class 'int'>, valid types: <class 'str'>
   Invalid type for parameter 
HyperParameterTuningJobConfig.ParameterRanges.IntegerParameterRanges[1].MinValue,
 value: 50, type: <class 'int'>, valid types: <class 'str'>
   Invalid type for parameter 
HyperParameterTuningJobConfig.ParameterRanges.IntegerParameterRanges[1].MaxValue,
 value: 500, type: <class 'int'>, valid types: <class 'str'>
   ```
   
   Any help?
   
   
   ### What you think should happen instead
   
   I think that those parameters should not be converted to float or int; they should stay as strings.
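
As an illustration of the expected outcome, the sketch below restores the `MinValue`/`MaxValue` bounds to strings before a config is sent. `stringify_parameter_ranges` is a hypothetical helper written for this report, not part of the amazon provider, and it assumes the config layout shown in the task above.

```python
from copy import deepcopy

# Range groups whose bounds are expected as strings, per the errors in this report.
RANGE_KEYS = ("IntegerParameterRanges", "ContinuousParameterRanges")


def stringify_parameter_ranges(config):
    """Hypothetical helper: return a copy of config with range bounds cast to str."""
    fixed = deepcopy(config)
    ranges = fixed.get("HyperParameterTuningJobConfig", {}).get("ParameterRanges", {})
    for key in RANGE_KEYS:
        for entry in ranges.get(key, []):
            for bound in ("MinValue", "MaxValue"):
                if bound in entry:
                    entry[bound] = str(entry[bound])
    return fixed


# A config whose bounds were already converted to numbers.
converted = {
    "HyperParameterTuningJobConfig": {
        "ParameterRanges": {
            "ContinuousParameterRanges": [
                {"Name": "ccp_alpha", "MinValue": 0.0, "MaxValue": 0.02}
            ],
            "IntegerParameterRanges": [
                {"Name": "n_estimators", "MinValue": 50, "MaxValue": 500}
            ],
        }
    }
}

restored = stringify_parameter_ranges(converted)
print(restored["HyperParameterTuningJobConfig"]["ParameterRanges"])
```

The helper works on a deep copy so the caller's original dict is left unchanged.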
   
   ### How to reproduce
   
   Generate a dag with this task:
   
   `tuning_dict = {"task_id": "tuning", "config": 
{"HyperParameterTuningJobConfig": {"ParameterRanges": 
{"CategoricalParameterRanges": [{"Name": "max_features", "Values": ["sqrt", 
"log2"]}, {"Name": "criterion", "Values": ["gini", "entropy", "log_loss"]}], 
"ContinuousParameterRanges": [{"Name": "ccp_alpha", "MinValue": "0.0", 
"MaxValue": "0.02"}], "IntegerParameterRanges": [{"Name": "min_samples_leaf", 
"MinValue": "2", "MaxValue": "15"}, {"Name": "n_estimators", "MinValue": "50", 
"MaxValue": "500"}]}, "HyperParameterTuningJobObjective": {"Name": 
"validation:accuracy", "Type": "Maximize"}, "Strategy": "Bayesian", 
"RandomSeed": 123}, "ResourceLimits": {"MaxNumberOfTrainingJobs": 10, 
"MaxParallelTrainingJobs": 4, "MaxRuntimeInSeconds": 7200}, "Tags": [{"Key": 
"USER", "Value": "[email protected]"}, {"Key": "TRIBU", "Value": 
"Central Data"}, {"Key": "SQUAD", "Value": "Personalization and Relevance"}, 
{"Key": "ONLINE_OR_BATCH", "Value": "batch"}, {"Key": "PREDICTION_TYPE", 
"Value": "clasificacion binaria"}, {"Key": "VERSION_DESCRIPTION", "Value": 
"Version inicial"}, {"Key": "DESCRIPTION", "Value": "Desarrollo de deployment 
de pipeline de entrenamiento"}], "HyperParameterTuningJobName": 
"mlpipeline-training-tuning", "TrainingJobDefinition": 
{"AlgorithmSpecification": {"TrainingImage": "<Training_image>", 
"TrainingInputMode": "File", "MetricDefinitions": [{"Name": 
"validation:accuracy", "Regex": "validation-accuracy=(.*?);"}, {"Name": 
"validation:recall", "Regex": "validation-recall=(.*?);"}, {"Name": 
"validation:precision", "Regex": "validation-precision=(.*?);"}]}, 
"InputDataConfig": [{"ChannelName": "ingestion", "DataSource": {"S3DataSource": 
{"S3DataType": "S3Prefix", "S3Uri": "<BUCKET>", "S3DataDistributionType": 
"FullyReplicated"}}}], "OutputDataConfig": {"S3OutputPath": 
"s3://pr-ueno-prod-sagemaker/ml-projects/mlpipeline/training_pipeline/tuning/output"},
 "ResourceConfig": {"InstanceType": "ml.m5.large", "InstanceCount": 1, 
"VolumeSizeInGB": 10}, 
"StoppingCondition": {"MaxRuntimeInSeconds": 7200}, "RoleArn": "<ROLE>", 
"StaticHyperParameters": {}}}}`
   
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
