devinmnorris opened a new issue, #37400: URL: https://github.com/apache/airflow/issues/37400
### Apache Airflow Provider(s)

google

### Versions of Apache Airflow Providers

apache-airflow-providers-google==10.12.0

### Apache Airflow version

2.6.3

### Operating System

Ubuntu 22.04.3 LTS

### Deployment

Docker-Compose

### Deployment details

_No response_

### What happened

When creating AutoML Text training jobs with `CreateAutoMLTextTrainingJobOperator` and passing the resource name or model ID of an existing model to the `parent_model` parameter, an entirely new model with `Version 1` shows up in Vertex AI Model Registry.

### What you think should happen instead

Since we provided an argument to `parent_model`, the model uploaded by the job should be a new version of the existing _parent model_.

<img width="941" alt="image" src="https://github.com/apache/airflow/assets/131682078/16c44fab-56f4-47cd-8c26-60d915932da5">

### How to reproduce

If your model registry already has an existing model to use as the parent model, skip to step 3. Otherwise:

1. Train the initial model
2. Get the initial model's resource name
3. Train a new model, specifying `parent_model=initial_model_resource_name`

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonVirtualenvOperator
from airflow.providers.google.cloud.operators.vertex_ai.auto_ml import (
    CreateAutoMLTextTrainingJobOperator,
)

# PROJECT_ID, REGION and DATASET_ID are placeholders for your own values.


def get_parent_model(project_id: str):
    # Imported inside the callable because it runs in a separate virtualenv.
    from google.cloud import aiplatform

    aiplatform.init(project=project_id)
    # Return the resource name of the most recently updated model.
    models = [m for m in aiplatform.Model.list()]
    models.sort(key=lambda m: m.version_update_time, reverse=True)
    return models[0].resource_name


# The original report wrote `with DAG as dag:`; the DAG arguments below are
# placeholders added so the snippet parses and runs.
with DAG(
    dag_id="automl_parent_model_repro",
    start_date=datetime(2024, 1, 1),
    schedule=None,
) as dag:
    # Step 1: train the initial model.
    initial_model = CreateAutoMLTextTrainingJobOperator(
        task_id="create_auto_ml_training_job-1",
        project_id=PROJECT_ID,
        region=REGION,
        display_name="automl-training-job-1",
        training_fraction_split=0.8,
        test_fraction_split=0.2,
        dataset_id=DATASET_ID,
        prediction_type="classification",
    )

    # Step 2: look up the initial model's resource name.
    initial_model_resource_name = PythonVirtualenvOperator(
        task_id="initial_model_resource_name",
        python_callable=get_parent_model,
        requirements=["google-cloud-aiplatform"],
        op_kwargs={
            "project_id": PROJECT_ID,
        },
    )

    # Step 3: train again, expecting a new version of the parent model.
    model_version_2 = CreateAutoMLTextTrainingJobOperator(
        task_id="create_auto_ml_training_job-2",
        project_id=PROJECT_ID,
        region=REGION,
        display_name="automl-training-job-2",
        parent_model=initial_model_resource_name.output,
        training_fraction_split=0.8,
        test_fraction_split=0.2,
        dataset_id=DATASET_ID,
        prediction_type="classification",
    )

    initial_model >> initial_model_resource_name >> model_version_2
```

### Anything else

This problem only occurs when using the `CreateAutoMLTextTrainingJobOperator`, and not with the Vertex AI SDK for Python.
For example, we were able to implement model versioning successfully with `google-cloud-aiplatform==1.41.0`, using something like:

```python
from google.cloud import aiplatform

# PROJECT, LOCATION, DATASET_ID, PARENT_MODEL_ID, display_name,
# model_display_name and is_default_version are placeholders for your own values.
aiplatform.init(project=PROJECT, location=LOCATION)

text_dataset = aiplatform.TextDataset(DATASET_ID)

job = aiplatform.AutoMLTextTrainingJob(
    display_name=display_name,
    prediction_type="classification",
    multi_label=False,
)

# Passing parent_model makes the trained model a new version of that model.
model = job.run(
    dataset=text_dataset,
    model_display_name=model_display_name,
    training_fraction_split=0.8,
    validation_fraction_split=0.1,
    test_fraction_split=0.1,
    parent_model=PARENT_MODEL_ID,
    is_default_version=is_default_version,
)
model.wait()
```

### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
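
To check the symptom described under "What happened" without opening the Model Registry UI, one option is to compare the model ID in the newly trained model's resource name with the parent's. The sketch below is not part of the original report. It assumes the Vertex AI resource name layout `projects/<project>/locations/<location>/models/<model_id>`, optionally followed by `@<version>`, and the helper names `parse_model_name` and `is_version_of` are hypothetical. It deliberately ignores the project segment, since that may appear as either a project ID or a project number.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout of a Vertex AI model resource name:
#   projects/<project>/locations/<location>/models/<model_id>[@<version>]
_MODEL_NAME = re.compile(
    r"^projects/(?P<project>[^/]+)"
    r"/locations/(?P<location>[^/]+)"
    r"/models/(?P<model_id>[^/@]+)"
    r"(?:@(?P<version>[^/@]+))?$"
)


class ModelRef(NamedTuple):
    project: str
    location: str
    model_id: str
    version: Optional[str]  # None when the name carries no @<version> suffix


def parse_model_name(resource_name: str) -> ModelRef:
    """Split a model resource name into its parts (hypothetical helper)."""
    match = _MODEL_NAME.match(resource_name)
    if match is None:
        raise ValueError(f"not a model resource name: {resource_name!r}")
    return ModelRef(**match.groupdict())


def is_version_of(parent: str, candidate: str) -> bool:
    """Return True if both names point at the same Model Registry entry.

    Only location and model ID are compared; the project segment is skipped
    because it can be a project ID in one name and a project number in the other.
    """
    p, c = parse_model_name(parent), parse_model_name(candidate)
    return (p.location, p.model_id) == (c.location, c.model_id)
```

With made-up resource names, `is_version_of("projects/p/locations/us-central1/models/456", "projects/p/locations/us-central1/models/456@2")` is true, while a candidate ending in `models/789@1` is not, which matches the behavior reported here for the operator.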
