eladkal commented on code in PR #45726:
URL: https://github.com/apache/airflow/pull/45726#discussion_r1951557565


##########
providers/amazon/src/airflow/providers/amazon/aws/hooks/sagemaker_unified_studio.py:
##########
@@ -0,0 +1,188 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""This module contains the Amazon SageMaker Unified Studio Notebook hook."""
+
+import time
+
+from sagemaker_studio import ClientConfig
+from sagemaker_studio._openapi.models import GetExecutionRequest, 
StartExecutionRequest

Review Comment:
   Why are we importing from private path? If upstream library decided 
`_openapi` is private we should not import it in Airflow.



##########
providers/amazon/src/airflow/providers/amazon/aws/hooks/sagemaker_unified_studio.py:
##########
@@ -0,0 +1,188 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""This module contains the Amazon SageMaker Unified Studio Notebook hook."""
+
+import time
+
+from sagemaker_studio import ClientConfig
+from sagemaker_studio._openapi.models import GetExecutionRequest, 
StartExecutionRequest
+from sagemaker_studio.sagemaker_studio_api import SageMakerStudioAPI
+
+from airflow import AirflowException
+from airflow.hooks.base import BaseHook
+from airflow.providers.amazon.aws.utils.sagemaker_unified_studio import 
is_local_runner
+
+
+class SageMakerNotebookHook(BaseHook):
+    """
+    Interact with Sagemaker Unified Studio Workflows.
+
+    This hook provides a wrapper around the Sagemaker Workflows Notebook 
Execution API.
+
+    Examples:
+     .. code-block:: python
+
+        from airflow.providers.amazon.aws.hooks.sagemaker_unified_studio 
import SageMakerNotebookHook
+
+        notebook_hook = SageMakerNotebookHook(
+            input_config={'input_path': 'path/to/notebook.ipynb', 
'input_params': {'param1': 'value1'}},
+            output_config={'output_uri': 'folder/output/location/prefix', 
'output_formats': 'NOTEBOOK'},
+            execution_name='notebook_execution',
+            waiter_delay=10,
+            waiter_max_attempts=1440,
+        )
+
+    :param execution_name: The name of the notebook job to be executed, this 
is same as task_id.
+    :param input_config: Configuration for the input file.
+        Example: {'input_path': 'folder/input/notebook.ipynb', 'input_params': 
{'param1': 'value1'}}
+    :param output_config: Configuration for the output format. It should 
include an output_formats parameter to specify the output format.
+        Example: {'output_formats': ['NOTEBOOK']}
+    :param compute: compute configuration to use for the notebook execution. 
This is a required attribute
+        if the execution is on a remote compute.
+        Example: { "InstanceType": "ml.m5.large", "VolumeSizeInGB": 30, 
"VolumeKmsKeyId": "", "ImageUri": "string", "ContainerEntrypoint": [ "string" ]}
+    :param termination_condition: conditions to match to terminate the remote 
execution.
+        Example: { "MaxRuntimeInSeconds": 3600 }
+    :param tags: tags to be associated with the remote execution runs.
+        Example: { "md_analytics": "logs" }
+    :param waiter_delay: Interval in seconds to check the task execution 
status.
+    :param waiter_max_attempts: Number of attempts to wait before returning 
FAILED.
+    """
+
+    def __init__(
+        self,
+        execution_name: str,
+        input_config: dict = {},
+        output_config: dict = {"output_formats": ["NOTEBOOK"]},
+        compute: dict = {},
+        termination_condition: dict = {},
+        tags: dict = {},
+        waiter_delay: int = 10,
+        waiter_max_attempts: int = 1440,
+        *args,
+        **kwargs,
+    ):
+        super().__init__(*args, **kwargs)
+        self._sagemaker_studio = 
SageMakerStudioAPI(self._get_sagemaker_studio_config())

Review Comment:
   Does SageMakerStudioAPI invoke an API call to SageMaker? If so I am not sure 
if it's advisable to have it in the hook init function?



##########
providers/amazon/src/airflow/providers/amazon/aws/hooks/sagemaker_unified_studio.py:
##########
@@ -0,0 +1,188 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""This module contains the Amazon SageMaker Unified Studio Notebook hook."""
+
+import time
+
+from sagemaker_studio import ClientConfig
+from sagemaker_studio._openapi.models import GetExecutionRequest, 
StartExecutionRequest
+from sagemaker_studio.sagemaker_studio_api import SageMakerStudioAPI
+
+from airflow import AirflowException
+from airflow.hooks.base import BaseHook
+from airflow.providers.amazon.aws.utils.sagemaker_unified_studio import 
is_local_runner
+
+
+class SageMakerNotebookHook(BaseHook):
+    """
+    Interact with Sagemaker Unified Studio Workflows.
+
+    This hook provides a wrapper around the Sagemaker Workflows Notebook 
Execution API.
+
+    Examples:
+     .. code-block:: python
+
+        from airflow.providers.amazon.aws.hooks.sagemaker_unified_studio 
import SageMakerNotebookHook
+
+        notebook_hook = SageMakerNotebookHook(
+            input_config={'input_path': 'path/to/notebook.ipynb', 
'input_params': {'param1': 'value1'}},
+            output_config={'output_uri': 'folder/output/location/prefix', 
'output_formats': 'NOTEBOOK'},
+            execution_name='notebook_execution',
+            waiter_delay=10,
+            waiter_max_attempts=1440,
+        )
+
+    :param execution_name: The name of the notebook job to be executed, this 
is same as task_id.
+    :param input_config: Configuration for the input file.
+        Example: {'input_path': 'folder/input/notebook.ipynb', 'input_params': 
{'param1': 'value1'}}
+    :param output_config: Configuration for the output format. It should 
include an output_formats parameter to specify the output format.
+        Example: {'output_formats': ['NOTEBOOK']}
+    :param compute: compute configuration to use for the notebook execution. 
This is a required attribute
+        if the execution is on a remote compute.
+        Example: { "InstanceType": "ml.m5.large", "VolumeSizeInGB": 30, 
"VolumeKmsKeyId": "", "ImageUri": "string", "ContainerEntrypoint": [ "string" ]}
+    :param termination_condition: conditions to match to terminate the remote 
execution.
+        Example: { "MaxRuntimeInSeconds": 3600 }
+    :param tags: tags to be associated with the remote execution runs.
+        Example: { "md_analytics": "logs" }
+    :param waiter_delay: Interval in seconds to check the task execution 
status.
+    :param waiter_max_attempts: Number of attempts to wait before returning 
FAILED.
+    """

Review Comment:
   The doc string contains many examples but its not clear to me what is the 
source? Wouldn't it be simpler just link to the source of upsrream library 
about all possible values rather than having so many examples?



##########
providers/amazon/docs/operators/sagemakerunifiedstudio.rst:
##########
@@ -0,0 +1,61 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+===============================
+Amazon SageMaker Unified Studio
+===============================
+
+`Amazon SageMaker Unified Studio 
<https://docs.aws.amazon.com/sagemaker-unified-studio>`__ is a unified 
development experience that
+brings together AWS data, analytics, artificial intelligence (AI), and machine 
learning (ML) services.
+It provides a place to build, deploy, execute, and monitor end-to-end 
workflows from a single interface.
+This helps drive collaboration across teams and facilitate agile development.
+
+Airflow provides operators to orchestrate Notebooks, Querybooks, and Visual 
ETL jobs within SageMaker Unified Studio Workflows.
+
+Prerequisite Tasks
+------------------
+
+To use these operators, you must do a few things:
+
+  * Create a SageMaker Unified Studio domain.
+  * Within your domain, create a project with the "Data analytics and AI-ML 
model development" project profile.

Review Comment:
   Lets please avoid explaining about AWS services.
   We should list the prerequisite and link to aws doc that explain how to set 
it up. I don't want to handle issues/review PR of fixing these instructions 
everytime aws change something in thier product 



##########
providers/amazon/src/airflow/providers/amazon/aws/operators/sagemaker_unified_studio.py:
##########
@@ -0,0 +1,148 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""This module contains the Amazon SageMaker Unified Studio Notebook 
operator."""
+
+from functools import cached_property
+
+from airflow import AirflowException
+from airflow.configuration import conf
+from airflow.models import BaseOperator
+from airflow.providers.amazon.aws.hooks.sagemaker_unified_studio import (
+    SageMakerNotebookHook,
+)
+from airflow.providers.amazon.aws.triggers.sagemaker_unified_studio import (
+    SageMakerNotebookJobTrigger,
+)
+from airflow.utils.context import Context
+
+
+class SageMakerNotebookOperator(BaseOperator):

Review Comment:
   I'm missing handling of operator extra link here.



##########
providers/amazon/src/airflow/providers/amazon/aws/hooks/sagemaker_unified_studio.py:
##########
@@ -0,0 +1,188 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""This module contains the Amazon SageMaker Unified Studio Notebook hook."""
+
+import time
+
+from sagemaker_studio import ClientConfig
+from sagemaker_studio._openapi.models import GetExecutionRequest, 
StartExecutionRequest
+from sagemaker_studio.sagemaker_studio_api import SageMakerStudioAPI
+
+from airflow import AirflowException
+from airflow.hooks.base import BaseHook
+from airflow.providers.amazon.aws.utils.sagemaker_unified_studio import 
is_local_runner
+
+
+class SageMakerNotebookHook(BaseHook):
+    """
+    Interact with Sagemaker Unified Studio Workflows.
+
+    This hook provides a wrapper around the Sagemaker Workflows Notebook 
Execution API.
+
+    Examples:
+     .. code-block:: python
+
+        from airflow.providers.amazon.aws.hooks.sagemaker_unified_studio 
import SageMakerNotebookHook
+
+        notebook_hook = SageMakerNotebookHook(
+            input_config={'input_path': 'path/to/notebook.ipynb', 
'input_params': {'param1': 'value1'}},
+            output_config={'output_uri': 'folder/output/location/prefix', 
'output_formats': 'NOTEBOOK'},
+            execution_name='notebook_execution',
+            waiter_delay=10,
+            waiter_max_attempts=1440,
+        )
+
+    :param execution_name: The name of the notebook job to be executed, this 
is same as task_id.
+    :param input_config: Configuration for the input file.
+        Example: {'input_path': 'folder/input/notebook.ipynb', 'input_params': 
{'param1': 'value1'}}
+    :param output_config: Configuration for the output format. It should 
include an output_formats parameter to specify the output format.
+        Example: {'output_formats': ['NOTEBOOK']}
+    :param compute: compute configuration to use for the notebook execution. 
This is a required attribute
+        if the execution is on a remote compute.
+        Example: { "InstanceType": "ml.m5.large", "VolumeSizeInGB": 30, 
"VolumeKmsKeyId": "", "ImageUri": "string", "ContainerEntrypoint": [ "string" ]}
+    :param termination_condition: conditions to match to terminate the remote 
execution.
+        Example: { "MaxRuntimeInSeconds": 3600 }
+    :param tags: tags to be associated with the remote execution runs.
+        Example: { "md_analytics": "logs" }
+    :param waiter_delay: Interval in seconds to check the task execution 
status.
+    :param waiter_max_attempts: Number of attempts to wait before returning 
FAILED.
+    """
+
+    def __init__(
+        self,
+        execution_name: str,
+        input_config: dict = {},
+        output_config: dict = {"output_formats": ["NOTEBOOK"]},

Review Comment:
   Why this was chosen as default?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to