alexott commented on a change in pull request #22331:
URL: https://github.com/apache/airflow/pull/22331#discussion_r840396480



##########
File path: airflow/providers/databricks/hooks/databricks.py
##########
@@ -352,3 +355,50 @@ def get_repo_by_path(self, path: str) -> Optional[str]:
             return str(result['object_id'])
 
         return None
+
+    def import_notebook(self, dbfs_path: str, raw_code: str, language: str, overwrite: bool = True):
+        """
+        Import a local notebook from Airflow into Databricks FS. Notebooks saved to /Shared/airflow dbfs

Review comment:
      This sentence is incorrect - notebooks aren't stored on DBFS; they are stored in the workspace. There is a separate API for working with DBFS.

##########
File path: airflow/providers/databricks/hooks/databricks.py
##########
@@ -352,3 +355,50 @@ def get_repo_by_path(self, path: str) -> Optional[str]:
             return str(result['object_id'])
 
         return None
+
+    def import_notebook(self, dbfs_path: str, raw_code: str, language: str, overwrite: bool = True):
+        """
+        Import a local notebook from Airflow into Databricks FS. Notebooks saved to /Shared/airflow dbfs
+
+        Utility function to call the ``2.0/workspace/import`` endpoint.
+
+        :param dbfs_path: String path on Databricks FS
+        :param raw_code: String of non-encoded code
+        :param language: Use one of the following strings 'SCALA', 'PYTHON', 'SQL', OR 'R'
+        :param overwrite: Boolean flag specifying whether to overwrite existing object. It is true by default
+        :return: full dbfs notebook path
+        """
+        # encode notebook
+        encoded_bytes = base64.b64encode(raw_code.encode("utf-8"))
+        encoded_str = str(encoded_bytes, "utf-8")
+
+        # create parent directory if not exists
+        path_parts = dbfs_path.split('/')
+        path_parts.pop(0)
+        path_parts = path_parts[:-1]
+
+        path = ''
+        for part in path_parts:
+            path += f'/{part}'
+        #TODO: Add warning if already exists
+        self._do_api_call(WORKSPACE_MKDIR_ENDPOINT, {'path': path})

Review comment:
      `MKDIRS` can also return `RESOURCE_ALREADY_EXISTS` - see [docs](https://docs.databricks.com/dev-tools/api/latest/workspace.html#mkdirs)
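
To address this, the hook could compute the parent directory with `posixpath` instead of manual splitting, and tolerate that error code explicitly. A minimal sketch, assuming `_do_api_call` surfaces Databricks API errors as an `AirflowException` whose message contains the error code (that wrapping is an assumption, not confirmed by this diff):

```python
import posixpath

def parent_dir(workspace_path: str) -> str:
    # '/Shared/airflow/my_notebook' -> '/Shared/airflow'
    return posixpath.dirname(workspace_path)

# Sketch of tolerant directory creation inside the hook:
#
# try:
#     self._do_api_call(WORKSPACE_MKDIR_ENDPOINT,
#                       {'path': parent_dir(workspace_path)})
# except AirflowException as e:
#     # RESOURCE_ALREADY_EXISTS just means an object already occupies
#     # the path; anything else should still propagate.
#     if 'RESOURCE_ALREADY_EXISTS' not in str(e):
#         raise
```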

##########
File path: airflow/providers/databricks/hooks/databricks.py
##########
@@ -352,3 +355,50 @@ def get_repo_by_path(self, path: str) -> Optional[str]:
             return str(result['object_id'])
 
         return None
+
+    def import_notebook(self, dbfs_path: str, raw_code: str, language: str, overwrite: bool = True):

Review comment:
      Add a format option - besides `SOURCE`, notebooks can also be imported as `HTML`, `DBC`, etc.; in that case, `language` should be optional.
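
One way to make the format configurable (all names here are illustrative, not the final API): accept a `fmt` argument defaulting to `'SOURCE'` and make `language` optional, requiring it only for source imports:

```python
from typing import Optional

def build_import_payload(
    workspace_path: str,
    content_b64: str,
    fmt: str = 'SOURCE',
    language: Optional[str] = None,
    overwrite: bool = True,
) -> dict:
    """Build the request body for ``2.0/workspace/import`` (sketch)."""
    if fmt == 'SOURCE' and language is None:
        raise ValueError("language is required when format is 'SOURCE'")
    payload = {
        'path': workspace_path,
        'content': content_b64,
        'format': fmt,
        'overwrite': overwrite,
    }
    if language is not None:
        payload['language'] = language
    return payload
```

For `HTML` or `DBC` imports the caller would simply pass `fmt='HTML'` or `fmt='DBC'` and omit `language`.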

##########
File path: airflow/providers/databricks/hooks/databricks.py
##########
@@ -352,3 +355,50 @@ def get_repo_by_path(self, path: str) -> Optional[str]:
             return str(result['object_id'])
 
         return None
+
+    def import_notebook(self, dbfs_path: str, raw_code: str, language: str, overwrite: bool = True):

Review comment:
       Rename `dbfs_path` to `workspace_path`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

