Taragolis commented on code in PR #38040:
URL: https://github.com/apache/airflow/pull/38040#discussion_r1520172677


##########
airflow/providers/google/cloud/transfers/postgres_to_gcs.py:
##########
@@ -112,7 +112,13 @@ def __init__(
         self.cursor_itersize = cursor_itersize
 
     def _unique_name(self):
-        return f"{self.dag_id}__{self.task_id}__{uuid.uuid4()}" if 
self.use_server_side_cursor else None
+        """
+        Generates a deterministic UUID for the cursor name,
+        using the combination of DAG ID and task ID.
+        """
+        if self.use_server_side_cursor:
+            return str(uuid5(uuid5(NAMESPACE_OID, self.dag_id), self.task_id))

Review Comment:
   Maybe deterministic here is not a better choose, e.g. it will create the 
same cursor name for the same task.
   with different dagrun/map_id/try_number
   
   So maybe random part here it required, and better sanitise non-random part 
to the specific length
   
   Some simple snippet
   
   ```python
   from slugify import slugify
   import uuid
   
   
   def sample(dag_id, task_id):
       separator = "__"
       random_suffix = str(uuid.uuid4())
       available_length = 63 - len(random_suffix) - len(separator) * 2
       return separator.join(
           (
               slugify(dag_id, allow_unicode=False, max_length=available_length 
// 2),
               slugify(task_id, allow_unicode=False, 
max_length=available_length // 2),
               random_suffix
           )
       )
   
   samples = [
       ("a" * 100, "b" * 100),
       ("a" * 10, "b" * 10),
       ("a" * 10, "b" * 10),
       ("dag", "task1"),
       ("Lorem ipsum dolor sit amet.", "Lorem ipsum dolor sit amet."),
   ]
   for task_id, dag_id in samples:
       result = sample(task_id, dag_id)
       print(f"result={result}, length={len(result)}")
   ```
   
   It is also require to add 
[`slugify`](https://pypi.org/project/python-slugify/) as dependency [for Google 
Provider](https://github.com/apache/airflow/blob/1e6140ba250a9a99b5f1b7d73aa689e89d4152ec/airflow/providers/google/provider.yaml#L89).
 This is a core airflow dependency, but better to avoid the situation if it 
removed from core dependency in the future.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to