Brunda10 commented on code in PR #54657:
URL: https://github.com/apache/airflow/pull/54657#discussion_r2344299108


##########
providers/common/io/src/airflow/providers/common/io/xcom/backend.py:
##########
@@ -49,15 +49,24 @@
 
 def _get_compression_suffix(compression: str) -> str:
     """
-    Return the compression suffix for the given compression.
+    Return the compression suffix (e.g., 'gz' for gzip) for the given compression.
 
-    :raises ValueError: if the compression is not supported
+    :raises ValueError: if the compression is not supported.
     """
-    for suffix, c in fsspec.utils.compressions.items():
-        if c == compression:
-            return suffix
+    if compression is None:
+        return ""
 
-    raise ValueError(f"Compression {compression} is not supported. Make sure it is installed.")
+    # fsspec >=2023: list of available compression algorithms
+    available = {c.lower(): c for c in fsspec.available_compressions() if c is not None}
+
+    # fsspec <2023: dict mapping suffix -> codec (e.g., {'gz': 'gzip'})
+    legacy_compressions = {k.lower(): v for k, v in fsspec.utils.compressions.items() if v}

Review Comment:
    Thanks for pointing this out @uranusjr.
    `fsspec.available_compressions()` and `fsspec.utils.compressions` each involve a dictionary/list construction, and calling them repeatedly could add overhead.

    To address this, I've:

    - Wrapped the logic in a helper (`_get_available_compressions`)
    - Added `@lru_cache(maxsize=1)` so the values are only computed once per process

    This makes the calls lazy and cached, while still ensuring we don't recompute unless explicitly invalidated.
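    The lazy-plus-cached pattern described above can be sketched roughly as follows. This is a minimal runnable sketch, not the PR's actual code: the `_COMPRESSIONS` dict is a stand-in for `fsspec.utils.compressions` so the example runs without fsspec installed, and the `_call_count` counter exists only to demonstrate that the mapping is built once per process.

    ```python
    from functools import lru_cache

    # Stand-in for fsspec.utils.compressions (suffix -> codec name), used so
    # this sketch runs without fsspec; the real helper would read from fsspec.
    _COMPRESSIONS = {"gz": "gzip", "bz2": "bz2", "xz": "lzma", None: None}

    _call_count = 0  # demo only: counts how often the mapping is rebuilt


    @lru_cache(maxsize=1)
    def _get_available_compressions() -> dict:
        """Build the suffix -> codec mapping lazily, once per process."""
        global _call_count
        _call_count += 1
        return {k.lower(): v for k, v in _COMPRESSIONS.items() if k and v}


    def _get_compression_suffix(compression: str) -> str:
        """Return the suffix for a codec, e.g. 'gz' for 'gzip'."""
        for suffix, codec in _get_available_compressions().items():
            if codec == compression:
                return suffix
        raise ValueError(f"Compression {compression} is not supported.")


    print(_get_compression_suffix("gzip"))  # -> gz
    print(_get_compression_suffix("lzma"))  # -> xz (cache hit, no rebuild)
    print(_call_count)                      # -> 1
    ```

    Because `lru_cache(maxsize=1)` memoizes the zero-argument helper, repeated lookups reuse the same dict; calling `_get_available_compressions.cache_clear()` would explicitly invalidate it.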



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
