Re: [PR] fix(ci): loading examples in CI returns http error 429 [superset]

via GitHub Mon, 12 May 2025 16:47:33 -0700


korbit-ai[bot] commented on code in PR #33412:
URL: https://github.com/apache/superset/pull/33412#discussion_r2085694848



##########
superset/examples/flights.py:
##########
@@ -37,12 +37,14 @@ def load_flights(only_metadata: bool = False, force: bool = 
False) -> None:
         table_exists = database.has_table(Table(tbl_name, schema))
 
         if not only_metadata and (not table_exists or force):
-            flight_data_url = get_example_url("flight_data.csv.gz")
-            pdf = pd.read_csv(flight_data_url, encoding="latin-1", 
compression="gzip")
+            pdf = read_example_csv(
+                "flight_data.csv.gz", encoding="latin-1", compression="gzip"
+            )
 
             # Loading airports info to join and get lat/long
-            airports_url = get_example_url("airports.csv.gz")
-            airports = pd.read_csv(airports_url, encoding="latin-1", 
compression="gzip")
+            airports = read_example_csv(
+                "flight_data.csv.gz", encoding="latin-1", compression="gzip"
+            )

Review Comment:
   ### Incorrect Data Source for Airports <sub>![category 
Functionality](https://img.shields.io/badge/Functionality-0284c7)</sub>
   
   <details>
     <summary>Tell me more</summary>
   
   ###### What is the issue?
   The airports data is being loaded from the wrong file. It's using 
'flight_data.csv.gz' instead of 'airports.csv.gz'.
   
   ###### Why this matters
   This will cause incorrect data to be loaded for airports, leading to wrong 
lat/long information when joining with the flights data. The subsequent join 
operations will produce invalid results.
   
   ###### Suggested change ∙ *Feature Preview*
   Use the correct file name for loading airports data:
   ```python
   airports = read_example_csv(
       "airports.csv.gz", encoding="latin-1", compression="gzip"
   )
   ```
   
   
   ###### Provide feedback to improve future suggestions
   [![Nice 
Catch](https://img.shields.io/badge/👍%20Nice%20Catch-71BC78)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/ab7f23f5-1c2d-4d29-beed-78df10891c0b/upvote)
 
[![Incorrect](https://img.shields.io/badge/👎%20Incorrect-white)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/ab7f23f5-1c2d-4d29-beed-78df10891c0b?what_not_true=true)
  [![Not in 
Scope](https://img.shields.io/badge/👎%20Out%20of%20PR%20scope-white)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/ab7f23f5-1c2d-4d29-beed-78df10891c0b?what_out_of_scope=true)
 [![Not in coding 
standard](https://img.shields.io/badge/👎%20Not%20in%20our%20standards-white)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/ab7f23f5-1c2d-4d29-beed-78df10891c0b?what_not_in_standard=true)
 
[![Other](https://img.shields.io/badge/👎%20Other-white)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/ab7f23f5-1c2d-4d29-beed-78df10891c0b)
   </details>
   
   <sub>
   
   💬 Looking for more details? Reply to this comment to chat with Korbit.
   </sub>
   
   <!--- korbi internal id:c4aa6c08-587c-4b87-8186-49aa8f45644e -->
   
   
   [](c4aa6c08-587c-4b87-8186-49aa8f45644e)



##########
superset/examples/helpers.py:
##########
@@ -119,3 +123,41 @@ def get_example_url(filepath: str) -> str:
     paths like ``datasets/examples/slack/messages.csv``.
     """
     return f"{BASE_URL}{filepath}"
+
+
+def read_example_csv(
+    filepath: str,
+    max_attempts: int = 5,
+    wait_seconds: float = 60,

Review Comment:
   ### Excessive Base Wait Time in Backoff Strategy <sub>![category 
Performance](https://img.shields.io/badge/Performance-4f46e5)</sub>
   
   <details>
     <summary>Tell me more</summary>
   
   ###### What is the issue?
   The base wait time of 60 seconds in the exponential backoff is unnecessarily 
long for HTTP 429 rate limit retries.
   
   ###### Why this matters
   With a 60-second base wait time and exponential backoff (2^n), subsequent 
retries will wait for 60s, 120s, 240s, 480s - leading to extremely long total 
wait times that could significantly impact performance.
   
   ###### Suggested change ∙ *Feature Preview*
   Reduce the base wait time to a more reasonable value like 1-5 seconds. This 
still provides adequate spacing between retries while maintaining 
responsiveness:
   ```python
   wait_seconds: float = 2.0,
   ```
   
   
   ###### Provide feedback to improve future suggestions
   [![Nice 
Catch](https://img.shields.io/badge/👍%20Nice%20Catch-71BC78)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/500ea470-02b5-4a35-bf32-1c9520c44f05/upvote)
 
[![Incorrect](https://img.shields.io/badge/👎%20Incorrect-white)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/500ea470-02b5-4a35-bf32-1c9520c44f05?what_not_true=true)
  [![Not in 
Scope](https://img.shields.io/badge/👎%20Out%20of%20PR%20scope-white)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/500ea470-02b5-4a35-bf32-1c9520c44f05?what_out_of_scope=true)
 [![Not in coding 
standard](https://img.shields.io/badge/👎%20Not%20in%20our%20standards-white)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/500ea470-02b5-4a35-bf32-1c9520c44f05?what_not_in_standard=true)
 
[![Other](https://img.shields.io/badge/👎%20Other-white)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/500ea470-02b5-4a35-bf32-1c9520c44f05)
   </details>
   
   <sub>
   
   💬 Looking for more details? Reply to this comment to chat with Korbit.
   </sub>
   
   <!--- korbi internal id:c8467b9e-04d5-40da-9e56-2a5dcc9530f8 -->
   
   
   [](c8467b9e-04d5-40da-9e56-2a5dcc9530f8)



##########
superset/examples/helpers.py:
##########
@@ -119,3 +123,41 @@
     paths like ``datasets/examples/slack/messages.csv``.
     """
     return f"{BASE_URL}{filepath}"
+
+
+def read_example_csv(
+    filepath: str,
+    max_attempts: int = 5,
+    wait_seconds: float = 60,
+    **kwargs: Any,
+) -> pd.DataFrame:
+    """
+    Load a CSV from the Superset example data mirror with retry/backoff on 
HTTP 429.
+
+    Args:
+        filepath: Relative path within the examples repo (e.g. 
"datasets/foo.csv").
+        max_attempts: Max number of retry attempts.
+        wait_seconds: Base wait time in seconds, will be multiplied
+          exponentially per attempt.
+        **kwargs: Passed directly to pandas.read_csv().
+
+    Returns:
+        pd.DataFrame
+    """
+
+    url = get_example_url(filepath)
+
+    for attempt in range(1, max_attempts + 1):
+        try:
+            return pd.read_csv(url, **kwargs)
+        except HTTPError as e:
+            if e.code == 429 and attempt < max_attempts:
+                sleep_time = wait_seconds * (2 ** (attempt - 1))
+                print(
+                    f"HTTP 429 received from {url}. ",
+                    f"Retrying in {sleep_time:.1f}s ",
+                    "(attempt {attempt}/{max_attempts})...",
+                )

Review Comment:
   ### Fragmented string formatting in retry message <sub>![category 
Readability](https://img.shields.io/badge/Readability-0284c7)</sub>
   
   <details>
     <summary>Tell me more</summary>
   
   ###### What is the issue?
   Using concatenated print statements makes the retry message harder to read 
and maintain. The f-strings are split unnecessarily across multiple lines.
   
   ###### Why this matters
   Split string formatting makes the message structure harder to follow and 
introduces potential formatting inconsistencies. Single f-string would be 
clearer.
   
   ###### Suggested change ∙ *Feature Preview*
   ```python
   print(f"HTTP 429 received from {url}. Retrying in {sleep_time:.1f}s (attempt 
{attempt}/{max_attempts})...")
   ```
   
   
   ###### Provide feedback to improve future suggestions
   [![Nice 
Catch](https://img.shields.io/badge/👍%20Nice%20Catch-71BC78)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/4c206eb8-4cae-4772-893c-0bcbf25d43b4/upvote)
 
[![Incorrect](https://img.shields.io/badge/👎%20Incorrect-white)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/4c206eb8-4cae-4772-893c-0bcbf25d43b4?what_not_true=true)
  [![Not in 
Scope](https://img.shields.io/badge/👎%20Out%20of%20PR%20scope-white)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/4c206eb8-4cae-4772-893c-0bcbf25d43b4?what_out_of_scope=true)
 [![Not in coding 
standard](https://img.shields.io/badge/👎%20Not%20in%20our%20standards-white)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/4c206eb8-4cae-4772-893c-0bcbf25d43b4?what_not_in_standard=true)
 
[![Other](https://img.shields.io/badge/👎%20Other-white)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/4c206eb8-4cae-4772-893c-0bcbf25d43b4)
   </details>
   
   <sub>
   
   💬 Looking for more details? Reply to this comment to chat with Korbit.
   </sub>
   
   <!--- korbi internal id:aa4ab7b4-1121-408b-802c-bca06e289402 -->
   
   
   [](aa4ab7b4-1121-408b-802c-bca06e289402)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] fix(ci): loading examples in CI returns http error 429 [superset]

Reply via email to