korbit-ai[bot] commented on code in PR #33412:
URL: https://github.com/apache/superset/pull/33412#discussion_r2085694848
##########
superset/examples/flights.py:
##########
@@ -37,12 +37,14 @@ def load_flights(only_metadata: bool = False, force: bool =
False) -> None:
table_exists = database.has_table(Table(tbl_name, schema))
if not only_metadata and (not table_exists or force):
- flight_data_url = get_example_url("flight_data.csv.gz")
- pdf = pd.read_csv(flight_data_url, encoding="latin-1",
compression="gzip")
+ pdf = read_example_csv(
+ "flight_data.csv.gz", encoding="latin-1", compression="gzip"
+ )
# Loading airports info to join and get lat/long
- airports_url = get_example_url("airports.csv.gz")
- airports = pd.read_csv(airports_url, encoding="latin-1",
compression="gzip")
+ airports = read_example_csv(
+ "flight_data.csv.gz", encoding="latin-1", compression="gzip"
+ )
Review Comment:
### Incorrect Data Source for Airports <sub></sub>
<details>
<summary>Tell me more</summary>
###### What is the issue?
The airports data is being loaded from the wrong file. It's using
'flight_data.csv.gz' instead of 'airports.csv.gz'.
###### Why this matters
This will cause incorrect data to be loaded for airports, leading to wrong
lat/long information when joining with the flights data. The subsequent join
operations will produce invalid results.
###### Suggested change ∙ *Feature Preview*
Use the correct file name for loading airports data:
```python
airports = read_example_csv(
"airports.csv.gz", encoding="latin-1", compression="gzip"
)
```
###### Provide feedback to improve future suggestions
[](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/ab7f23f5-1c2d-4d29-beed-78df10891c0b/upvote)
[](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/ab7f23f5-1c2d-4d29-beed-78df10891c0b?what_not_true=true)
[](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/ab7f23f5-1c2d-4d29-beed-78df10891c0b?what_out_of_scope=true)
[](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/ab7f23f5-1c2d-4d29-beed-78df10891c0b?what_not_in_standard=true)
[](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/ab7f23f5-1c2d-4d29-beed-78df10891c0b)
</details>
<sub>
💬 Looking for more details? Reply to this comment to chat with Korbit.
</sub>
<!--- korbi internal id:c4aa6c08-587c-4b87-8186-49aa8f45644e -->
[](c4aa6c08-587c-4b87-8186-49aa8f45644e)
##########
superset/examples/helpers.py:
##########
@@ -119,3 +123,41 @@ def get_example_url(filepath: str) -> str:
paths like ``datasets/examples/slack/messages.csv``.
"""
return f"{BASE_URL}{filepath}"
+
+
+def read_example_csv(
+ filepath: str,
+ max_attempts: int = 5,
+ wait_seconds: float = 60,
Review Comment:
### Excessive Base Wait Time in Backoff Strategy <sub></sub>
<details>
<summary>Tell me more</summary>
###### What is the issue?
The base wait time of 60 seconds in the exponential backoff is unnecessarily
long for HTTP 429 rate limit retries.
###### Why this matters
With a 60-second base wait time and exponential backoff (2^n), subsequent
retries will wait for 60s, 120s, 240s, 480s - leading to extremely long total
wait times that could significantly impact performance.
###### Suggested change ∙ *Feature Preview*
Reduce the base wait time to a more reasonable value like 1-5 seconds. This
still provides adequate spacing between retries while maintaining
responsiveness:
```python
wait_seconds: float = 2.0,
```
###### Provide feedback to improve future suggestions
[](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/500ea470-02b5-4a35-bf32-1c9520c44f05/upvote)
[](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/500ea470-02b5-4a35-bf32-1c9520c44f05?what_not_true=true)
[](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/500ea470-02b5-4a35-bf32-1c9520c44f05?what_out_of_scope=true)
[](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/500ea470-02b5-4a35-bf32-1c9520c44f05?what_not_in_standard=true)
[](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/500ea470-02b5-4a35-bf32-1c9520c44f05)
</details>
<sub>
💬 Looking for more details? Reply to this comment to chat with Korbit.
</sub>
<!--- korbi internal id:c8467b9e-04d5-40da-9e56-2a5dcc9530f8 -->
[](c8467b9e-04d5-40da-9e56-2a5dcc9530f8)
##########
superset/examples/helpers.py:
##########
@@ -119,3 +123,41 @@
paths like ``datasets/examples/slack/messages.csv``.
"""
return f"{BASE_URL}{filepath}"
+
+
+def read_example_csv(
+ filepath: str,
+ max_attempts: int = 5,
+ wait_seconds: float = 60,
+ **kwargs: Any,
+) -> pd.DataFrame:
+ """
+ Load a CSV from the Superset example data mirror with retry/backoff on
HTTP 429.
+
+ Args:
+ filepath: Relative path within the examples repo (e.g.
"datasets/foo.csv").
+ max_attempts: Max number of retry attempts.
+ wait_seconds: Base wait time in seconds, will be multiplied
+ exponentially per attempt.
+ **kwargs: Passed directly to pandas.read_csv().
+
+ Returns:
+ pd.DataFrame
+ """
+
+ url = get_example_url(filepath)
+
+ for attempt in range(1, max_attempts + 1):
+ try:
+ return pd.read_csv(url, **kwargs)
+ except HTTPError as e:
+ if e.code == 429 and attempt < max_attempts:
+ sleep_time = wait_seconds * (2 ** (attempt - 1))
+ print(
+ f"HTTP 429 received from {url}. ",
+ f"Retrying in {sleep_time:.1f}s ",
+ "(attempt {attempt}/{max_attempts})...",
+ )
Review Comment:
### Fragmented string formatting in retry message <sub></sub>
<details>
<summary>Tell me more</summary>
###### What is the issue?
Using concatenated print statements makes the retry message harder to read
and maintain. The f-strings are split unnecessarily across multiple lines.
###### Why this matters
Split string formatting makes the message structure harder to follow and
introduces potential formatting inconsistencies. Single f-string would be
clearer.
###### Suggested change ∙ *Feature Preview*
```python
print(f"HTTP 429 received from {url}. Retrying in {sleep_time:.1f}s (attempt
{attempt}/{max_attempts})...")
```
###### Provide feedback to improve future suggestions
[](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/4c206eb8-4cae-4772-893c-0bcbf25d43b4/upvote)
[](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/4c206eb8-4cae-4772-893c-0bcbf25d43b4?what_not_true=true)
[](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/4c206eb8-4cae-4772-893c-0bcbf25d43b4?what_out_of_scope=true)
[](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/4c206eb8-4cae-4772-893c-0bcbf25d43b4?what_not_in_standard=true)
[](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/4c206eb8-4cae-4772-893c-0bcbf25d43b4)
</details>
<sub>
💬 Looking for more details? Reply to this comment to chat with Korbit.
</sub>
<!--- korbi internal id:aa4ab7b4-1121-408b-802c-bca06e289402 -->
[](aa4ab7b4-1121-408b-802c-bca06e289402)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]