korbit-ai[bot] commented on code in PR #32670:
URL: https://github.com/apache/superset/pull/32670#discussion_r1996420476
##########
superset/commands/importers/v1/assets.py:
##########
@@ -79,12 +82,29 @@
kwargs.get("ssh_tunnel_priv_key_passwords") or {}
)
self._configs: dict[str, Any] = {}
+ self.sparse = kwargs.get("sparse", False)
# pylint: disable=too-many-locals
@staticmethod
- def _import(configs: dict[str, Any]) -> None: # noqa: C901
+ def _import(configs: dict[str, Any], sparse: bool = False) -> None: # noqa: C901
# import databases first
database_ids: dict[str, int] = {}
+ dataset_info: dict[str, dict[str, Any]] = {}
+ chart_ids: dict[str, int] = {}
+
+ if sparse:
+ existing_charts = db.session.query(Slice).all()
+ existing_datasets = db.session.query(SqlaTable).all()
+ existing_databases = db.session.query(Database).all()
+ chart_ids = {str(x.uuid): x.id for x in existing_charts}
+ database_ids = {str(x.uuid): x.id for x in existing_databases}
Review Comment:
### Inefficient bulk loading in sparse mode <sub></sub>
<details>
<summary>Tell me more</summary>
###### What is the issue?
The sparse import path loads every existing chart, dataset, and database with
unbounded `query(...).all()` calls, which can cause performance problems on
installations with many assets.
###### Why this matters
On large installations, loading all records into memory at once can cause
excessive memory usage and slow down the import process.
###### Suggested change ∙ *Feature Preview*
Use batched queries to load records in chunks and implement a lookup
mechanism. Example:
```python
def get_existing_items_batched(model_class, batch_size=1000):
    """Yield rows of model_class in fixed-size batches to bound memory use."""
    offset = 0
    while True:
        batch = (
            db.session.query(model_class)
            .limit(batch_size)
            .offset(offset)
            .all()
        )
        if not batch:
            return
        yield from batch
        offset += batch_size

if sparse:
    chart_ids = {str(x.uuid): x.id for x in get_existing_items_batched(Slice)}
    database_ids = {
        str(x.uuid): x.id for x in get_existing_items_batched(Database)
    }
    dataset_info = {
        str(dataset.uuid): {
            "datasource_id": dataset.id,
            "datasource_type": dataset.datasource_type,
            "datasource_name": dataset.datasource_name,
        }
        for dataset in get_existing_items_batched(SqlaTable)
    }
```
</details>
<sub>
[](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/7aafef91-04ea-4a94-b999-8dfbc0f336f0?suggestedFixEnabled=true)
💬 Looking for more details? Reply to this comment to chat with Korbit.
</sub>
<!--- korbi internal id:c791c099-da99-42f5-bc8a-8af1e6298894 -->
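A further option worth weighing against batched full-table scans (a sketch, not Superset's actual API): query only the UUIDs that actually appear in the import payload. Everything below is hypothetical naming — `lookup_existing_by_uuid`, `query_ids_for_uuids`, and `fake_query` are illustrative stand-ins, where `query_ids_for_uuids` would wrap a filtered SQLAlchemy query such as `db.session.query(Model.uuid, Model.id).filter(Model.uuid.in_(chunk)).all()`:

```python
def lookup_existing_by_uuid(query_ids_for_uuids, config_uuids, chunk_size=500):
    """Build a uuid -> id map for only the UUIDs referenced by the import.

    `query_ids_for_uuids(chunk)` must return (uuid, id) pairs for the given
    chunk of UUID strings; chunking keeps each IN(...) clause bounded.
    """
    mapping = {}
    uuids = list(config_uuids)
    for start in range(0, len(uuids), chunk_size):
        chunk = uuids[start:start + chunk_size]
        mapping.update({str(u): i for u, i in query_ids_for_uuids(chunk)})
    return mapping


# Demo with an in-memory stand-in for the database table:
table = {"uuid-1": 10, "uuid-2": 20, "uuid-3": 30}
fake_query = lambda chunk: [(u, table[u]) for u in chunk if u in table]
print(lookup_existing_by_uuid(fake_query, ["uuid-1", "uuid-3"], chunk_size=1))
```

This touches only the rows the import references, so memory scales with the payload rather than with the installation size.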
##########
superset/importexport/api.py:
##########
@@ -177,6 +177,7 @@ def import_(self) -> Response:
if not contents:
raise NoValidFilesFoundError()
+ sparse = request.form.get("sparse") == "true"
Review Comment:
### Brittle Boolean Parsing <sub></sub>
<details>
<summary>Tell me more</summary>
###### What is the issue?
The code determines the sparse parameter with an exact string comparison
against "true", which is error-prone and misses other valid boolean
representations.
###### Why this matters
This strict string comparison could cause the sparse functionality to fail
if clients send variations like 'True', '1', or other valid boolean
representations, leading to unexpected import behavior.
###### Suggested change ∙ *Feature Preview*
Use a more robust boolean parsing method that handles various truthy values:
```python
def parse_boolean(value):
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        value = value.lower()
        if value in ('true', 't', '1', 'yes', 'y'):
            return True
        if value in ('false', 'f', '0', 'no', 'n'):
            return False
    return False

sparse = parse_boolean(request.form.get("sparse"))
```
</details>
<sub>
[](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/cdb3f938-6040-476a-8a7f-dc7990c777b5?suggestedFixEnabled=true)
💬 Looking for more details? Reply to this comment to chat with Korbit.
</sub>
<!--- korbi internal id:bb304b23-8333-4ae1-b969-cabfa760fb1a -->
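If a helper along these lines is adopted, its edge cases are easy to pin down with a few assertions. This is a self-contained sketch: the `parse_boolean` below simply restates the suggestion above so the example runs standalone.

```python
def parse_boolean(value):
    # Same helper as suggested above, reproduced so this example runs standalone.
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        value = value.lower()
        if value in ("true", "t", "1", "yes", "y"):
            return True
    # Anything else (None, "false", unknown strings) is treated as False.
    return False


# Variations that the strict `== "true"` comparison would reject:
assert parse_boolean("True") is True
assert parse_boolean("1") is True
assert parse_boolean("YES") is True
# Absent or falsy form values stay False:
assert parse_boolean(None) is False
assert parse_boolean("false") is False
```

Treating unrecognized input as False keeps the endpoint's default behavior (a full import) when clients omit or garble the flag.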
##########
superset/commands/importers/v1/assets.py:
##########
@@ -79,12 +82,29 @@
kwargs.get("ssh_tunnel_priv_key_passwords") or {}
)
self._configs: dict[str, Any] = {}
+ self.sparse = kwargs.get("sparse", False)
# pylint: disable=too-many-locals
@staticmethod
- def _import(configs: dict[str, Any]) -> None: # noqa: C901
+ def _import(configs: dict[str, Any], sparse: bool = False) -> None: # noqa: C901
Review Comment:
### Missing parameter documentation in _import method <sub></sub>
<details>
<summary>Tell me more</summary>
###### What is the issue?
The `sparse` parameter is not documented despite being a significant
behavior flag.
###### Why this matters
Without understanding what 'sparse' does, developers may misuse this
parameter, leading to unexpected import behavior.
###### Suggested change ∙ *Feature Preview*
```python
@staticmethod
def _import(configs: dict[str, Any], sparse: bool = False) -> None:  # noqa: C901
    """Import all the assets from the configs.

    Args:
        configs: The configuration dictionary for assets
        sparse: When True, uses existing assets (charts, datasets,
            databases) instead of recreating them
    """
```
</details>
<sub>
[](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/70f2d925-c73c-4bc2-9760-1f797fd998c0?suggestedFixEnabled=true)
💬 Looking for more details? Reply to this comment to chat with Korbit.
</sub>
<!--- korbi internal id:bf9da407-6b5e-416f-ad3c-df9f6eddfd12 -->
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
For additional commands, e-mail: [email protected]