betodealmeida commented on a change in pull request #15719:
URL: https://github.com/apache/superset/pull/15719#discussion_r670910268



##########
File path: requirements/development.in
##########
@@ -25,3 +25,4 @@ psycopg2-binary==2.8.5
 tableschema
 thrift>=0.11.0,<1.0.0
 pygithub>=1.54.1,<2.0.0
+progress>=1.5

Review comment:
       Let's keep this on the 1.x series, in case 2.0 breaks backwards 
compatibility (which is expected if they follow semver):
   
   ```suggestion
   progress>=1.5,<2
   ```

##########
File path: scripts/benchmark_migration.py
##########
@@ -171,25 +172,32 @@ def main(
 
     min_entities = 10
     new_models: Dict[Type[Model], List[Model]] = defaultdict(list)
+
     while min_entities <= limit:
         downgrade(revision=down_revision)
         print(f"Running with at least {min_entities} entities of each model")
         for model in models:
             missing = min_entities - model_rows[model]
+            bar = ChargingBar("Processing", max=missing)
+            entities: List[Model] = []
             if missing > 0:
                 print(f"- Adding {missing} entities to the {model.__name__} model")
                 try:
-                    added_models = add_sample_rows(session, model, missing)
+                    for entity in add_sample_rows(session, model, missing):
+                        entities.append(entity)
+                        bar.next()
                 except Exception:
                     session.rollback()
                     raise
                 model_rows[model] = min_entities
                 session.commit()
 
                 if auto_cleanup:
-                    new_models[model].extend(added_models)
+                    new_models[model].extend(entities)
 
+        session.add_all(entities)
         start = time.time()
+        bar.finish()

Review comment:
       There are a few subtle bugs in your implementation. The critical one 
is that `entities` is reset to an empty list for every model, but the entities 
are only added to the session after all models have been processed (line 198). 
This means only the entities of the last model in the `for model in models` 
loop get added.
   
   I'd also only show the progress bar if you have something to add.
   
   Something like this:
   
   ```python
   while min_entities <= limit:
       downgrade(revision=down_revision)
       print(f"Running with at least {min_entities} entities of each model")
       for model in models:
           missing = min_entities - model_rows[model]
           if missing > 0:
               entities: List[Model] = []
               bar = ChargingBar(
                   f"Adding {missing} entities to the {model.__name__} model",
                   max=missing,
               )
               try:
                   for entity in add_sample_rows(session, model, missing):
                       entities.append(entity)
                       bar.next()
               except Exception:
                   session.rollback()
                   raise
               bar.finish()
               model_rows[model] = min_entities
   
               # add all created entities
               session.add_all(entities)
               session.commit()
   
               if auto_cleanup:
                   new_models[model].extend(entities)
   
   ```
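
To make the scoping problem concrete, here is a minimal sketch that reproduces it with plain lists instead of a SQLAlchemy session. The names `collect_buggy`, `collect_fixed`, and `batches` are made up for illustration and are not part of the PR; `added` stands in for whatever `session.add_all` would receive.

```python
from typing import Dict, List


def collect_buggy(batches: Dict[str, List[int]]) -> List[int]:
    """Mimics the PR: the list is reset per model, but added only once."""
    added: List[int] = []  # stand-in for what session.add_all receives
    entities: List[int] = []
    for _name, rows in batches.items():
        entities = []  # reset for every model
        for row in rows:
            entities.append(row)
    added.extend(entities)  # runs after the loop: only the last model survives
    return added


def collect_fixed(batches: Dict[str, List[int]]) -> List[int]:
    """Mimics the suggestion: add each model's entities inside the loop."""
    added: List[int] = []
    for _name, rows in batches.items():
        entities: List[int] = []
        for row in rows:
            entities.append(row)
        added.extend(entities)  # runs once per model
    return added


# Hypothetical sample data: three "models" with a few rows each.
batches = {"Dashboard": [1, 2], "Slice": [3], "Table": [4, 5]}
print(collect_buggy(batches))
print(collect_fixed(batches))
```

The buggy variant silently drops the rows of every model except the last one, which is why moving `session.add_all(entities)` inside the `if missing > 0` block matters.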

##########
File path: superset/utils/mock_data.py
##########
@@ -229,10 +232,9 @@ def generate_column_data(column: ColumnInfo, num_rows: int) -> List[Any]:
     return [gen() for _ in range(num_rows)]
 
 
-def add_sample_rows(session: Session, model: Type[Model], count: int) -> List[Model]:
+def add_sample_rows(session: Session, model: Type[Model], count: int) -> Model:

Review comment:
       ```suggestion
   def add_sample_rows(session: Session, model: Type[Model], count: int) -> Iterator[Model]:
   ```
   
   You probably need to add `from typing import Iterator` as well.
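
For reference, a function that uses `yield` returns an iterator, not a single item, which is why `-> Model` is the wrong annotation. A minimal sketch of the pattern, assuming the function yields one entity at a time (`make_rows` and the integer rows are placeholders, not the real `add_sample_rows`):

```python
from typing import Iterator, List


def make_rows(count: int) -> Iterator[int]:
    """Yield one placeholder row at a time instead of returning a list."""
    for i in range(count):
        yield i


# The caller can update a progress indicator after each yielded item.
rows: List[int] = []
progress = 0
for row in make_rows(3):
    rows.append(row)
    progress += 1  # a real caller might call bar.next() here

print(rows, progress)
```

Yielding lets the caller tick the progress bar per entity, which a function returning the full list at the end could not support.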

##########
File path: superset/utils/mock_data.py
##########
@@ -229,10 +232,9 @@ def generate_column_data(column: ColumnInfo, num_rows: int) -> List[Any]:
     return [gen() for _ in range(num_rows)]
 
 
-def add_sample_rows(session: Session, model: Type[Model], count: int) -> List[Model]:
+def add_sample_rows(session: Session, model: Type[Model], count: int) -> Model:
     """
     Add entities of a given model.
-

Review comment:
       Keep this empty line.

##########
File path: superset/utils/mock_data.py
##########
@@ -69,6 +69,9 @@
 
 # pylint: disable=too-many-return-statements, too-many-branches
 def get_type_generator(sqltype: sqlalchemy.sql.sqltypes) -> Callable[[], Any]:
+    if isinstance(sqltype, sqlalchemy.dialects.mysql.types.TINYINT):

Review comment:
       These changes were merged already, right?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


