[GitHub] [incubator-superset] pphszx commented on a change in pull request #10450: fix: excel sheet upload is not working

GitBox Wed, 29 Jul 2020 02:02:24 -0700


pphszx commented on a change in pull request #10450:
URL: 
https://github.com/apache/incubator-superset/pull/10450#discussion_r461608390




##########
File path: superset/views/database/forms.py
##########
@@ -303,8 +300,11 @@ def at_least_one_schema_is_allowed(database: Database) -> 
bool:
         ],
     )
 
-    sheet_name = StringField(
-        _("Sheet Name"), description="Sheet Name", validators=[Optional()]
+    sheet_name = IntegerField(
+        _("Sheet Name"),
+        description="Integers used in zero-indexed sheet positions (0 is first 
sheet).",
+        validators=[Optional(), NumberRange(min=0)],
+        widget=BS3TextFieldWidget(),
     )

Review comment:
       the default value for sheet_name is 0, see 
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html,
 and if it is set to string with default value None, all sheets will be read, 
and the result will be a dict of dataframes, which is not desired and cause 
exception, so I use integer instead.

##########
File path: superset/db_engine_specs/base.py
##########
@@ -438,10 +438,7 @@ def excel_to_df(**kwargs: Any) -> pd.DataFrame:
         """
         kwargs["encoding"] = "utf-8"
         kwargs["iterator"] = True
-        chunks = pd.io.excel.read_excel(
-            io=kwargs["filepath_or_buffer"], sheet_name=kwargs["sheet_name"]
-        )
-        df = pd.concat(chunk for chunk in chunks.values())

Review comment:
       read_excel does not support chunksize arg, so i remove it.

##########
File path: superset/db_engine_specs/base.py
##########
@@ -513,7 +510,7 @@ def create_table_from_excel(  # pylint: 
disable=too-many-arguments
         Create table from contents of a excel. Note: this method does not 
create
         metadata for the table.
         """
-        df = cls.excel_to_df(filepath_or_buffer=filename, 
**excel_to_df_kwargs,)
+        df = cls.excel_to_df(io=filename, **excel_to_df_kwargs,)
         engine = cls.get_engine(database)

Review comment:
       the first arg for read_excel is io, not filepath_or_buffer, which is for 
read_csv

##########
File path: superset/views/database/forms.py
##########
@@ -228,16 +228,16 @@ def at_least_one_schema_is_allowed(database: Database) -> 
bool:
 
 class ExcelToDatabaseForm(DynamicForm):
     # pylint: disable=E0211
-    def excel_allowed_dbs():  # type: ignore
-        excel_allowed_dbs = []
+    def excel_allowed_dbs() -> List[Database]:  # type: ignore
         # TODO: change allow_csv_upload to allow_file_upload
         excel_enabled_dbs = (
             db.session.query(Database).filter_by(allow_csv_upload=True).all()
         )
-        for excel_enabled_db in excel_enabled_dbs:
-            if 
ExcelToDatabaseForm.at_least_one_schema_is_allowed(excel_enabled_db):
-                excel_allowed_dbs.append(excel_enabled_db)
-        return excel_allowed_dbs

Review comment:
       change to stick with class CsvToDatabaseForm(DynamicForm)

##########
File path: superset/views/database/forms.py
##########
@@ -265,10 +265,7 @@ def at_least_one_schema_is_allowed(database: Database) -> 
bool:
                 b) if database supports schema
                     user is able to upload to schema in 
schemas_allowed_for_csv_upload
         """
-        if (
-            security_manager.database_access(database)
-            or security_manager.all_datasource_access()
-        ):
+        if security_manager.can_access_database(database):
             return True

Review comment:
       security_manager.can_access_database has already checked 
all_datasource_access

##########
File path: superset/views/database/forms.py
##########
@@ -356,9 +356,6 @@ def at_least_one_schema_is_allowed(database: Database) -> 
bool:
         _("Mangle Duplicate Columns"),
         description=_('Specify duplicate columns as "X.0, X.1".'),
     )
-    skipinitialspace = BooleanField(
-        _("Skip Initial Space"), description=_("Skip spaces after delimiter.")
-    )
     skiprows = IntegerField(

Review comment:
       skipinitialspace is not supported by read_excel, it's for read_csv

##########
File path: superset/views/database/forms.py
##########
@@ -371,6 +368,13 @@ def at_least_one_schema_is_allowed(database: Database) -> 
bool:
         validators=[Optional(), NumberRange(min=0)],
         widget=BS3TextFieldWidget(),
     )
+    parse_dates = CommaSeparatedListField(
+        _("Parse Dates"),
+        description=_(
+            "A comma separated list of columns that should be " "parsed as 
dates."
+        ),
+        filters=[filter_not_empty_values],
+    )
     decimal = StringField(

Review comment:
       parse_dates is supported by both read_excel and read_csv, so i keep it.

##########
File path: superset/views/database/views.py
##########
@@ -263,10 +263,9 @@ class ExcelToDatabaseView(SimpleFormView):
     def form_get(self, form: ExcelToDatabaseForm) -> None:
         form.header.data = 0
         form.mangle_dupe_cols.data = True
-        form.skipinitialspace.data = False
         form.decimal.data = "."
         form.if_exists.data = "fail"
-        form.sheet_name = None
+        form.sheet_name.data = 0
 

Review comment:
       the default value for sheet_name is 0

##########
File path: superset/views/database/views.py
##########
@@ -336,7 +342,7 @@ def form_post(self, form: ExcelToDatabaseForm) -> Response:
             # E.g. if hive was used to upload a excel, presto will be a better 
option
             # to explore the table.
             expore_database = database
-            explore_database_id = 
database.get_extra().get("explore_database_id", None)
+            explore_database_id = database.explore_database_id
             if explore_database_id:

Review comment:
       stick with codes of class CsvToDatabaseView(SimpleFormView)

##########
File path: superset/views/database/views.py
##########
@@ -307,16 +306,29 @@ def form_post(self, form: ExcelToDatabaseForm) -> 
Response:
             database = (
                 
db.session.query(models.Database).filter_by(id=con.data.get("id")).one()
             )
+
+            # some params are not supported by pandas.read_excel (e.g. 
chunksize).
+            # More can be found here:
+            # 
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html
             excel_to_df_kwargs = {
                 "header": form.header.data if form.header.data else 0,
                 "index_col": form.index_col.data,
                 "mangle_dupe_cols": form.mangle_dupe_cols.data,
-                "skipinitialspace": form.skipinitialspace.data,
                 "skiprows": form.skiprows.data,
                 "nrows": form.nrows.data,
-                "sheet_name": form.sheet_name.data,
-                "chunksize": 1000,
+                "sheet_name": (
+                    int(form.sheet_name.data)
+                    if form.sheet_name.data.isdigit()
+                    else form.sheet_name.data
+                )
+                if form.sheet_name.data
+                else 0,

Review comment:
       @villebro, I've taken your advice and change the code, thanks.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [incubator-superset] pphszx commented on a change in pull request #10450: fix: excel sheet upload is not working

Reply via email to