artemonsh commented on PR #22118: URL: https://github.com/apache/superset/pull/22118#issuecomment-1320268015
@EugeneTorap @villebro Could you please take a look at this issue: https://github.com/apache/superset/issues/21657?

There is a quite annoying problem with Cyrillic languages like Russian. In short, dashboards, charts and databases whose names contain only Cyrillic letters **cannot be imported into Superset**!

The function Eugene provided, for instance, converts the chart title "Мой график" ("My chart") into an empty string in Python, which leads to a filename like `_123.yaml`. Because the first character of the filename is an underscore, the `is_valid_config` function returns `False` for it, which prevents the file from being imported, despite the notification in the bottom right corner saying that everything was imported successfully :/

Therefore, charts/dashboards/databases with these types of titles **cannot be imported** into Superset.

My suggestion is to extend werkzeug's `secure_filename` [function](https://tedboy.github.io/flask/_modules/werkzeug/utils.html#secure_filename) as follows:

```python
import os
import re
import unicodedata


def secure_filename(filename: str) -> str:
    r"""Pass it a filename and it will return a secure version of it. This
    filename can then safely be stored on a regular file system and passed
    to :func:`os.path.join`.

    On windows systems the function also makes sure that the file is not
    named after one of the special device files.

    The function also accepts filenames containing cyrillic letters.

    >>> secure_filename("My cool movie.mov")
    'My_cool_movie.mov'
    >>> secure_filename("../../../etc/passwd")
    'etc_passwd'
    >>> secure_filename('i contain cool \xfcml\xe4uts.txt')
    'i_contain_cool_umlauts.txt'
    >>> secure_filename('Мой красивый график.yaml')
    'Мои_красивыи_график.yaml'

    The function might return an empty filename. It's your responsibility
    to ensure that the filename is unique and that you abort or
    generate a random filename if the function returned an empty one.

    .. versionadded:: 0.5

    :param filename: the filename to secure
    """
    # If the text contains cyrillic letters, ASCII encoding should not
    # be used as it does not contain cyrillic letters
    contains_cyrillic_letters = bool(re.search("[\u0400-\u04FF]", filename))

    _windows_device_files = (
        "CON", "AUX", "COM1", "COM2", "COM3", "COM4",
        "LPT1", "LPT2", "LPT3", "PRN", "NUL",
    )
    _filename_ascii_strip_re = re.compile(r"[^A-Za-z0-9_.-]")
    _filename_strip_re = (
        re.compile(r"[^A-Za-zа-яА-ЯёЁ0-9_.-]")
        if contains_cyrillic_letters
        else _filename_ascii_strip_re
    )

    # NFKD splits accented characters into a base letter plus combining marks
    filename = unicodedata.normalize("NFKD", filename)
    if not contains_cyrillic_letters:
        filename = filename.encode("ascii", "ignore").decode("ascii")

    # Path separators become spaces so that they end up as underscores
    for sep in os.path.sep, os.path.altsep:
        if sep:
            filename = filename.replace(sep, " ")
    filename = str(_filename_strip_re.sub("", "_".join(filename.split()))).strip("._")

    # on nt a couple of special files are present in each folder. We
    # have to ensure that the target file is not such a filename. In
    # this case we prepend an underline
    if (
        os.name == "nt"
        and filename
        and filename.split(".")[0].upper() in _windows_device_files
    ):
        filename = f"_{filename}"

    return filename
```

Before (the first call prints an empty string):

```python
>>> print(secure_filename("Мой красивый график"))

>>> print(secure_filename("My beautiful график"))
My_beautiful
>>> print(secure_filename("My beautiful chart"))
My_beautiful_chart
```

After:

```python
>>> print(secure_filename("Мой красивый график"))
Мои_красивыи_график
>>> print(secure_filename("My beautiful график"))
My_beautiful_график
>>> print(secure_filename("My beautiful chart"))
My_beautiful_chart
```
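One detail of the "After" output worth spelling out: `й` comes out as `и`. A minimal sketch of why, using only the standard library. NFKD normalization decomposes `й` into `и` plus a combining breve (U+0306), and the strip pattern then drops the combining mark. The helper name `decompose_and_strip` is hypothetical and only illustrates that one step; it is not part of the proposed function.

```python
import re
import unicodedata

# Same character class as the Cyrillic-aware strip pattern in the proposal.
strip_re = re.compile(r"[^A-Za-zа-яА-ЯёЁ0-9_.-]")


def decompose_and_strip(text: str) -> str:
    """Hypothetical helper: NFKD-normalize, then drop disallowed characters."""
    return strip_re.sub("", unicodedata.normalize("NFKD", text))


# "й" (U+0439) decomposes into "и" (U+0438) + combining breve (U+0306).
decomposed = unicodedata.normalize("NFKD", "й")
print([f"U+{ord(ch):04X}" for ch in decomposed])  # ['U+0438', 'U+0306']

# The combining mark is outside the allowed class, so only the base letter stays.
print(decompose_and_strip("Мой"))   # Мои
print(decompose_and_strip("ёлка"))  # елка
```

If keeping `й` and `ё` intact matters, one option (an assumption, not tested against Superset) would be to re-compose with `unicodedata.normalize("NFC", ...)` before applying the strip pattern.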
