artemonsh commented on PR #22118:
URL: https://github.com/apache/superset/pull/22118#issuecomment-1320268015

   @EugeneTorap @villebro 
   Could you please take a look at this issue: 
https://github.com/apache/superset/issues/21657?
   
   There is an annoying problem with languages written in Cyrillic, such as 
Russian. In short, dashboards, charts and databases whose names contain only 
Cyrillic letters **cannot be imported into Superset**! The function Eugene 
provided converts, for instance, the chart title "Мой график" ("My chart") into 
an empty string in Python, which produces a filename like `_123.yaml`. Because 
the first character of the filename is an underscore, the `is_valid_config` 
function returns `False` for it, so the file is not imported, even though the 
notification in the bottom right corner says that everything was imported 
successfully :/
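
To make the failure concrete, here is a minimal sketch of the ASCII-only normalisation path. The helper name `ascii_only_slug`, the id `123` and the `f"{slug}_{id}.yaml"` naming pattern are assumptions made for illustration (the pattern is inferred from the `_123.yaml` example above), not code taken from Superset or werkzeug:

```python
import re
import unicodedata

# ASCII-only character class, as in werkzeug's secure_filename.
ASCII_STRIP_RE = re.compile(r"[^A-Za-z0-9_.-]")


def ascii_only_slug(title: str) -> str:
    """Hypothetical helper: the ASCII-only normalisation steps, simplified."""
    normalized = unicodedata.normalize("NFKD", title)
    # Cyrillic letters have no ASCII equivalent, so "ignore" drops them.
    ascii_text = normalized.encode("ascii", "ignore").decode("ascii")
    return ASCII_STRIP_RE.sub("", "_".join(ascii_text.split())).strip("._")


chart_id = 123  # illustrative id only
slug = ascii_only_slug("Мой график")
export_name = f"{slug}_{chart_id}.yaml"  # assumed naming pattern

print(repr(slug))   # ''
print(export_name)  # _123.yaml
```

A title made up only of Cyrillic letters therefore collapses to an empty slug, and the resulting filename starts with an underscore.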
   
   Therefore, charts/dashboards/databases with these types of titles **cannot 
be imported** into Superset. My suggestion is to extend werkzeug's 
`secure_filename` 
[function](https://tedboy.github.io/flask/_modules/werkzeug/utils.html#secure_filename)
 as follows:
   
   ```python
   import os
   import re
   import unicodedata
   
   
   def secure_filename(filename: str) -> str:
       r"""Pass it a filename and it will return a secure version of it.  This
       filename can then safely be stored on a regular file system and passed
       to :func:`os.path.join`.
   
       On windows systems the function also makes sure that the file is not
       named after one of the special device files.
   
       The function also takes filenames containing cyrillic letters.
   
       >>> secure_filename("My cool movie.mov")
       'My_cool_movie.mov'
       >>> secure_filename("../../../etc/passwd")
       'etc_passwd'
       >>> secure_filename('i contain cool \xfcml\xe4uts.txt')
       'i_contain_cool_umlauts.txt'
       >>> secure_filename('Мой красивый график.yaml')
       'Мои_красивыи_график.yaml'
   
       The function might return an empty filename.  It's your responsibility
       to ensure that the filename is unique and that you abort or
       generate a random filename if the function returned an empty one.
   
       .. versionadded:: 0.5
   
       :param filename: the filename to secure
       """
       # If the text contains cyrillic letters, ASCII encoding should not
       # be used as it does not contain cyrillic letters
       contains_cyrillic_letters = bool(re.search("[\u0400-\u04FF]", filename))
   
       _windows_device_files = (
           "CON",
           "AUX",
           "COM1",
           "COM2",
           "COM3",
           "COM4",
           "LPT1",
           "LPT2",
           "LPT3",
           "PRN",
           "NUL",
       )
   
       _filename_ascii_strip_re = re.compile(r"[^A-Za-z0-9_.-]")
       _filename_strip_re = (
           re.compile(r"[^A-Za-zа-яА-ЯёЁ0-9_.-]")
           if contains_cyrillic_letters
           else _filename_ascii_strip_re
       )
   
       filename = unicodedata.normalize("NFKD", filename)
       if not contains_cyrillic_letters:
           filename = filename.encode("ascii", "ignore").decode("ascii")
   
       for sep in os.path.sep, os.path.altsep:
           if sep:
               filename = filename.replace(sep, " ")
       filename = str(
           _filename_strip_re.sub("", "_".join(filename.split()))
       ).strip("._")
   
       # on nt a couple of special files are present in each folder.  We
       # have to ensure that the target file is not such a filename.  In
       # this case we prepend an underline
       if (
           os.name == "nt"
           and filename
           and filename.split(".")[0].upper() in _windows_device_files
       ):
           filename = f"_{filename}"
   
       return filename
      ```
      
      Before:
      ```python
   >>> print(secure_filename("Мой красивый график"))
   
   >>> print(secure_filename("My beautiful график"))
   My_beautiful
   >>> print(secure_filename("My beautiful chart"))
   My_beautiful_chart
      ```
      After:
      ```python
   >>> print(secure_filename("Мой красивый график"))
   Мои_красивыи_график
   >>> print(secure_filename("My beautiful график"))
   My_beautiful_график
   >>> print(secure_filename("My beautiful chart"))
   My_beautiful_chart
      ```
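
As the docstring notes, the function might still return an empty filename (for example, for a title made up only of characters outside the allowed set), and handling that is left to the caller. A minimal sketch of such a caller-side fallback follows; `safe_export_name`, `demo_sanitize` and the `untitled` prefix are hypothetical names chosen for illustration and are not part of Superset or werkzeug:

```python
from typing import Callable


def safe_export_name(
    title: str, object_id: int, sanitize: Callable[[str], str]
) -> str:
    """Hypothetical wrapper: never emit a name that starts with "_"."""
    slug = sanitize(title)
    if not slug:
        # Fall back to a fixed prefix rather than an empty slug.
        slug = "untitled"
    return f"{slug}_{object_id}.yaml"


def demo_sanitize(title: str) -> str:
    """Stand-in sanitizer for demonstration: keeps ASCII letters and digits."""
    return "".join(ch for ch in title if ch.isascii() and ch.isalnum())


print(safe_export_name("图表", 7, demo_sanitize))   # untitled_7.yaml
print(safe_export_name("Sales", 7, demo_sanitize))  # Sales_7.yaml
```

In practice the `sanitize` argument would be the `secure_filename` function proposed above; the stand-in is used here only to keep the example self-contained.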


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

