pomegranited opened a new issue, #32139:
URL: https://github.com/apache/superset/issues/32139

   ## [SIP] Proposal for translating Superset asset data using custom 
flask-babel extraction methods
   
   ### Motivation
   
   Superset provides translation support for built-in components in the UI. 
However, the Open edX project also needs the user-provided terms used in the 
assets themselves to be translatable, e.g. dashboard and chart titles, axis 
labels, and metric labels.
   
   We also need these asset translations to be easily maintained between 
upgrades of Superset, and to be re-deployable when translations are updated.
   
   ### Proposed Change
   
   Superset uses [flask-babel](https://python-babel.github.io/flask-babel/) to 
extract and compile translations marked in the backend and frontend files. We 
propose adding new [custom extraction 
methods](https://babel.pocoo.org/en/latest/messages.html#referencing-extraction-methods)
 to pull out asset field values. These methods would be disabled by default, 
and enabled using a new feature flag, `SUPERSET_TRANSLATE_ASSETS`.
   
   Superset provides the backend translations to the frontend via the [template 
bootstrap data's language_pack 
field](https://github.com/apache/superset/blob/5fe6ef268ec5b136f911ccfe05fb8bdd7fc7a79f/superset/views/base.py#L336).
 Once user-provided values are translated, they are available to any frontend 
component that wraps the string in `t()` from `@superset-ui/core`.
   
   To make version control and conflict management easier, we propose splitting 
the asset messages into separate files from the upstream-maintained application 
translations used by Superset UI. The asset translations will be concatenated 
with the application message files before being compiled. Thus, these asset 
message files can be maintained per instance on a Superset fork. (Note: the 
compiled `.mo` and `.json` message files are generated, so they are [not 
version-controlled](https://github.com/apache/superset/blob/master/.gitignore#L111-L115).)
   
   #### Process
   
   1. Configure the `SUPERSET_TRANSLATE_ASSETS` feature setting.
   1. Run 
[babel_update.sh](https://github.com/apache/superset/blob/master/scripts/translations/babel_update.sh)
 to extract application and asset translations to (versioned) `.po` files.
   1. Translators will manually update the `.po` files, or use a tool like 
[Transifex](https://www.transifex.com/open-source/) to provide translations in 
the desired languages.
   1. Run `babel_update.sh --compile` to concatenate and compile the 
([unversioned](https://github.com/apache/superset/blob/master/.gitignore#L111-L115)) 
message files consumed by the app.
   
   This process could be optionally run during the Docker image build (when 
[BUILD_TRANSLATIONS=true](https://github.com/apache/superset/blob/c7c3b1b0e99228f261415742469cd4a7f929da7b/Dockerfile#L131)),
 or via the command line on a deployed instance. Superset may need to be 
restarted to apply changes to translations on a running instance.
   
   #### Configuration: SUPERSET_TRANSLATE_ASSETS
   
   If the `SUPERSET_TRANSLATE_ASSETS` feature flag is not found in settings, or 
is falsy, the custom extraction methods will exit immediately, so that users 
who do not need this feature are not burdened with the overhead of extracting 
asset translations.
   
   This feature flag could be a simple on/off boolean, or a more complex 
structure to include/exclude specific assets, depending on community feedback.
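
   As a rough illustration of the two flag shapes mentioned above, the sketch 
below accepts either a boolean or a per-asset-type mapping. The helper name 
`asset_type_enabled` and the mapping keys are assumptions made for this 
example; the proposal does not define them.

```python
from collections.abc import Mapping


def asset_type_enabled(flag: object, asset_type: str) -> bool:
    """Return True if assets of this type should be extracted.

    `flag` is the (hypothetical) value of SUPERSET_TRANSLATE_ASSETS:
    either a plain truthy/falsy value or a mapping of asset type to bool.
    """
    if isinstance(flag, Mapping):
        # Per-type structure: asset types not listed are treated as disabled.
        return bool(flag.get(asset_type, False))
    # Simple on/off flag; a missing flag (None) is falsy.
    return bool(flag)
```

   With this shape, `True` enables every asset type, while 
`{"dashboards": True}` would enable dashboards only.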
   
   #### Backend: extract asset field values
   Create custom asset extraction methods under 
[superset.translations.utils](https://github.com/apache/superset/blob/master/superset/translations/utils.py)
 which:
   * iterate over the assets using the Superset data APIs
   * pull out each [translatable asset 
field](#appendix-translatable-asset-fields) value as the "message"
   * yield a tuple for each translatable field in the asset, containing:
      * *lineno*: (generate something reasonable here)
      * *funcname*: e.g. `asset_<type>_<field_name>_<uuid>`
      * *message*: value of asset field
      * *comments*: generate a comment for translators to describe which asset 
this field is from, and where this field is used
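
   A minimal sketch of what one such extraction method might look like is 
shown below. It uses the `(fileobj, keywords, comment_tags, options)` signature 
that Babel passes to custom extraction methods and yields 
`(lineno, funcname, message, comments)` tuples as described above. The sample 
data, the `feature_flags` and `dashboards` keyword arguments, and the choice of 
the asset's position as `lineno` are assumptions for illustration only; a real 
implementation would load assets through the Superset data APIs, and this 
sketch has not been run through `pybabel`.

```python
from typing import Any, Iterator, List, Mapping, Optional, Sequence, Tuple

# Hypothetical stand-in for dashboards loaded through the Superset data APIs.
SAMPLE_DASHBOARDS: List[Mapping[str, Any]] = [
    {"uuid": "1111-aaaa", "dashboard_title": "Course Data", "description": ""},
]

# A subset of the dashboard fields listed in the appendix.
DASHBOARD_FIELDS = ("dashboard_title", "description")

Extracted = Tuple[int, str, str, List[str]]


def extract_dashboards(
    fileobj: Any,
    keywords: Any,
    comment_tags: Any,
    options: Any,
    *,
    dashboards: Sequence[Mapping[str, Any]] = SAMPLE_DASHBOARDS,
    feature_flags: Optional[Mapping[str, Any]] = None,
) -> Iterator[Extracted]:
    """Yield one (lineno, funcname, message, comments) tuple per field value."""
    flags = feature_flags or {}
    if not flags.get("SUPERSET_TRANSLATE_ASSETS"):
        # Feature disabled: exit immediately without touching any assets.
        return
    for lineno, dashboard in enumerate(dashboards, start=1):
        for field in DASHBOARD_FIELDS:
            message = dashboard.get(field)
            if not message:
                continue  # nothing to translate for empty fields
            funcname = f"asset_dashboard_{field}_{dashboard['uuid']}"
            comment = f"Dashboard {dashboard['uuid']}: {field}"
            yield lineno, funcname, message, [comment]
```

   Because the function is a generator, the early `return` produces an empty 
result when the flag is unset, which matches the "exit immediately" behaviour 
described in the configuration section.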
   
   #### Frontend: use translations
   
   Superset provides translations to the frontend via the [template bootstrap 
data's `language_pack` 
field](https://github.com/apache/superset/blob/5fe6ef268ec5b136f911ccfe05fb8bdd7fc7a79f/superset/views/base.py#L336).
 Once user-provided values are translated, they are available to the frontend 
components via `t()` from `@superset-ui/core`, e.g.
   
   ```typescript
   import { t } from '@superset-ui/core';
   …
       // Where the dashboard_title variable is shown to user:
       {t(dashboard_title)}
   ```
   
   #### Appendix: Translatable asset fields
   
   We've identified the following asset fields as needing translations.
   
   ##### Dashboard fields
   * `dashboard_title`
   * `description`
   * `metadata.native_filter_configuration.name`
   * `metadata.native_filter_configuration.description`
   * `position.*.meta.text`
   * `position.*.meta.code`
   * `position.*.meta.sliceNameOverride`
   
   Notes: 
   * `position.*.meta` fields denote positional elements in the Dashboard, e.g. 
charts, headings, and markdown text.
   
   ##### Chart fields
   * `slice_name`
   * `description`
   * `params.x_axis_label`
   * `params.y_axis_label`
   * `params.groupby.label`
   
   ##### Dataset fields
   * `metrics.verbose_name`
   * `columns.verbose_name`
   
   ### New or Changed Public Interfaces
   
   We propose updating the command-line script 
[babel_update.sh](https://github.com/apache/superset/blob/master/scripts/translations/babel_update.sh)
 to:
   * Preserve 
[messages.pot](https://github.com/apache/superset/blob/master/superset/translations/messages.pot)
 / 
[messages.po](https://github.com/apache/superset/blob/master/superset/translations/ar/LC_MESSAGES/messages.po)
 files for application messages.
   * Generate `assets.po` files for asset messages.
     Upstream versions will be empty; forks can maintain their own versions.
   * Add a `compile` argument which:
      * Concatenates `messages.po` + `assets.po` to a git-ignored file, 
`superset.po`
      * Compiles each language's `.mo` using [pybabel 
compile](https://github.com/apache/superset/blob/c7c3b1b0e99228f261415742469cd4a7f929da7b/Dockerfile#L132)
      * Rebuilds the frontend `.json` files using [npm run 
build-translations](https://github.com/apache/superset/blob/5fe6ef268ec5b136f911ccfe05fb8bdd7fc7a79f/superset-frontend/package.json#L45)
   
   The new custom extraction methods will be registered as setuptools 
[`entry_points` in 
`setup.py`](https://github.com/apache/superset/blob/master/setup.py#L59-L71), 
e.g.:
   
   ```python
   "babel.extractors": [
       "superset_dashboards = superset.translations.utils:extract_dashboards",
       "superset_charts = superset.translations.utils:extract_charts",
       "superset_datasets = superset.translations.utils:extract_datasets",
   ]
   ```
   
   ### New dependencies
   
   None
   
   ### Migration Plan and Compatibility
   
   No database migrations or user-facing changes are required to support this 
change.
   
   ### Rejected Alternatives
   
   [SIP-60](https://github.com/apache/superset/issues/13442) proposed adding 
extra fields to each chart to store the translated text for each user-facing 
field, and a custom component for locating and displaying the translated field 
value. This approach was contested for its UI complexity, and because it 
[requires translators to have chart-level edit 
access](https://github.com/apache/superset/issues/13442#issuecomment-822589045).
   
   One respondent to SIP-60 also [suggested creating a dedicated database 
table](https://github.com/apache/superset/issues/13442#issuecomment-831356427) 
for i18n which could have its own UI and permissions granted for translators. 
Though this solution is simpler from a data perspective, it does not simplify 
the UI changes required to utilize these translations. It also would distance 
the translator from the context in which each translated term is used, making 
it difficult for translators to provide appropriate translations.
   
   Open edX currently [works around this 
issue](https://github.com/openedx/tutor-contrib-aspects/issues/36) by providing 
translated copies of master charts and dashboards for each supported language. 
This workaround mirrors [Tableau's suggested 
solution](https://www.tableau.com/blog/how-create-multilingual-dashboards-52933). 
However, Open edX has found this approach to be difficult to maintain, 
especially in an environment where operators may provide their own custom 
charts and dashboards.
   
   #### Advantages of this proposal
   
   * No change required to the translators' current process
   * No additional application access/permissions required for translators
   * No user-visible UI/UX changes
   * Minimally invasive change to the codebase
   
   #### Disadvantages
   
   * Translators will need to operate on the extracted `.po` files instead of 
translating directly in the Superset UI.
      This issue can be mitigated using tools such as Transifex, which will 
[host open-source project translations for 
free](https://www.transifex.com/open-source/) and supports machine-generated 
translations like Google Translate.
   * Translations are heavily context-dependent, and so omitting a UI means 
that translators are providing translations outside of the Superset environment 
(as they are for all other Superset UI application strings).
      This issue will be mitigated by providing as much detail as possible in 
the [`.po` 
comments](https://www.gnu.org/software/gettext/manual/html_node/PO-Files.html) 
generated for each term, so translators can understand how and where the terms 
are used.
   * Only a single translation can be provided for each term in each language.
     For example, if we have a chart titled "Course Data" and a dashboard 
titled "Course Data", we can only translate "Course Data" once per language, 
because the extracted message does not record the context it came from.
   * Translated field names will only be visible from the rendered 
[superset-frontend](https://github.com/apache/superset/tree/master/superset-frontend),
 not in the data returned by the backend Superset APIs.
      The Superset API is based on [Flask AppBuilder's REST 
API](https://flask-appbuilder.readthedocs.io/en/latest/rest_api.html), which 
supports translating labels, descriptions and filters using the `_l_` 
querystring parameter. However, translating the actual returned API data is not 
part of this feature. Addressing this issue requires a contribution to FAB. 
Details are outside of the scope of this proposal, but I'd love to discuss it 
if such a contribution would be welcome.
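
   To make the single-translation-per-term limitation listed above concrete, 
the sketch below keys a toy catalog by message text alone, as this proposal 
describes. The asset tuples and the Spanish string are made up for 
illustration and are not taken from any real Superset instance.

```python
# Hypothetical extracted messages: (asset type, field, message).
extracted = [
    ("chart", "slice_name", "Course Data"),
    ("dashboard", "dashboard_title", "Course Data"),
]

# In this sketch the catalog is keyed by the message text alone,
# so both assets collapse into a single entry.
origins_by_message = {}
for asset_type, field, message in extracted:
    origins_by_message.setdefault(message, []).append(f"{asset_type}.{field}")

# Made-up translation, for illustration only.
translations = {"Course Data": "Datos del curso"}


def translate(message: str) -> str:
    """Return the single translation for a term, or the term itself."""
    return translations.get(message, message)
```

   Both the chart and the dashboard receive the same translated string, which 
is why the generated `.po` comments need to list every place a term is used.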


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@superset.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org