pomegranited opened a new issue, #32139: URL: https://github.com/apache/superset/issues/32139
## [SIP] Proposal for translating Superset asset data using custom flask-babel extraction methods

### Motivation

Superset provides translation support for built-in components in the UI. However, the Open edX project also needs the user-provided terms used in the assets themselves to be translatable, e.g. dashboard and chart titles, axis labels, and metric labels. We also need these asset translations to be easy to maintain across Superset upgrades, and to be re-deployable when translations are updated.

### Proposed Change

Superset uses [flask-babel](https://python-babel.github.io/flask-babel/) to extract and compile the translatable strings marked in its backend and frontend source files. We propose adding new [custom extraction methods](https://babel.pocoo.org/en/latest/messages.html#referencing-extraction-methods) to pull out asset field values. These methods would be disabled by default, and enabled using a new feature flag, `SUPERSET_TRANSLATE_ASSETS`.

Superset provides the backend translations to the frontend via the [template bootstrap data's `language_pack` field](https://github.com/apache/superset/blob/5fe6ef268ec5b136f911ccfe05fb8bdd7fc7a79f/superset/views/base.py#L336). Once user-provided values are translated, they are available to any frontend component string wrapped in `t()` from `@superset-ui/core`.

To make version control and conflict management easier, we propose keeping the asset messages in separate files from the upstream-maintained application translations used by the Superset UI. The asset translations will be concatenated with the application message files before being compiled, so these asset message files can be maintained per instance on a Superset fork. (Note: compiled `.mo`/`.json` message files are generated, so they are [not version-controlled](https://github.com/apache/superset/blob/master/.gitignore#L111-L115).)

#### Process

1. Configure the `SUPERSET_TRANSLATE_ASSETS` feature flag.
1. Run [babel_update.sh](https://github.com/apache/superset/blob/master/scripts/translations/babel_update.sh) to extract application and asset translations to (versioned) `.po` files.
1. Translators manually update the `.po` files, or use a tool like [Transifex](https://www.transifex.com/open-source/) to provide translations in the desired languages.
1. Run `babel_update.sh --compile` to concatenate and compile the ([unversioned](https://github.com/apache/superset/blob/master/.gitignore#L111-L115)) message files consumed by the app.

This process could optionally run during the Docker image build (when [BUILD_TRANSLATIONS=true](https://github.com/apache/superset/blob/c7c3b1b0e99228f261415742469cd4a7f929da7b/Dockerfile#L131)), or via the command line on a deployed instance. Superset may need to be restarted to apply translation changes on a running instance.

#### Configuration: SUPERSET_TRANSLATE_ASSETS

If the `SUPERSET_TRANSLATE_ASSETS` feature flag is not found in the settings, or is falsy, the custom extraction methods will exit immediately, so that users who do not need this feature are not burdened with the overhead of extracting asset translations.

This feature flag could be a simple on/off boolean, or a more complex structure to include/exclude specific assets, depending on community feedback.
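The early-exit behaviour described for `SUPERSET_TRANSLATE_ASSETS` can be sketched as a thin wrapper around each extraction method. This is a minimal sketch, assuming the flag lives in a `FEATURE_FLAGS`-style config mapping and is a plain boolean; `translate_assets_enabled`, `make_asset_extractor`, and `iter_messages` are hypothetical names, and `iter_messages` stands in for the code that would walk the assets through Superset's data APIs. The wrapped function uses the `(fileobj, keywords, comment_tags, options)` signature that Babel expects from custom extraction methods.

```python
from typing import Any, Callable, Iterator, List, Mapping, Tuple

# Babel extraction methods yield (lineno, funcname, message, comments) tuples.
Extracted = Tuple[int, str, str, List[str]]


def translate_assets_enabled(config: Mapping[str, Any]) -> bool:
    """Return True only if SUPERSET_TRANSLATE_ASSETS is present and truthy.

    Assumption: the flag is stored in a FEATURE_FLAGS-style mapping and is a
    simple boolean; an include/exclude structure would need more logic here.
    """
    flags = config.get("FEATURE_FLAGS") or {}
    return bool(flags.get("SUPERSET_TRANSLATE_ASSETS"))


def make_asset_extractor(
    config: Mapping[str, Any],
    iter_messages: Callable[[], Iterator[Extracted]],
) -> Callable[..., Iterator[Extracted]]:
    """Wrap an asset message iterator as a Babel-style extraction method.

    Hypothetical helper: `iter_messages` stands in for code that walks the
    assets through Superset's data APIs.
    """

    def extract(fileobj, keywords, comment_tags, options) -> Iterator[Extracted]:
        if not translate_assets_enabled(config):
            return  # flag missing or falsy: exit immediately, extract nothing
        yield from iter_messages()

    return extract
```

Because the flag is checked before `iter_messages` is ever called, installations that leave the flag unset never load any assets during extraction.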
#### Backend: extract asset field values

Create custom asset extraction methods under [superset.translations.utils](https://github.com/apache/superset/blob/master/superset/translations/utils.py) which:

* iterate over the assets using the Superset data APIs
* pull out each [translatable asset field](#appendix-translatable-asset-fields) value as the "message"
* yield a tuple for each translatable field in the asset, containing:
  * *lineno*: a generated line number (assets have no source lines, so the exact scheme is still to be decided)
  * *funcname*: e.g. `asset_<type>_<field_name>_<uuid>`
  * *message*: the value of the asset field
  * *comments*: a generated comment for translators describing which asset this field is from, and where the field is used

#### Frontend: use translations

Superset provides translations to the frontend via the [template bootstrap data's `language_pack` field](https://github.com/apache/superset/blob/5fe6ef268ec5b136f911ccfe05fb8bdd7fc7a79f/superset/views/base.py#L336). Once user-provided values are translated, they are available to the frontend components via `t()` from `@superset-ui/core`, e.g.

```typescript
import { t } from '@superset-ui/core';

// …

// Where the dashboard_title variable is shown to the user:
{t(dashboard_title)}
```

#### Appendix: Translatable asset fields

We've identified the following asset fields as needing translations.

##### Dashboard fields

* `dashboard_title`
* `description`
* `metadata.native_filter_configuration.name`
* `metadata.native_filter_configuration.description`
* `position.*.meta.text`
* `position.*.meta.code`
* `position.*.meta.sliceNameOverride`

Notes:

* `position.*.meta` fields denote positional elements in the Dashboard, e.g. charts, headings, and markdown text.
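To illustrate the backend extraction step for the dashboard fields listed above, here is a minimal sketch that turns one dashboard into Babel-style `(lineno, funcname, message, comments)` tuples. It assumes each dashboard arrives as a dict shaped like Superset's dashboard export (`uuid`, `dashboard_title`, `description`, `metadata`, `position`) and leaves out fetching assets through the data APIs. The function names, the field names embedded in `funcname`, and the sequential `lineno` are hypothetical choices. One caveat worth checking: Babel's `extract()` appears to look up each yielded `funcname` in its `keywords` mapping, so generated names like these would need to be verified against the Babel version in use.

```python
from typing import Any, Dict, Iterator, List, Tuple

# Babel extraction methods yield (lineno, funcname, message, comments) tuples.
Extracted = Tuple[int, str, str, List[str]]

# position.*.meta keys listed in the appendix.
POSITION_META_FIELDS = ("text", "code", "sliceNameOverride")


def _dashboard_candidates(dashboard: Dict[str, Any]) -> Iterator[Tuple[str, Any, str]]:
    """Yield (field name, raw value, location description) for each listed field."""
    yield "dashboard_title", dashboard.get("dashboard_title"), "dashboard title"
    yield "description", dashboard.get("description"), "dashboard description"
    metadata = dashboard.get("metadata") or {}
    for flt in metadata.get("native_filter_configuration") or []:
        yield "native_filter_name", flt.get("name"), "native filter name"
        yield "native_filter_description", flt.get("description"), "native filter description"
    for component_id, component in (dashboard.get("position") or {}).items():
        if not isinstance(component, dict):
            continue  # skip non-component entries such as the version key
        meta = component.get("meta") or {}
        kind = component.get("type", "component")
        for key in POSITION_META_FIELDS:
            yield f"position_{key}", meta.get(key), f"{kind} {component_id}, meta.{key}"


def extract_dashboard_messages(dashboard: Dict[str, Any]) -> Iterator[Extracted]:
    """Yield one Babel-style tuple per non-empty translatable dashboard field."""
    uuid = dashboard["uuid"]
    title = dashboard.get("dashboard_title") or uuid
    lineno = 0  # synthetic: assets have no source lines, so just count messages
    for field, value, where in _dashboard_candidates(dashboard):
        if not isinstance(value, str) or not value.strip():
            continue  # nothing to translate
        lineno += 1
        funcname = f"asset_dashboard_{field}_{uuid}"
        comment = f'Dashboard "{title}" ({uuid}): {where}'
        yield lineno, funcname, value, [comment]
```

The comment string is where the translator context mentioned in the proposal would live; a real implementation would likely add more detail, such as the chart a header sits above.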
##### Chart fields

* `slice_name`
* `description`
* `params.x_axis_label`
* `params.y_axis_label`
* `params.groupby.label`

##### Dataset fields

* `metrics.verbose_name`
* `columns.verbose_name`

### New or Changed Public Interfaces

We propose updating the command-line script [babel_update.sh](https://github.com/apache/superset/blob/master/scripts/translations/babel_update.sh) to:

* Preserve the [messages.pot](https://github.com/apache/superset/blob/master/superset/translations/messages.pot) / [messages.po](https://github.com/apache/superset/blob/master/superset/translations/ar/LC_MESSAGES/messages.po) files for application messages.
* Generate `assets.po` files for asset messages. Upstream versions will be empty; forks can maintain their own versions.
* Add a `compile` argument which:
  * Concatenates `messages.po` + `assets.po` into a git-ignored file, `superset.po`
  * Compiles each language's `.mo` file using [pybabel compile](https://github.com/apache/superset/blob/c7c3b1b0e99228f261415742469cd4a7f929da7b/Dockerfile#L132)
  * Rebuilds the frontend `.json` files using [npm run build-translations](https://github.com/apache/superset/blob/5fe6ef268ec5b136f911ccfe05fb8bdd7fc7a79f/superset-frontend/package.json#L45)

The new custom extraction methods will be registered as setuptools [`entry_points` in `setup.py`](https://github.com/apache/superset/blob/master/setup.py#L59-L71), e.g.:

```python
# In setup.py, alongside Superset's existing entry points:
entry_points={
    # ... existing entry point groups ...
    "babel.extractors": [
        "superset_dashboards = superset.translations.utils:extract_dashboards",
        "superset_charts = superset.translations.utils:extract_charts",
        "superset_datasets = superset.translations.utils:extract_datasets",
    ],
},
```

### New dependencies

None

### Migration Plan and Compatibility

No database migrations or user-facing changes are required to support this change.
### Rejected Alternatives

[SIP-60](https://github.com/apache/superset/issues/13442) proposed adding extra fields to each chart to store the translated text for each user-facing field, and a custom component for locating and displaying the translated field value. This approach was contested for its UI complexity, and because it [requires translators to have chart-level edit access](https://github.com/apache/superset/issues/13442#issuecomment-822589045).

One respondent to SIP-60 also [suggested creating a dedicated database table](https://github.com/apache/superset/issues/13442#issuecomment-831356427) for i18n, which could have its own UI and permissions granted to translators. Though this solution is simpler from a data perspective, it does not simplify the UI changes required to use these translations. It would also distance translators from the context in which each term is used, making it difficult for them to provide appropriate translations.

Open edX currently [works around this issue](https://github.com/openedx/tutor-contrib-aspects/issues/36) by providing translated copies of master charts and dashboards for each supported language. This workaround mirrors [Tableau's suggested solution](https://www.tableau.com/blog/how-create-multilingual-dashboards-52933); however, Open edX has found this approach difficult to maintain, especially in an environment where operators may provide their own custom charts and dashboards.

#### Advantages of this proposal

* No change required to the translators' current process
* No additional application access/permissions required for translators
* No user-visible UI/UX changes
* Minimally invasive change to the codebase

#### Disadvantages

* Translators will need to operate on the extracted `.po` files instead of translating directly in the Superset UI.
  This issue can be mitigated using tools such as Transifex, which will [host open-source project translations for free](https://www.transifex.com/open-source/) and supports machine-generated translations such as Google Translate.
* Translations are heavily context-dependent, so omitting a UI means that translators provide translations outside of the Superset environment (as they already do for all other Superset UI application strings). This issue will be mitigated by providing as much detail as possible in the [`.po` comments](https://www.gnu.org/software/gettext/manual/html_node/PO-Files.html) generated for each term, so translators can understand how and where the terms are used.
* Only a single translation can be provided for each term in each language. For example, if we have a chart titled "Course Data" and a dashboard titled "Course Data", "Course Data" can only be translated once per language; the translation cannot take into account the context it was extracted from.
* Translated field names will only be visible in the rendered [superset-frontend](https://github.com/apache/superset/tree/master/superset-frontend), not in the data returned by the backend Superset APIs. The Superset API is based on [Flask AppBuilder's REST API](https://flask-appbuilder.readthedocs.io/en/latest/rest_api.html), which supports translating labels, descriptions and filters using the `_l_` querystring parameter. However, translating the actual data returned by the API is not part of this feature. Addressing this issue requires a contribution to FAB. The details are outside the scope of this proposal, but I'd love to discuss it if such a contribution would be welcome.