Daniel Elstner
Sun, 12 Aug 2007 19:22:50 -0700
Hi everyone, I went ahead and committed a preliminary implementation of the message compose API that was proposed a while ago. This new feature makes it much easier to properly localize gtkmm applications. The feature status is tracked on bugzilla, too:
http://bugzilla.gnome.org/show_bug.cgi?id=399216

What this is all about
======================

The classic approach to formatting a user-visible message string, without resorting to snprintf() and the like, goes something like this:

  std::ostringstream output;
  output.imbue(std::locale(""));
  output << percentage << _("% done");
  label->set_text(Glib::locale_to_utf8(output.str()));

This is a ridiculous amount of code to generate a mundane "50% done" label, but still misses out on a couple of corner cases. Even more important is that the STL stream interface makes proper localization impossible. The message should be passed to gettext() in one piece in order to provide the required context, and to allow the translator to rearrange the sentence structure as required by the target language.

One way to do that is to use a template string with placeholders for separate arguments, like printf() in the C library:

  char* message = g_strdup_printf(_("%d%% done"), percentage);
  gtk_label_set_text(GTK_LABEL(label), message);
  g_free(message);

This approach lacks the type safety and exception safety of the STL stream version. Nonetheless, it's a shame that using C++ and glibmm requires more code than plain C and GLib.

This is where the new message compose and format API comes into play. Using the proposed API, the example from above looks like this:

  using Glib::ustring;
  label->set_text(ustring::compose(_("%1%% done"), ustring::format(percentage)));

A couple of more interesting use cases:

  ustring s;

  const double a = 3456.78;
  const double b = 7890.12;

  s = ustring::compose("%1 is lower than %2.",
                       ustring::format(a), ustring::format(b));
  s = ustring::compose("%2 is greater than %1.",
                       ustring::format(a), ustring::format(b));
  s = ustring::compose("%1 € are %3 %% of %2 €.",
                       ustring::format(a), ustring::format(b),
                       ustring::format(std::fixed, std::setprecision(1),
                                       a / b * 100.0));

In a German locale the three composed strings are:

  3.456,78 is lower than 7.890,12.
  7.890,12 is greater than 3.456,78.
  3.456,78 € are 43,8 % of 7.890,12 €.

The complete example program demonstrating these use cases can be found in the examples/compose directory in the glibmm SVN repository.

Some explanation of the details is in order.

 * Unlike with printf(), the compose and format functionality are provided separately. This is the main difference from Ole Laursen's compose mini-library. I'll get to that later.

 * Placeholders in the template string are in qt-format: a percent symbol followed by a single digit denoting the index of the argument to substitute, i.e. "%1", "%2", ..., "%9". Two percent symbols "%%" result in a single "%" in the output. Thus the maximum number of arguments is nine, which is probably not a real limit in practice. Placeholders can occur in any order in the template string, which allows the translator to reorder substitutions freely. Note that qt-format is recognized and fully supported by gettext.

 * The arguments to ustring::compose() are all of type Glib::ustring. To format a number into a string, ustring::format() must be used. The arguments to format() are written sequentially to an output string stream and can thus be of any streamable type, including I/O manipulators.

 * Wide-character streams are used internally to enable fully-fledged internationalization support. Using wchar_t streams avoids restricting the formatting results to either ASCII or at best the narrow locale codeset. For instance, the thousands separator can be a code point outside the ASCII range in some languages. The use of wchar_t streams also allows skipping iconv() on modern Linux and Windows systems.

Alternative API
===============

With Ole Laursen's compose API, the placeholder substitution and string formatting functionality are available through a single function.
This design has the advantage of brevity:

  ustring s = ustring::compose("%1 € are %3 %% of %2 €.",
                               a, b, std::fixed, std::setprecision(1),
                               a / b * 100.0);

This cuts down on the nesting of parentheses, which is definitely a good thing. However, I have some misgivings about this solution. The main problem is that there is no longer a clear correspondence of placeholder index to argument position, since I/O manipulators have to be skipped. Unfortunately there's no portable way to detect whether an object is an I/O manipulator. Thus a heuristic is used -- if passing the argument through the I/O stream yields an empty string, it is assumed to be a manipulator instead of a real argument.

Obviously the heuristic breaks down if an argument is really meant to be an empty string. While this would rarely be an issue for user-visible messages, it feels like an arbitrary and likely unexpected restriction. And possible use cases do exist, like the following example, which is only possible with separate compose and format steps:

  const double a = 3456.78;
  const double b = 7890.12;
  const int    i = int(a / (a + b) * 40.0);

  std::cout << ustring::compose("a : b = [%1|%2]",
                 ustring::format(std::setfill(L'a'), std::setw(i), ""),
                 ustring::format(std::setfill(L'b'), std::setw(40 - i), ""));

The output is a fancy ASCII art diagram:

  a : b = [aaaaaaaaaaaa|bbbbbbbbbbbbbbbbbbbbbbbbbbbb]

This is of course a somewhat silly example, but it shows that empty arguments aren't entirely unreasonable. Furthermore, string arguments might end up empty as a result of unanticipated run-time behavior.

  ustring s = "abc";
  s = ustring::compose("Length of \"%1\" is %2", s, s.length());
  ==> Length of "abc" is 3

  ustring s = "";
  s = ustring::compose("Length of \"%1\" is %2", s, s.length());
  ==> Length of "0" is

Oops. This particular problem could be avoided by specializing for string types, but it quickly becomes awkward. The empty string might be a perfectly valid result of streaming an object of user-defined type.
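
To make the failure mode concrete, here is a minimal sketch of the empty-output heuristic and of the placeholder substitution described earlier. It is only an illustration under some loud assumptions: it uses plain std::string and narrow streams instead of Glib::ustring and wchar_t streams, and the names substitute(), format_one(), HeuristicArgs and the two length_message_*() functions are hypothetical, made up for this example. It is not the code in glibmm, nor the code in Ole Laursen's library.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper, not glibmm code: replace "%1".."%9" with the
// corresponding argument and "%%" with a single "%".
std::string substitute(const std::string& fmt,
                       const std::vector<std::string>& args)
{
  std::string result;

  for (std::string::size_type i = 0; i < fmt.size(); ++i)
  {
    if (fmt[i] == '%' && i + 1 < fmt.size())
    {
      const char next = fmt[i + 1];

      if (next == '%')
      {
        result += '%';
        ++i;
        continue;
      }
      if (next >= '1' && next <= '9')
      {
        const std::size_t index = static_cast<std::size_t>(next - '1');
        if (index < args.size())  // a missing argument expands to nothing
          result += args[index];
        ++i;
        continue;
      }
    }
    result += fmt[i];
  }
  return result;
}

// Separate style: each argument is formatted on its own stream, so an
// empty result is still an argument.
template <class T>
std::string format_one(const T& value)
{
  std::ostringstream out;
  out << value;
  return out.str();
}

// Combined style (assumed behaviour): everything goes through one stream,
// and whatever produces no output is taken to be an I/O manipulator.
class HeuristicArgs
{
public:
  template <class T>
  HeuristicArgs& add(const T& value)
  {
    stream_ << value;
    const std::string text = stream_.str();
    if (!text.empty())  // non-empty output: counts as a real argument
    {
      args_.push_back(text);
      stream_.str(std::string());  // clear the buffer, keep format flags
    }
    return *this;  // empty output: silently skipped
  }

  const std::vector<std::string>& args() const { return args_; }

private:
  std::ostringstream stream_;
  std::vector<std::string> args_;
};

std::string length_message_combined(const std::string& s)
{
  HeuristicArgs collected;
  collected.add(s).add(s.length());
  return substitute("Length of \"%1\" is %2", collected.args());
}

std::string length_message_separate(const std::string& s)
{
  std::vector<std::string> args;
  args.push_back(format_one(s));
  args.push_back(format_one(s.length()));
  return substitute("Length of \"%1\" is %2", args);
}

// The expectations from the text, written as assertions.
void check_examples()
{
  assert(length_message_combined("abc") == "Length of \"abc\" is 3");
  assert(length_message_separate("abc") == "Length of \"abc\" is 3");
  assert(length_message_combined("")    == "Length of \"0\" is ");
  assert(length_message_separate("")    == "Length of \"\" is 0");
}
```

Calling check_examples() from main() exercises both variants. With an empty string the combined variant drops the first argument, so the length slides into "%1" and "%2" is left empty, while the separate variant keeps each argument in its own slot.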
So it comes down to a trade-off. The combined API has brevity and ease of use going for it. On the other hand the separated API is more robust and can be implemented more cleanly. What do you think?

--Daniel