On Wed, Mar 06, 2002 at 09:45:18AM -0800, Rich Morin wrote:
> Going a bit further afield, I also started thinking about the general
> nature of printf.  I've been using this basic syntax since 1970 (in
> the form of Fortran's FORMAT statements :-), so I'm pretty comfortable
> with it.  OTOH, I don't like the fact that the format specifications
> can become widely separated from the variables they reference.

I agree that this is a horribly "illiterate" aspect of printf....

> With all of Larry's talk about making "x" mode the standard in REs and
> having more "pair-based" syntax here and there, I started thinking
> about a replacement for printf, as:
> 
>   printx(
>     'The value of $foo is %f7.3; ',  $foo,
>     'the value of $bar is %f7.3.%n', $bar
>   );

.... But why propose such an off-the-wall solution?  Wouldn't it make
more sense to make it more like interpolation?  Eg,

    printx 'The value of $foo is %f7.3{foo}', { foo => $foo };
                                       ^^^ key of the following hash
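
A minimal sketch of how such a named-placeholder formatter might behave
in today's Perl.  The name formatx and the exact placeholder grammar (a
conversion letter, an optional width/precision, then {key}) are
assumptions made for illustration; nothing like this exists in core Perl.

```perl
use strict;
use warnings;

# Hypothetical helper, not an existing Perl function.  It expands
# placeholders such as %f7.3{foo} by looking up "foo" in the hash and
# passing the rearranged spec (%7.3f) to the real sprintf.
sub formatx {
    my ($format, $args) = @_;
    $format =~ s{%([a-zA-Z])([-+0-9.]*)\{(\w+)\}}{
        my ($conv, $spec, $key) = ($1, $2, $3);
        die "formatx: no value for '$key'\n" unless exists $args->{$key};
        sprintf "%$spec$conv", $args->{$key};
    }ge;
    return $format;
}

my $foo = 3.14159;
print formatx('The value of $foo is %f7.3{foo}', { foo => $foo }), "\n";
```

Because the lookup is by key, a placeholder can appear anywhere in the
string, or more than once, without touching the argument list.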

Tangent:  One crucial but often overlooked aspect of designing
"format string" schemes is that they can, with some care, facilitate
internationalization.  C format strings are actually pretty good for
this, due to the following characteristics:

    - The translator doesn't have to touch code, only format
      strings.  This is obviously desirable.
    - C format strings are fairly "safe", in that a format string
      isn't likely to break a program.  This is far from strictly
      true, however: %n writes through a pointer argument, and a
      format that consumes more arguments than were passed is
      undefined behavior in C.  Either might or might not be
      exploitable if your translator is malicious!
    - C format strings give the translator reasonable flexibility:
      eg, they can reorder, repeat, or omit placeholders with the
      %m$ syntax (a POSIX extension, not ISO C).
    - Some localization is "automagic", eg number formatting
      punctuation.
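
The reorder/repeat/omit point can be tried directly, since Perl's own
sprintf accepts the same explicit-index syntax.  A small sketch; the
argument values are made up, and the format strings are single-quoted
so that Perl does not try to interpolate "$s":

```perl
use strict;
use warnings;

my @args = ('red', 'blue');

# The code passes the same two arguments every time; only the format
# string, which is the part a translator would edit, changes.
my $plain   = sprintf '%1$s then %2$s',   @args;
my $reorder = sprintf '%2$s then %1$s',   @args;
my $repeat  = sprintf '%1$s, %1$s, %2$s', @args;
my $omit    = sprintf 'only %2$s',        @args;

print "$_\n" for $plain, $reorder, $repeat, $omit;
```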

However, if you don't keep internationalization in mind, it is easy
to lose these characteristics.  For example, your proposal seems to
encourage

    printx
        'Your little dog %s ', $dog,
        'attacks the evil %s.', $monster;

In this example, the translator is unable to change the sentence
structure (unless he can change the code).

So I would humbly advise anyone thinking about this to

    - think safe.  Don't add features like the ability to execute
      arbitrary Perl expressions!  (Or at least, offer a version
      without unsafe features, and recommend that programmers use it
      in most cases.)
    - think flexibility for the translator.  This can be hard if you
      don't have linguistic or localization experience, but you can
      use your imagination.  Desirable features might include
      locale-sensitive formatting of dates, currencies, etc;
      handling of plurals (gettext has a neat solution, though it
      requires multiple format strings); an "internationalized
      string" type wrapping up format string plus placeholders[1].
      The Java and C# string formatting libraries are worth looking
      at (don't take this as high praise though).
    - make sure that a translator doesn't have to change anything
      except the format string.
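
To make the "internationalized string" idea a little more concrete,
here is one possible reading of it, sketched in Perl.  The class name
I18nString, its methods, and the catalog layout are all hypothetical;
this is not an existing module, and the English rewording below merely
stands in for a real translation.

```perl
use strict;
use warnings;

# Hypothetical class: bundles a format string with its arguments so
# that formatting can be deferred until a translation is known.
package I18nString;

sub new {
    my ($class, $format, @args) = @_;
    return bless { format => $format, args => [@args] }, $class;
}

# Render against a catalog mapping source format strings to translated
# ones; fall back to the source format when no entry is found.
sub render {
    my ($self, $catalog) = @_;
    my $source = $self->{format};
    my $format = exists $catalog->{$source} ? $catalog->{$source} : $source;
    return sprintf $format, @{ $self->{args} };
}

package main;

my $source = 'Your little dog %1$s attacks the evil %2$s.';
my $msg    = I18nString->new($source, 'Toto', 'witch');

# Stand-in "translation" that needs a different sentence structure.
my %catalog = ($source => 'The evil %2$s is attacked by your little dog %1$s.');

print $msg->render({}), "\n";
print $msg->render(\%catalog), "\n";
```

The calling code never changes; swapping the catalog entry is enough
to restructure the sentence.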

Andrew

[1]  This is my pet idea.  Tell me if you see it somewhere!
