Fixing makemessages for Javascript

Ned Batchelder Mon, 04 Apr 2011 14:15:38 -0700

Last week I re-encountered the problems with using makemessages on Javascript files, and lost a couple of half-days to trying to figure out why some of my translatable messages weren't being found and deposited into my .po files. After fully understanding the extent of Django's current hack, I decided to take a stab at providing a better solution.

Background: today, Javascript source files are parsed for messages by running a "pythonize" regex over them, and giving the resulting text to xgettext, claiming it is Perl. The pythonize regex simply changes any //-style comment on its own line into a #-style comment. This strange accommodation leaves a great deal of valid Javascript syntax in place to confuse the Perl parser in xgettext. As a result, seemingly innocuous Javascript will result in lost messages:


   gettext("xyzzy 1");
   var x = y;
   gettext("xyzzy 2");
   var x = z;
   gettext("xyzzy 3");

In this sample, messages 1 and 3 are found, and message 2 is not, because y;ABC;abc; is valid Perl for a transliteration operator. The y; on the second line opens one, so the next two statements, including the gettext call for message 2, are read as its search and replacement lists. Digging into this, every time I thought I finally understood the full complexity of the brokenness, another case would pop up that didn't make sense. The full horror of Perl syntax (http://perldoc.perl.org/perlop.html#Quote-and-Quote-like-Operators, for example) means that it is very difficult to treat non-Perl code as Perl and expect everything to be OK. This is polyglot programming at its worst.
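To make the background concrete, here is a rough Python approximation of the pythonize step. The pattern and the function name pythonize are assumptions for illustration, not necessarily Django's exact code. It shows that the sample above passes through untouched, so xgettext's Perl parser sees the raw Javascript:

```python
import re

# Hypothetical approximation of the "pythonize" step: a //-style comment
# that sits on its own line becomes a #-style comment. Nothing else changes.
PYTHONIZE_RE = re.compile(r"^[ \t]*//", re.MULTILINE)


def pythonize(js_source):
    """Rewrite whole-line // comments as # comments."""
    return PYTHONIZE_RE.sub("#", js_source)


SAMPLE = (
    '   gettext("xyzzy 1");\n'
    "   var x = y;\n"
    '   gettext("xyzzy 2");\n'
    "   var x = z;\n"
    '   gettext("xyzzy 3");\n'
)

if __name__ == "__main__":
    # The sample has no comments, so it is handed to xgettext unchanged,
    # including the "y;" that Perl takes for a transliteration operator.
    print(pythonize(SAMPLE) == SAMPLE)
    # Only a comment on its own line is rewritten; a trailing one is left alone.
    print(pythonize("  // note\nx = 1; // tail\n"))
```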

This needs to be fixed. To that end, I've written a Javascript lexer (https://bitbucket.org/ned/jslex) with the goal of using it to pre-process Javascript into a form more suitable for xgettext. My understanding of why we claim Javascript is Perl is that Perl has regex literals like Javascript does, and so xgettext stands the best chance of parsing Javascript as Perl. Clearly that's not working well. My solution would instead remove the regex literals from the Javascript, and then have xgettext treat it as C.
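A minimal sketch of that pre-processing idea, in Python. This is not jslex and does not use its API. It is a deliberately simplified heuristic scanner, and the function name strip_regex_literals, the "REGEX" placeholder, and the keyword list are all assumptions made for illustration. It decides whether a slash starts a regex literal by looking at the previous significant token, then swaps each regex literal for a string literal that a C parser can read:

```python
import re

# Simplified token patterns, tried in order (a heuristic, not a full lexer).
_TOKEN = re.compile(r"""
    (?P<ws>\s+)
  | (?P<comment>//[^\n]*|/\*.*?\*/)
  | (?P<string>"(?:\\.|[^"\\\n])*"|'(?:\\.|[^'\\\n])*')
  | (?P<word>[A-Za-z_$][\w$]*)
  | (?P<number>\d[\w.]*)
  | (?P<punct>.)
""", re.VERBOSE | re.DOTALL)

# A regex literal: escapes, character classes (which may contain "/"),
# or ordinary characters, followed by optional flags.
_REGEX = re.compile(r"/(?:\\.|\[(?:\\.|[^\]\\\n])*\]|[^/\\\[\n])+/[A-Za-z]*")

# Keywords after which a "/" starts a regex rather than a division.
_REGEX_AFTER = {"return", "typeof", "case", "in", "of", "new", "delete",
                "void", "throw", "else", "do", "instanceof"}


def strip_regex_literals(js):
    """Replace each regex literal with the string literal "REGEX"."""
    out = []
    pos = 0
    regex_ok = True  # a regex literal may appear at the very start
    while pos < len(js):
        m = _TOKEN.match(js, pos)
        kind, text = m.lastgroup, m.group()
        if kind == "punct" and text == "/" and regex_ok:
            r = _REGEX.match(js, pos)
            if r:
                out.append('"REGEX"')
                pos = r.end()
                regex_ok = False
                continue
        out.append(text)
        pos = m.end()
        if kind in ("ws", "comment"):
            continue  # whitespace and comments do not change the state
        if kind == "word":
            regex_ok = text in _REGEX_AFTER
        elif kind in ("string", "number"):
            regex_ok = False
        else:
            # After a closing bracket, "/" is assumed to be division.
            regex_ok = text not in ")]}"
    return "".join(out)


if __name__ == "__main__":
    print(strip_regex_literals('x = /"/; gettext("hi");'))
```

The sketch knowingly gets some cases wrong, for example a regex literal directly after a closing brace, or a division after x++ on a line that contains a second slash. Cases like those are the reason a real lexer is needed.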


I have a few questions you can help me with:

1. Is this the best path forward? Ideally xgettext would support Javascript directly. There's code out there to add Javascript to xgettext, but I don't know what shape that code is in, or if it's reasonable to expect Django installations to use bleeding-edge xgettext. Is there some better solution that someone is pursuing?

2. Is there some other badness that will bite us if we tell xgettext that the modified Javascript is C? With a full Javascript lexer, I feel pretty confident that we could solve issues if they do come up, but I'd like to know now what they are.

3. I know that lexing Javascript is tricky. I need help finding diabolical test cases for my lexer (https://bitbucket.org/ned/jslex). Anyone care to come up with some Javascript source that it can't properly find the regex literals in?


BTW: This would close tickets #7704, #14045, #15331, and #15495.

--Ned.
