Last week I re-encountered the problems with using makemessages on Javascript files, and lost a couple of half-days to trying to figure out why some of my translatable messages weren't being found and deposited into my .po files. After fully understanding the extent of Django's current hack, I decided to take a stab at providing a better solution.

Background: today, Javascript source files are parsed for messages by running a "pythonize" regex over them, and giving the resulting text to xgettext, claiming it is Perl. The pythonize regex simply changes any //-style comment on its own line into a #-style comment. This strange accommodation leaves a great deal of valid Javascript syntax in place to confuse the Perl parser in xgettext. As a result, seemingly innocuous Javascript will result in lost messages:

   gettext("xyzzy 1");
   var x = y;
   gettext("xyzzy 2");
   var x = z;
   gettext("xyzzy 3");

In this sample, messages 1 and 3 are found, and message 2 is not, because y;ABC;abc; is valid Perl for a transliteration operator. Digging into this, every time I thought I finally understood the full complexity of the brokenness, another case would pop up that didn't make sense. The full horror of Perl syntax (http://perldoc.perl.org/perlop.html#Quote-and-Quote-like-Operators , for example) means that it is very difficult to treat non-Perl code as Perl and expect everything to be OK. This is polyglot programming at its worst.

This needs to be fixed. To that end, I've written a Javascript lexer (https://bitbucket.org/ned/jslex) with the goal of using it to pre-process Javascript into a form more suitable for xgettext. My understanding of why we claim Javascript is Perl is that Perl has regex literals like Javascript does, and so xgettext stands the best chance of parsing Javascript as Perl. Clearly that's not working well. My solution would instead remove the regex literals from the Javascript, and then have xgettext treat it as C.

I have a few questions you can help me with:

1. Is this the best path forward? Ideally xgettext would support Javascript directly. There's code out there to add Javascript to xgettext, but I don't know what shape that code is in, or if it's reasonable to expect Django installations to use bleeding-edge xgettext. Is there some better solution that someone is pursuing?

2. Is there some other badness that will bite us if we tell xgettext that the modified Javascript is C? With a full Javascript lexer, I feel pretty confident that we could solve issues if they do come up, but I'd like to know now what they are.

3. I know that lexing Javascript is tricky. I need help finding diabolical test cases for my lexer (https://bitbucket.org/ned/jslex). Anyone care to come up with some Javascript source that it can't properly find the regex literals in?

BTW: This would close tickets #7704, #14045, #15331, and #15495.

--Ned.

--
You received this message because you are subscribed to the Google Groups "Django 
developers" group.
To post to this group, send email to django-developers@googlegroups.com.
To unsubscribe from this group, send email to 
django-developers+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.

Reply via email to