Last week I re-encountered the problems with using makemessages on
Javascript files, and lost a couple of half-days trying to figure out
why some of my translatable messages weren't being found and deposited
into my .po files. After fully understanding the extent of Django's
current hack, I decided to take a stab at providing a better solution.

Background: today, Javascript source files are parsed for messages by
running a "pythonize" regex over them, and giving the resulting text to
xgettext, claiming it is Perl. The pythonize regex simply changes any
//-style comment on its own line into a #-style comment. This strange
accommodation leaves a great deal of valid Javascript syntax in place to
confuse the Perl parser in xgettext. As a result, seemingly innocuous
Javascript will result in lost messages:

    gettext("xyzzy 1");
    var x = y;
    gettext("xyzzy 2");
    var x = z;
    gettext("xyzzy 3");

In this sample, messages 1 and 3 are found, and message 2 is not,
because y;ABC;abc; is valid Perl for a transliteration operator: the
Perl parser reads everything from "y;" through "z;" as a single
transliteration, swallowing the second gettext call.
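
To make the failure concrete, here is a rough Python sketch of the two steps involved. Both regexes are my own approximations for illustration: PYTHONIZE_RE is not Django's actual pattern, and TRANSLITERATION_RE is a toy model of a single Perl rule, not of how xgettext really tokenizes Perl. The helper names are hypothetical.

```python
import re

# Approximation of the "pythonize" step: a //-comment on its own line
# becomes a #-comment. This is an illustration, not Django's actual regex.
PYTHONIZE_RE = re.compile(r'^[ \t]*//', re.MULTILINE)


def pythonize(js_source):
    """Rewrite own-line // comments as # comments; leave the rest alone."""
    return PYTHONIZE_RE.sub('#', js_source)


# Toy model (hypothetical) of one Perl rule: a bare "y" followed by ";"
# starts a transliteration y;SEARCH;REPLACE; that runs to the third ";".
TRANSLITERATION_RE = re.compile(r'(?<![\w$])y;[^;]*;[^;]*;')


def visible_messages(js_source):
    """Return gettext() messages left once y;...;...; spans are swallowed."""
    perl_view = TRANSLITERATION_RE.sub('', pythonize(js_source))
    return re.findall(r'gettext\("([^"]*)"\)', perl_view)


SAMPLE = '''\
// a comment on its own line
gettext("xyzzy 1");
var x = y;
gettext("xyzzy 2");
var x = z;
gettext("xyzzy 3");
'''

print(pythonize(SAMPLE).splitlines()[0])  # "# a comment on its own line"
print(visible_messages(SAMPLE))           # ['xyzzy 1', 'xyzzy 3']
```

Under this toy model the second gettext call sits inside the search list of the transliteration, so it never reaches the message extractor.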
Digging into this, every time I thought I finally understood the full
complexity of the brokenness, another case would pop up that didn't make
sense. The full horror of Perl syntax
(http://perldoc.perl.org/perlop.html#Quote-and-Quote-like-Operators ,
for example) means that it is very difficult to treat non-Perl code as
Perl and expect everything to be OK. This is polyglot programming at
its worst.

This needs to be fixed. To that end, I've written a Javascript lexer
(https://bitbucket.org/ned/jslex) with the goal of using it to
pre-process Javascript into a form more suitable for xgettext. My
understanding of why we claim Javascript is Perl is that Perl has regex
literals like Javascript does, and so xgettext stands the best chance of
parsing Javascript as Perl. Clearly that's not working well. My
solution would instead remove the regex literals from the Javascript,
and then have xgettext treat it as C.
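
As a rough illustration of the idea (this is not jslex and does not use its API), here is a minimal Python sketch that replaces regex literals with a placeholder string. It decides whether a "/" starts a regex by looking at the previous significant token, which is a simplification: it assumes a "/" after ")" or "]" is always division, and it will misread cases such as "x++ / y / z". The names strip_regex_literals and REGEX_KEYWORDS and the "REGEX" placeholder are hypothetical.

```python
import re

# Simplified JavaScript tokens; anything unrecognized is one-char "punct".
TOKEN_RE = re.compile(r'''
    (?P<ws>\s+)
  | (?P<comment>//[^\n]*|/\*.*?\*/)
  | (?P<string>"(?:\\.|[^"\\\n])*"|'(?:\\.|[^'\\\n])*')
  | (?P<word>[A-Za-z_$][\w$]*)
  | (?P<number>\d[\w.]*)
  | (?P<punct>.)
''', re.VERBOSE | re.DOTALL)

# A regex literal: /body/flags, where "/" inside a [...] class is allowed.
REGEX_RE = re.compile(r'/(?:\\.|\[(?:\\.|[^\]\\\n])*\]|[^/\\\[\n])+/[A-Za-z]*')

# Keywords after which "/" starts a regex literal rather than a division.
REGEX_KEYWORDS = {'return', 'typeof', 'case', 'in', 'delete', 'void',
                  'throw', 'new', 'instanceof'}


def strip_regex_literals(js_source, placeholder='"REGEX"'):
    """Replace each regex literal with a harmless string literal."""
    out = []
    pos = 0
    regex_allowed = True  # a regex may appear at the start of the source
    while pos < len(js_source):
        if (js_source[pos] == '/' and regex_allowed
                and not js_source.startswith(('//', '/*'), pos)):
            m = REGEX_RE.match(js_source, pos)
            if m:
                out.append(placeholder)
                pos = m.end()
                regex_allowed = False
                continue
        m = TOKEN_RE.match(js_source, pos)  # always matches: punct is "."
        kind, text = m.lastgroup, m.group()
        out.append(text)
        pos = m.end()
        if kind in ('ws', 'comment'):
            continue  # insignificant tokens do not change the state
        if kind == 'word':
            regex_allowed = text in REGEX_KEYWORDS
        elif kind in ('string', 'number'):
            regex_allowed = False
        else:
            # Simplification: after ")" or "]" assume division.
            regex_allowed = text not in (')', ']')
    return ''.join(out)


sample = ('var re = /gettext("not a message")/g;\n'
          'var half = total / 2; gettext("real message");\n')
print(strip_regex_literals(sample))
```

The output keeps the division and the real gettext call intact while the regex literal becomes "REGEX", so text in this form could plausibly be handed to xgettext with --language=C.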
I have a few questions you can help me with:

1. Is this the best path forward? Ideally xgettext would support
Javascript directly. There's code out there to add Javascript to
xgettext, but I don't know what shape that code is in, or if it's
reasonable to expect Django installations to use bleeding-edge
xgettext. Is there some better solution that someone is pursuing?

2. Is there some other badness that will bite us if we tell xgettext
that the modified Javascript is C? With a full Javascript lexer, I feel
pretty confident that we could solve issues if they do come up, but I'd
like to know now what they are.

3. I know that lexing Javascript is tricky. I need help finding
diabolical test cases for my lexer (https://bitbucket.org/ned/jslex).
Anyone care to come up with some Javascript source that it can't
properly find the regex literals in?

BTW: This would close tickets #7704, #14045, #15331, and #15495.

--Ned.