On 20/10/10 02:40, Stephen Kelly wrote:
Sorry. Sent too early. All thumbs today. Consider these examples:
{% verbatim "%} %}" %}
(That is, "%} %}" in a verbatim-no-end tag)
{% verbatim %} %} %} {% endverbatim %}
(That is, " %} %} " wrapped in verbatim tags)
The current lexer uses regexps to find tokens like that. It would need to be
completely rewritten/redesigned to handle these cases.
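To illustrate the problem Stephen describes, here is a minimal stand-in for a regex-based lexer (a sketch analogous to Django's non-greedy tag pattern, not the actual lexer code): a non-greedy split cuts the verbatim tag at the first %} it sees, even one inside quotes.

```python
import re

# Simplified stand-in for a regex-based template lexer: split the input
# on the first (non-greedy) {% ... %} sequence found.
tag_re = re.compile(r'({%.*?%})')

# The quoted "%} %}" is cut short: the match ends at the first %},
# which sits inside the string literal.
print(tag_re.split('{% verbatim "%} %}" %}'))
# → ['', '{% verbatim "%}', ' %}" %}']
```

The mis-split token `{% verbatim "%}` is exactly why the lexer would need redesigning to support these cases.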
All the best,
Steve.
Are you sure? There's no nesting here, so I'm reasonably sure this could
be a regular language, though I don't want to sit down and prove that.
Instead, taking the engineer's approach, here's a regex that can distinguish the two:
In [38]: re.split(re.compile(r'{% \s* ( (?: [\w\-_\s]+ ) (?: \s* \" [\w\-_\s%}{]+ \" \s*)* ) \s* %}', re.VERBOSE), '{% foo "%}" %} {% endfoo %}')
Out[38]: ['', 'foo "%}" ', ' ', 'endfoo ', '']

In [39]: re.split(re.compile(r'{% \s* ( (?: [\w\-_\s]+ ) (?: \s* \" [\w\-_\s%}{]+ \" \s*)* ) \s* %}', re.VERBOSE), '{% foo %} %} {% endfoo %}')
Out[39]: ['', 'foo ', ' %} ', 'endfoo ', '']
(the key here is asserting an even number of quote marks, something a
regular language is capable of expressing)
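Packaged as a self-contained check (the variable names `tag`, `quoted`, and `bare` are mine, not from the session above), the pattern does separate the two cases:

```python
import re

# Andrew's verbose-mode pattern: a tag body is a run of word characters
# and spaces, optionally followed by double-quoted arguments, and the
# quoted arguments may themselves contain %} or {%.
tag = re.compile(r'''{% \s*
    ( (?: [\w\-_\s]+ )                   # unquoted tag words
      (?: \s* \" [\w\-_\s%}{]+ \" \s*)*  # zero or more quoted arguments
    ) \s* %}''', re.VERBOSE)

quoted = tag.split('{% foo "%}" %} {% endfoo %}')
bare = tag.split('{% foo %} %} {% endfoo %}')
print(quoted)  # → ['', 'foo "%}" ', ' ', 'endfoo ', '']
print(bare)    # → ['', 'foo ', ' %} ', 'endfoo ', '']
```

The quoted `%}` stays inside the captured tag body, while the bare `%}` ends the tag and leaves ` %} ` behind as literal template text.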
It's a bit early in the morning for in-depth regexes, but that seems to
show that it is _probably_ possible. Whether we should be continuing to
use the regex-based parser or moving to a proper lexing/tokenising one
is a different question, but if we did a parser rewrite it wouldn't be
able to land until 1.4 now, I imagine.
Andrew
--
You received this message because you are subscribed to the Google Groups "Django
developers" group.
To post to this group, send email to django-develop...@googlegroups.com.
To unsubscribe from this group, send email to
django-developers+unsubscr...@googlegroups.com.
For more options, visit this group at
http://groups.google.com/group/django-developers?hl=en.