On 20/10/10 02:40, Stephen Kelly wrote:
Sorry. Sent too early. All thumbs today. Consider these examples:

{% verbatim "%} %}" %}

(That is, "%} %}" in a verbatim-no-end tag)

{% verbatim %} %} %} {% endverbatim %}

(That is, " %} %} " wrapped in verbatim tags)

The current lexer uses regexps to find tokens like that. It would need to be
completely rewritten/redesigned to handle these cases.
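
For example, a simplified non-greedy block-tag pattern (just a stand-in here for the kind of regexp the lexer relies on, not the actual one, and naive_tag_re is only my name for it) splits the first example in the middle of the quoted argument:

In [1]: import re

In [2]: naive_tag_re = re.compile(r'({%.*?%})')  # illustrative stand-in only

In [3]: naive_tag_re.split('{% verbatim "%} %}" %}')
Out[3]: ['', '{% verbatim "%}', ' %}" %}']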

All the best,

Steve.


Are you sure? There's no nesting here, so I'm reasonably sure this could be a regular language, though I don't want to sit down and prove that.

Instead, as an engineer, here's a regex that can distinguish the two:

In [38]: re.split(re.compile(r'{% \s* ( (?: [\w\-_\s]+ ) (?: \s* \" [\w\-_\s%}{]+ \" \s*)* ) \s* %}', re.VERBOSE), '{% foo "%}" %} {% endfoo %}')
Out[38]: ['', 'foo "%}" ', ' ', 'endfoo ', '']

In [39]: re.split(re.compile(r'{% \s* ( (?: [\w\-_\s]+ ) (?: \s* \" [\w\-_\s%}{]+ \" \s*)* ) \s* %}', re.VERBOSE), '{% foo %} %} {% endfoo %}')
Out[39]: ['', 'foo ', ' %} ', 'endfoo ', '']

(the key here is asserting an even number of quote marks, something a regular language is capable of expressing)
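
For completeness, the same pattern run against the two verbatim examples quoted at the top gives the splits we'd want (again just a quick interpreter experiment, not a patch against the actual lexer; tag_re is only a local name for the regex above):

In [40]: tag_re = re.compile(r'{% \s* ( (?: [\w\-_\s]+ ) (?: \s* \" [\w\-_\s%}{]+ \" \s*)* ) \s* %}', re.VERBOSE)

In [41]: tag_re.split('{% verbatim "%} %}" %}')
Out[41]: ['', 'verbatim "%} %}" ', '']

In [42]: tag_re.split('{% verbatim %} %} %} {% endverbatim %}')
Out[42]: ['', 'verbatim ', ' %} %} ', 'endverbatim ', '']

The quoted %} stays inside the single tag in the first case, and in the second the tag ends at the first unquoted %}, leaving " %} %} " as literal text between the two tags.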

It's a bit early in the morning for in-depth regexes, but that seems to show that it is _probably_ possible. Whether we should keep using the regex-based parser or move to a proper lexing/tokenising one is a different question, but if we did a parser rewrite it wouldn't be able to land until 1.4 now, I imagine.
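
If we did decide to move off regexes, the core of a hand-written lexer isn't much code. A rough sketch of the idea (nothing to do with the real Lexer class; names and behaviour are all mine, and it only handles double quotes):

def tokenize(template):
    # Very rough sketch: scan character by character and treat a '%}'
    # inside double quotes as part of the tag rather than its end.
    tokens, i = [], 0
    while i < len(template):
        start = template.find('{%', i)
        if start == -1:
            tokens.append(('text', template[i:]))
            break
        if start > i:
            tokens.append(('text', template[i:start]))
        j, in_quotes = start + 2, False
        while j < len(template):
            if template[j] == '"':
                in_quotes = not in_quotes
            elif not in_quotes and template.startswith('%}', j):
                break
            j += 1
        tokens.append(('tag', template[start + 2:j].strip()))
        i = j + 2
    return tokens

Running that over the two examples gives [('tag', 'verbatim "%} %}"')] for the first and [('tag', 'verbatim'), ('text', ' %} %} '), ('tag', 'endverbatim')] for the second.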

Andrew
