I noticed (yeah, with an eye on @sentinelpadding, I admit it ;-) that Leo executes a lot of "custom" stuff in a tight loop when importing an @thin node. This could be done faster with re.split(), leaving all the heavy-duty tokenization to the C side, along the lines of what is done here:
http://simonwillison.net/2003/Oct/26/reSplit/

Basically, we would construct a list of "chunks" split on comment_character + @. We would then just iterate through that list, doing all the complex sentinel checking there. Most of the chunks are not sentinel parts at all, so they can be passed on directly.

The juicy part is this:

> Here's the magic part though. If you put part or all of the regular
> expression in parentheses, the separating tokens get included in the
> resulting list:
>
> >>> splitter = re.compile('(<.>)')
> >>> splitter.split('hi<a>there<b>from<c>python')
> ['hi', '<a>', 'there', '<b>', 'from', '<c>', 'python']

I realize that this is a delicate piece of code, and should not be messed around with casually.

-- 
Ville M. Vainio
http://tinyurl.com/vainio
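To make the idea concrete, here is a minimal sketch (not Leo's actual import code) of splitting a file body into plain chunks and sentinel lines in one pass. It assumes '#' as the comment character and '#@' as the sentinel prefix; the sample body and the regex are illustrative only:

```python
import re

# Capturing group makes re.split() keep the sentinel lines in the result.
# Assumption: sentinels are lines starting with '#@' (Python comment char).
sentinel_re = re.compile(r'(^#@.*$)', re.MULTILINE)

body = """first line
#@+node:some.node
inner line
#@-node:some.node
last line"""

chunks = sentinel_re.split(body)
# Non-sentinel chunks can be passed through directly; only sentinel
# chunks need the expensive per-line parsing.
sentinels = [c for c in chunks if c.startswith('#@')]
plain = [c for c in chunks if c and not c.startswith('#@')]
```

Here `chunks` alternates between plain text and sentinel lines, so the tight loop only has to inspect the first characters of each chunk instead of tokenizing every line in Python.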
