I noticed (yeah, with an eye on @sentinelpadding, I admit it ;-) that Leo executes a lot of "custom" stuff in a tight loop when importing an @thin node. This could be done faster with re.split(), leaving all the heavy-duty tokenization to the C side, along the lines of what is done here:
http://simonwillison.net/2003/Oct/26/reSplit/

Basically, we would construct a list of "chunks" split on comment_character + @. We would then just iterate through that list, doing all the complex sentinel checking there. Most of the chunks are not sentinel parts at all, so they can be passed on directly.

The juicy part is this:

> Here's the magic part though. If you put part or all of the regular
> expression in parentheses, the separating tokens get included in the
> resulting list:
>
> >>> splitter = re.compile('(<.>)')
> >>> splitter.split('hi<a>there<b>from<c>python')
> ['hi', '<a>', 'there', '<b>', 'from', '<c>', 'python']

I realize that this is a delicate piece of code, and should not be messed around with casually.

-- 
Ville M. Vainio
http://tinyurl.com/vainio
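To make the idea concrete, here is a minimal sketch (not Leo's actual import code) of splitting a file body into plain chunks and sentinel lines in one pass. It assumes '#' as the comment character and '#@' as the sentinel prefix; the sample body and the regex are illustrative only:

```python
import re

# Capturing group makes re.split() keep the sentinel lines in the result.
# Assumption: sentinels are lines starting with '#@' (Python comment char).
sentinel_re = re.compile(r'(^#@.*$)', re.MULTILINE)

body = """first line
#@+node:some.node
inner line
#@-node:some.node
last line"""

chunks = sentinel_re.split(body)
# Non-sentinel chunks can be passed through directly; only sentinel
# chunks need the expensive per-line parsing.
sentinels = [c for c in chunks if c.startswith('#@')]
plain = [c for c in chunks if c and not c.startswith('#@')]
```

Here `chunks` alternates between plain text and sentinel lines, so the tight loop only has to inspect the first characters of each chunk instead of tokenizing every line in Python.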
