Hi all,
I'm Ian, one of the two students working on improving the regexp
engine in Vim for this year's Google Summer of Code.  I haven't had a
whole lot to contribute as of yet, but now that work is underway, I'll
probably pop up here with lots of questions from time to time.

Right now we're working on getting things set up and building a
testing suite, but I thought I would spark some discussion on a design
decision that will be coming up after we finish this phase, which is
whether to implement the new model ourselves, or use an alternative
engine, like TRE: <http://laurikari.net/tre/>. I'm tempted to
implement one ourselves, as it's an intellectually stimulating
prospect, but that doesn't mean I won't listen to reason if TRE or
another option is far better. I don't know much about the internals of
TRE, but according to previous posts to this list, it utilizes three
engines: a slow one for handling backreferences (presumably similar to
Vim's current engine), a fast one for most cases (what we are looking
to implement), and one for their 'fuzzy matching' feature.

I have a couple questions to start things off. First: I couldn't see
much need for 'fuzzy matching' in Vim, but some of you are probably
much better acquainted with regexp use cases than I am.  Would this be
a useful feature to have available?  Second: We might have to do some
gymnastics to work with multibyte characters, as discussed here:
<http://tech.groups.yahoo.com/group/vimdev/message/46408>. I haven't
worked with multibyte characters before, so I'm not clear on the
subtleties.  Would this translation to wide characters before passing
to the engine cause much of a performance hit and/or be excessively
complicated to implement? On a side note, TRE's main page says it has
both wide character and multibyte character support. I couldn't find a
version history, so I'm not sure if this is a new feature that Nikolai
isn't aware of, or if we need something more.
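
To make the multibyte question a bit more concrete, here is a rough
sketch of what the translation step might involve, assuming the buffer
text is UTF-8. This is illustrative Python only, not Vim or TRE code,
and the function name decode_with_offsets is made up for the example.
It turns the bytes into a list of code points (the "wide" form) and
records each character's byte offset, since a match position reported
in characters would have to be mapped back to a byte position.

```python
def decode_with_offsets(data: bytes):
    """Decode UTF-8 bytes into code points, recording each one's byte offset.

    Hypothetical helper for illustration; assumes well-formed UTF-8 input.
    """
    codepoints = []
    offsets = []
    i = 0
    while i < len(data):
        lead = data[i]
        # The lead byte determines how many bytes the character occupies.
        if lead < 0x80:
            length = 1
        elif lead >> 5 == 0b110:
            length = 2
        elif lead >> 4 == 0b1110:
            length = 3
        elif lead >> 3 == 0b11110:
            length = 4
        else:
            raise ValueError(f"invalid UTF-8 lead byte at offset {i}")
        # Let the built-in decoder validate the continuation bytes.
        ch = data[i:i + length].decode("utf-8")
        codepoints.append(ord(ch))
        offsets.append(i)
        i += length
    return codepoints, offsets


# Example: a match at character index 2 maps back to byte offset offsets[2].
cps, offs = decode_with_offsets("aé€".encode("utf-8"))
```

In this sketch the conversion is one linear pass over the text plus
two extra lists, so my guess is that the cost would depend mostly on
how often a line has to be converted, not on the conversion itself.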

I'm interested to hear what you all have to say. We don't need to make
this decision until the middle of next week at the earliest, but I thought
I would get the discussion going now.

Ian
