On Mon, Oct 5, 2009 at 6:18 PM, Trent W. Buck <[email protected]> wrote:
> Petr Rockai <[email protected]> writes:
>
> > Jason Dagit <[email protected]> writes:
> >> In my travels profiling the performance of record I noticed that we
> >> do spend about 1/3 of the time just matching regular expressions on
> >> filenames.
> > Just one thing... do we match those on String or on ByteString?
>
> In principle, I would like to match Unicode codepoints, not bytes.
>

On OS X, man regex gives these two definitions:

     int
     regcomp(regex_t *restrict preg, const char *restrict pattern,
         int cflags);

     int
     regexec(const regex_t *restrict preg, const char *restrict string,
         size_t nmatch, regmatch_t pmatch[restrict], int eflags);

So, both regcomp and regexec take vectors of bytes.  If a wchar version
exists then I don't think the Haskell bindings are using it.  I think
that as long as you're lucky enough that the regex and the string are in
the same encoding, ByteString and String will be equivalent in their
matching ability here.  Unfortunately I don't think darcs makes any such
guarantees.

> In practice, I avoid non-ASCII and non-printable characters in file
> names, because there are so many such issues on Unix :-(
>
> > Because not using String would probably lead to another substantial
> > speedup on this. We may also want to switch to regex-dfa, since I
> > believe we only care whether we have a match and not much else.
>
> I see no downside there.
>

I was going to agree that we don't need the extra capabilities like
extracting matches and doing replaces, but it just occurred to me that
we could probably re-implement some things, like
decode_white/encode_white, using regexps and potentially get better
performance.  It's worth doing a performance test to see.  I'll try to
get some data on this.

> > But you are right that regex-pcre or pcre-light might be faster
> > (before deciding, it may make a lot of sense to benchmark both in
> > darcs, though).
>
> I have no problem switching from EREs to PCREs, but if we do so, please
> lets do it for all of Darcs at once!
>

Agreed.

> As well as benchmarking, someone will need to check that the default
> regexps that Darcs HAS shipped will have the same semantics after
> switching to PCRE.
>

If it comes to this, do you think you would know how to determine this?
I'd have to do a bit of research to figure it out myself.

Jason
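
P.S. A minimal sketch of the byte-level matching described above, written
directly against the POSIX C API and not against the Haskell bindings
darcs uses. The helper name matches_bytes is hypothetical, and the
sketch assumes the program runs in the default "C" locale (no setlocale
call), so the pattern and the string are compared as plain bytes. It
also passes REG_NOSUB, which asks only whether there is a match.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper, for illustration only.
 * Returns 1 if `text` matches the POSIX extended regex `pattern`,
 * 0 if it does not, and -1 if the pattern fails to compile.
 * REG_NOSUB requests a yes/no answer without submatch positions. */
int matches_bytes(const char *pattern, const char *text)
{
    regex_t re;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    /* nmatch = 0 and pmatch = NULL: we only care whether it matched. */
    int rc = regexec(&re, text, 0, NULL, 0);

    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

With this helper, a UTF-8 pattern "caf\xc3\xa9" should match the UTF-8
file name "caf\xc3\xa9.txt", while the Latin-1 spelling "caf\xe9" of the
same pattern should not, which is the encoding mismatch mentioned above.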
