[elixir-core:11691] Performance of regular expression matches

Jan Krüger Thu, 14 Mar 2024 06:20:33 -0700

I've recently had to work on a code base that parses largish RDF XML files.
Part of the code base performs relatively simple regular expression
matches, but since the files are large, that adds up to quite a lot of
Regex.run calls. While profiling I noticed that there are callouts to
:erlang.system_info, which fetches the PCRE version the BEAM was compiled
against.

An example regular expression from the code base in question matches the
scheme part of a URL. I've replaced Regex.run with Erlang's :re.run for
testing purposes, and at least for this case, the performance gain is
quite dramatic.
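
As a rough illustration of the swap described above, here is a minimal
sketch of the two variants. The module name, function names and the
pattern are hypothetical and are not taken from the code base in question;
the pattern is only a plausible scheme matcher.

```elixir
defmodule SchemeSketch do
  # Hypothetical pattern: a scheme is a letter followed by letters,
  # digits, "+", "." or "-", terminated by a colon.
  @pattern "^([a-zA-Z][a-zA-Z0-9+.-]*):"

  # Variant 1: Elixir's Regex API.
  def scheme_regex(iri) when is_binary(iri) do
    case Regex.run(~r/^([a-zA-Z][a-zA-Z0-9+.-]*):/, iri) do
      [_full, scheme] -> scheme
      nil -> nil
    end
  end

  # Compile once with Erlang's :re module and reuse the result.
  def compile do
    {:ok, compiled} = :re.compile(@pattern)
    compiled
  end

  # Variant 2: call :re.run directly with the precompiled pattern.
  def scheme_re(compiled, iri) when is_binary(iri) do
    case :re.run(iri, compiled, [{:capture, :all_but_first, :binary}]) do
      {:match, [scheme]} -> scheme
      :nomatch -> nil
    end
  end
end
```

Both variants should return the same result for the same input, so they
can be swapped in a profiling run without touching the callers.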

Comparing fprof results (Regex.run first, then :re.run):

```
                  CNT      ACC        OWN
RDF.IRI.scheme/1  1176473  30615.618  2354.355
---
RDF.IRI.scheme/1  1176473  3531.955   2353.905
```

I found this thread in the Google group, which actually talks about the
reasoning for fetching the version and proposes an alternative.

https://groups.google.com/g/elixir-lang-core/c/CgFdxIONvGg/m/HN9ryeVXAwAJ?pli=1

Especially this part:

```
Taking a further look at the code, the issue with recompiling regexes on
the fly is that it makes executing the regexes more expensive, as we need
to compute the version on every execution. We could store the version in
ETS but that would have performance issues. Storing in a persistent_term
would be great, but at the moment we support Erlang/OTP 20+. Thoughts?
```
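
The persistent_term idea from the quote could look roughly like the sketch
below. It is only an illustration under assumptions: the module name and
key are hypothetical, and the cached value (PCRE version plus endianness)
is my understanding of what Regex.version/0 computes today. Note that
:persistent_term was added in Erlang/OTP 21.2, which is why it was not an
option while OTP 20 was still supported.

```elixir
defmodule RegexVersionCache do
  # Hypothetical key; any term unique to this module would do.
  @key {__MODULE__, :re_version}

  # Returns the cached version, computing and storing it on first use.
  def version do
    case :persistent_term.get(@key, nil) do
      nil ->
        # Assumption: this mirrors what Regex.version/0 returns.
        version =
          {:erlang.system_info(:re_version), :erlang.system_info(:endian)}

        # The value never changes at runtime, so the cost of updating
        # a persistent term is paid only once.
        :persistent_term.put(@key, version)
        version

      version ->
        version
    end
  end
end
```

After the first call, every lookup is a :persistent_term.get/2 instead of
two :erlang.system_info/1 calls.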

Since this has a fairly noticeable impact, at least in all the tests I've
run, I wanted to start a discussion on whether this could be implemented or
improved now.
