I've recently had to work on a code base that parses largish RDF/XML files.
Part of the code base does relatively simple regular expression matches,
but since the files are large, that adds up to quite a lot of Regex.run
calls. While profiling, I noticed that each call goes out to
:erlang.system_info, which fetches the PCRE version BEAM was compiled
against.

An example regular expression from the code base in question matches the
scheme part of a URL. I've replaced Regex.run with Erlang's :re.run for
testing purposes, and at least for this case the performance gain is
quite dramatic (roughly 8.7x less accumulated time in the fprof results
below).
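To make the swap concrete, here is a minimal sketch of the two variants side by side. The module name `SchemeDemo`, the function names, and the scheme pattern itself are assumptions for illustration only; the actual regex used by `RDF.IRI.scheme/1` is not shown in this post and may differ.

```elixir
defmodule SchemeDemo do
  # Hypothetical scheme pattern, for illustration only.
  @scheme_source "^([a-z][a-z0-9+.-]*):"

  # Compile once for use with Elixir's Regex module.
  def compile_regex, do: Regex.compile!(@scheme_source, "i")

  # Compile once for direct use with Erlang's :re module.
  def compile_re do
    {:ok, mp} = :re.compile(@scheme_source, [:caseless])
    mp
  end

  # Variant 1: goes through Elixir's Regex.run/3.
  def scheme_with_regex(iri, regex \\ compile_regex()) do
    case Regex.run(regex, iri, capture: :all_but_first) do
      [scheme] -> scheme
      nil -> nil
    end
  end

  # Variant 2: calls :re.run/3 directly with a precompiled pattern.
  def scheme_with_re(iri, mp \\ compile_re()) do
    case :re.run(iri, mp, [{:capture, [1], :binary}]) do
      {:match, [scheme]} -> scheme
      :nomatch -> nil
    end
  end
end
```

In a hot loop both variants would be called with a pattern compiled once up front, for example `mp = SchemeDemo.compile_re()` followed by `SchemeDemo.scheme_with_re("http://example.com/foo", mp)`, which returns `"http"`.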

Comparing fprof results (Regex.run first, then :re.run; the columns are call count, ACC and OWN time in ms):

```
RDF.IRI.scheme/1    1176473    30615.618    2354.355
---
RDF.IRI.scheme/1    1176473     3531.955    2353.905
```

I found this thread in the Google group, which actually talks about the
reasoning for fetching the version and proposes an alternative.

https://groups.google.com/g/elixir-lang-core/c/CgFdxIONvGg/m/HN9ryeVXAwAJ?pli=1

In particular, this part:

```
Taking a further look at the code, the issue with recompiling regexes on 
the fly is that it makes executing the regexes more expensive, as we need 
to compute the version on every execution. We could store the version in 
ETS but that would have performance issues. Storing in a persistent_term 
would be great, but at the moment we support Erlang/OTP 20+. Thoughts?
```
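The persistent_term idea from the quote could look roughly like the sketch below. This is not how Elixir's Regex module is implemented; the module name `PcreVersionCache` and its key are hypothetical, and the sketch assumes Erlang/OTP 21.3 or later, where `:persistent_term.get/2` is available.

```elixir
defmodule PcreVersionCache do
  # Hypothetical key; any term unique to this module would do.
  @key {__MODULE__, :regex_version}

  # Returns the regex engine version, computing it only on the first call.
  def version do
    case :persistent_term.get(@key, nil) do
      nil ->
        # Regex.version/0 is the public function that reports the
        # version of the underlying regex engine.
        version = Regex.version()
        :persistent_term.put(@key, version)
        version

      version ->
        version
    end
  end
end
```

Reads from persistent_term do not copy the term and take no locks, which is why it suits a value that is written once and read on every regex execution. Two processes racing on the first call would both store the same value, so the race is harmless here.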

Since this has a fairly noticeable impact, at least in all the tests I've
run, I wanted to start a discussion on whether this could be implemented or
improved now.
