I've recently had to work on a code base that parses largish RDF/XML files. Part of the code base does relatively simple regular expression matches, but since the files are large, that adds up to quite a lot of Regex.run calls. While profiling, I noticed that there are calls out to :erlang.system_info, which fetches the PCRE version the BEAM was compiled against.
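For anyone who wants to look at that lookup in isolation, here is a minimal sketch. It assumes the `:re_version` key of `:erlang.system_info/1` is the call showing up in the profile; the iteration count is arbitrary, and the timing only gives a rough feel for the per-call cost on one machine.

```elixir
# The PCRE version string the running VM was built against.
re_version = :erlang.system_info(:re_version)
IO.inspect(re_version, label: "PCRE version")

# Arbitrary iteration count, chosen only to make the cost measurable.
iterations = 1_000_000

# :timer.tc/1 returns {elapsed_microseconds, result_of_fun}.
{micros, :ok} =
  :timer.tc(fn ->
    Enum.each(1..iterations, fn _ -> :erlang.system_info(:re_version) end)
  end)

IO.puts("#{iterations} version lookups took #{micros} microseconds")
```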
An example regular expression from the code base in question matches the scheme part of a URL. I replaced Regex.run with Erlang's :re.run for testing purposes, and at least for this case, the performance gain is quite dramatic. Comparing fprof results (Regex.run first, then :re.run):

```
RDF.IRI.scheme/1 1176473 30615.618 2354.355
---
RDF.IRI.scheme/1 1176473 3531.955 2353.905
```

I found this thread in the Google group, which talks about the reasoning for fetching the version and proposes an alternative:

https://groups.google.com/g/elixir-lang-core/c/CgFdxIONvGg/m/HN9ryeVXAwAJ?pli=1

Especially:

```
Taking a further look at the code, the issue with recompiling regexes on the fly is that it makes executing the regexes more expensive, as we need to compute the version on every execution. We could store the version in ETS but that would have performance issues. Storing in a persistent_term would be great, but at the moment we support Erlang/OTP 20+. Thoughts?
```

Since this has a fairly noticeable impact, at least in all the tests I've run, I wanted to start a discussion on whether this could be implemented or improved now.
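To make the comparison above easier to reproduce, here is a minimal sketch of the two call paths. Everything in it is an assumption for illustration: `SchemeSketch` is a hypothetical module, and the pattern is a plausible scheme matcher rather than the one RDF.ex actually uses. The `:re` variant compiles the pattern once and keeps it in `:persistent_term` (available with `get/2` from OTP 21.3). Note that this caches the compiled pattern itself, which is a different thing from caching the PCRE version as the quoted message suggests.

```elixir
defmodule SchemeSketch do
  # Hypothetical module; the pattern below is an assumed scheme matcher.
  @source "^([a-zA-Z][a-zA-Z0-9+.-]*):"
  @key {__MODULE__, :scheme_pattern}

  # Path 1: Elixir's Regex wrapper around :re.
  def scheme_regex(iri) do
    case Regex.run(~r/^([a-zA-Z][a-zA-Z0-9+.-]*):/, iri) do
      [_whole, scheme] -> scheme
      nil -> nil
    end
  end

  # Path 2: calling :re.run directly with a pattern compiled once.
  def scheme_re(iri) do
    case :re.run(iri, compiled(), [{:capture, :all_but_first, :binary}]) do
      {:match, [scheme]} -> scheme
      :nomatch -> nil
    end
  end

  # Compile on first use and keep the result in :persistent_term.
  defp compiled do
    case :persistent_term.get(@key, nil) do
      nil ->
        {:ok, pattern} = :re.compile(@source)
        :persistent_term.put(@key, pattern)
        pattern

      pattern ->
        pattern
    end
  end
end

IO.inspect(SchemeSketch.scheme_regex("http://example.com/a"))
IO.inspect(SchemeSketch.scheme_re("http://example.com/a"))
```

Both functions should return the same scheme for the same input, so either can be dropped into an fprof run over the same data to compare accumulated time.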