True, but from reading the Regex module code, it looks like the call to 
:re.version is always made, regardless of which Elixir/Erlang version is 
in use? I'm quite confident that the Elixir and OTP versions match in all 
our environments, so Regex.run is not slow because it ends up in the code 
path that doesn't use the compiled regular expression. The overhead 
definitely came from the call to :re.version, at least for small regexes.
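
For anyone who wants to reproduce the comparison, here is a rough sketch of the kind of measurement involved. The module name SchemeBench, the pattern, and the sample IRI are made up for illustration and are not the actual RDF.IRI code; the absolute numbers will depend on the Elixir and OTP versions in use.

```elixir
defmodule SchemeBench do
  # Hypothetical pattern for the scheme part of an IRI, e.g. "http:".
  @source "^[a-z][a-z0-9+.-]*:"

  def source, do: @source

  # Extracts the scheme through Elixir's Regex module.
  def via_regex(regex, iri) do
    case Regex.run(regex, iri) do
      [scheme] -> scheme
      nil -> nil
    end
  end

  # Extracts the scheme by calling Erlang's :re.run/3 directly
  # with a pattern compiled by :re.compile/1.
  def via_re(mp, iri) do
    case :re.run(iri, mp, [{:capture, :first, :binary}]) do
      {:match, [scheme]} -> scheme
      :nomatch -> nil
    end
  end

  # Runs fun n times and returns the elapsed time in microseconds.
  def time(fun, n) do
    {micros, _} = :timer.tc(fn -> Enum.each(1..n, fn _ -> fun.() end) end)
    micros
  end
end

regex = Regex.compile!(SchemeBench.source())
{:ok, mp} = :re.compile(SchemeBench.source())
iri = "http://example.com/resource"
runs = 100_000

regex_time = SchemeBench.time(fn -> SchemeBench.via_regex(regex, iri) end, runs)
re_time = SchemeBench.time(fn -> SchemeBench.via_re(mp, iri) end, runs)

IO.puts("Regex.run: #{regex_time} microseconds")
IO.puts(":re.run:   #{re_time} microseconds")
```

This is only a wall-clock comparison with :timer.tc, not an fprof profile, so it will not show the individual calls made inside Regex.run.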

Anyway, thanks for fixing it. We'll try it out as soon as it's available 
in the next release.

On Thursday, March 14, 2024 at 4:37:43 PM UTC+1 José Valim wrote:

> I have pushed a fix to main. But also note we provide precompiled Elixir 
> versions per OTP version. Using a matching version will always give you the 
> best results and that's not only about regexes. :)
>
> On Thu, Mar 14, 2024 at 2:20 PM Jan Krüger <jan.k...@gmail.com> wrote:
>
>> I've recently had to work on a code base that parses largish RDF XML 
>> files. Part of the code base does relatively simple regular expression 
>> matches, but since the files are large, it makes quite a lot of Regex.run 
>> calls. While profiling, I noticed that there are callouts to 
>> :erlang.system_info, which fetches the PCRE version BEAM was compiled 
>> against.
>>
>> An example regular expression from the code base in question matches the 
>> scheme part of a URL. I've replaced Regex.run with Erlang's :re.run for 
>> testing purposes, and at least for this case, the performance gain is 
>> quite dramatic.
>>
>> Comparing fprof results:
>>
>> ```
>> RDF.IRI.scheme/1                                               1176473   30615.618    2354.355
>> ---
>> RDF.IRI.scheme/1                                               1176473    3531.955    2353.905
>> ```
>>
>> I found this thread in the Google group, which actually talks about the 
>> reasoning for fetching the version, and proposes an alternative.
>>
>>
>> https://groups.google.com/g/elixir-lang-core/c/CgFdxIONvGg/m/HN9ryeVXAwAJ?pli=1
>>
>> Especially
>>
>> ```
>> Taking a further look at the code, the issue with recompiling regexes on 
>> the fly is that it makes executing the regexes more expensive, as we need 
>> to compute the version on every execution. We could store the version in 
>> ETS but that would have performance issues. Storing in a persistent_term 
>> would be great, but at the moment we support Erlang/OTP 20+. Thoughts?
>> ```
>>
>> Since this has a fairly noticeable impact, at least in all the tests I've 
>> run, I wanted to start a discussion about whether this could be 
>> implemented/improved now.
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elixir-lang-core" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to elixir-lang-co...@googlegroups.com.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elixir-lang-core/44d498c7-82a4-46d2-89be-7919400e0297n%40googlegroups.com
>>  
>> <https://groups.google.com/d/msgid/elixir-lang-core/44d498c7-82a4-46d2-89be-7919400e0297n%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>>
>
