I quickly checked how an implementation that caches the version in a persistent term would compare, and it turned out to perform almost identically. It seems the `:re.version/0` and `:erlang.system_info(:endian)` values are already cached.
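A rough way to sanity-check that observation without Benchee is to time both lookups in a loop with `:timer.tc/1`. This is a hypothetical sketch: the module `VersionLookupTiming`, its functions and the default iteration count are made up for illustration. The loop also measures call overhead, so the numbers are only indicative.

```elixir
defmodule VersionLookupTiming do
  # Hypothetical helper: runs `fun` `n` times under :timer.tc/1 and returns
  # the average cost per call in nanoseconds (loop overhead included).
  def avg_ns(fun, n \\ 1_000_000) do
    {micros, :ok} = :timer.tc(fn -> loop(fun, n) end)
    micros * 1000 / n
  end

  defp loop(_fun, 0), do: :ok

  defp loop(fun, n) do
    fun.()
    loop(fun, n - 1)
  end

  # Builds the same {PCRE version, endianness} tuple on every call.
  def direct, do: {:re.version(), :erlang.system_info(:endian)}

  # Builds the tuple once and caches it in a persistent term.
  def cached do
    case :persistent_term.get(__MODULE__, nil) do
      nil ->
        version = direct()
        :persistent_term.put(__MODULE__, version)
        version

      version ->
        version
    end
  end
end

# Both variants must agree before their timings are worth comparing.
true = VersionLookupTiming.direct() == VersionLookupTiming.cached()

IO.puts("direct: #{VersionLookupTiming.avg_ns(&VersionLookupTiming.direct/0)} ns/call")
IO.puts("cached: #{VersionLookupTiming.avg_ns(&VersionLookupTiming.cached/0)} ns/call")
```

If both averages land in the same range, that is consistent with the direct lookups already being cheap; Benchee is still the better tool for figures worth publishing.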
```elixir
defmodule RegexPersistent do
  # Cache the {PCRE version, endianness} tuple in a persistent term on first use.
  def version do
    case :persistent_term.get(__MODULE__, nil) do
      nil ->
        version = {:re.version(), :erlang.system_info(:endian)}
        :persistent_term.put(__MODULE__, version)
        version

      version ->
        version
    end
  end

  # Use the precompiled pattern if the versions match, otherwise recompile from source.
  defp safe_run(
         %Regex{re_pattern: compiled, source: source, re_version: version, opts: compile_opts},
         string,
         options
       ) do
    case version() do
      ^version -> :re.run(string, compiled, options)
      _ -> :re.run(string, source, translate_options(compile_opts, options))
    end
  end

  def run(%Regex{} = regex, string, options \\ []) when is_binary(string) do
    return = Keyword.get(options, :return, :binary)
    captures = Keyword.get(options, :capture, :all)
    offset = Keyword.get(options, :offset, 0)

    case safe_run(regex, string, [{:capture, captures, return}, {:offset, offset}]) do
      :nomatch -> nil
      :match -> []
      {:match, results} -> results
    end
  end

  # Option translation mirroring the private helper in Elixir's Regex module.
  defp translate_options(<<?u, t::binary>>, acc), do: translate_options(t, [:unicode, :ucp | acc])
  defp translate_options(<<?i, t::binary>>, acc), do: translate_options(t, [:caseless | acc])
  defp translate_options(<<?x, t::binary>>, acc), do: translate_options(t, [:extended | acc])
  defp translate_options(<<?f, t::binary>>, acc), do: translate_options(t, [:firstline | acc])
  defp translate_options(<<?U, t::binary>>, acc), do: translate_options(t, [:ungreedy | acc])

  defp translate_options(<<?s, t::binary>>, acc),
    do: translate_options(t, [:dotall, {:newline, :anycrlf} | acc])

  defp translate_options(<<?m, t::binary>>, acc), do: translate_options(t, [:multiline | acc])

  defp translate_options(<<?r, t::binary>>, acc) do
    IO.warn("the /r modifier in regular expressions is deprecated, please use /U instead")
    translate_options(t, [:ungreedy | acc])
  end

  defp translate_options(<<>>, acc), do: acc
  defp translate_options(rest, _acc), do: {:error, rest}
end

regex = ~r/^([a-z][a-z0-9\+\-\.]*):/i
re_regex = regex.re_pattern

Benchee.run(%{
  "Regex.run/2" => fn -> Regex.run(regex, "foo") end,
  "RegexPersistent.run/2" => fn -> RegexPersistent.run(regex, "foo") end,
  ":re.run/3" => fn -> :re.run("foo", re_regex, [{:capture, :all, :binary}]) end
})
```

Results:

```
Name                            ips        average  deviation         median         99th %
:re.run/3                    2.72 M      367.06 ns  ±3579.21%         333 ns         458 ns
Regex.run/2                  1.79 M      557.84 ns  ±5817.83%         417 ns         542 ns
RegexPersistent.run/2        1.79 M      558.06 ns  ±7018.25%         375 ns         541 ns

Comparison:
:re.run/3                    2.72 M
Regex.run/2                  1.79 M - 1.52x slower +190.77 ns
RegexPersistent.run/2        1.79 M - 1.52x slower +190.99 ns
```

On Friday 15 March 2024 at 07:58:12 UTC+1 manish...@brsoftech.org wrote:
> On Fri, Mar 15, 2024 at 12:23 PM 'marcel...@googlemail.com' via elixir-lang-core <elixir-l...@googlegroups.com> wrote:
>
>> The benchmark results I'm getting are indeed not as dramatic as the fprof results, but on the other hand they also show more than the 5% mentioned in the PR which introduced the check: https://github.com/elixir-lang/elixir/pull/9040
>>
>> ```elixir
>> regex = ~r/^([a-z][a-z0-9\+\-\.]*):/i
>> re_pattern = regex.re_pattern
>>
>> Benchee.run(%{
>>   "Regex.run/2" => fn -> Regex.run(regex, "foo") end,
>>   ":re.run/3" => fn -> :re.run("foo", re_pattern, [{:capture, :all, :binary}]) end
>> })
>> ```
>>
>> ```
>> Name                  ips        average  deviation         median         99th %
>> :re.run/3          2.88 M      346.90 ns  ±3623.51%         333 ns         417 ns
>> Regex.run/2        1.98 M      504.74 ns  ±5851.21%         416 ns         542 ns
>>
>> Comparison:
>> :re.run/3          2.88 M
>> Regex.run/2        1.98 M - 1.46x slower +157.84 ns
>> ```
>>
>> On Friday 15 March 2024 at 07:20:11 UTC+1 jan.k...@gmail.com wrote:
>>
>>> The difference was definitely measurable in the pure running time of the code, setting fprof aside. I'll post what I have after work today.
>>>
>>> On Thursday, March 14, 2024 at 10:21:25 PM UTC+1 José Valim wrote:
>>>
>>>> Do you have benchmarks or only the fprof results? fprof is not a benchmarking tool: comparing fprof results from different code may be misleading. Proper benchmarking is preferable. I am benchmarking locally and I cannot measure any relevant difference even with the whole version checking removed.
>>>>
>>>> On Thu, Mar 14, 2024 at 6:01 PM Jan Krüger <jan.k...@gmail.com> wrote:
>>>>
>>>>> Thanks a lot. I'm also happy to share our case, and my fprof results, if that helps. I am very sure that my Erlang and Elixir versions match on the machine where I've tested this. Replacing Regex.run with an identical call to :re.run should show the performance improvement I've mentioned.
>>>>> The regex we've tested this on is:
>>>>>
>>>>> ~r/^([a-z][a-z0-9\+\-\.]*):/i
>>>>>
>>>>> On Thursday, March 14, 2024 at 5:55:47 PM UTC+1 marcel...@googlemail.com wrote:
>>>>>
>>>>>> I'm the maintainer of the RDF.ex library with the RDF.IRI module mentioned in the OP. I can confirm that this fix doesn't affect the problem, since we're actually not using `URI.parse/1` most of the time (we use it only when dealing with relative URIs). Even in that case, the `Regex.version/0` call in `Regex.safe_run/3` (https://github.com/elixir-lang/elixir/blob/b8fca42e58850b56f65d0fb8a2086f2636141f61/lib/elixir/lib/regex.ex#L533) still performs the `:erlang.system_info/1` call.
>>>>>>
>>>>>> On Thursday 14 March 2024 at 17:15:40 UTC+1 jan.k...@gmail.com wrote:
>>>>>>
>>>>>>> I read the commit, and I don't think it fixes our actual problem. See my comment above. The problem is the actual call to :re.version, not the recompilation of the regex.
>>>>>>>
>>>>>>> On Thursday, March 14, 2024 at 4:37:43 PM UTC+1 José Valim wrote:
>>>>>>>
>>>>>>>> I have pushed a fix to main. But also note that we provide precompiled Elixir versions per OTP version. Using a matching version will always give you the best results, and that's not only about regexes. :)
>>>>>>>>
>>>>>>>> On Thu, Mar 14, 2024 at 2:20 PM Jan Krüger <jan.k...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> I've recently had to work on a code base that parses largish RDF XML files. Part of the code base does relatively simple regular expression matching, but since the files are large, that adds up to quite a lot of Regex.run calls. While profiling, I noticed that there are calls to :erlang.system_info, which fetches the PCRE version BEAM was compiled against.
>>>>>>>>>
>>>>>>>>> An example regular expression from the code base in question matches the scheme part of a URL. I replaced Regex.run with Erlang's :re.run for testing purposes, and at least for this case, the performance gain is quite dramatic.
>>>>>>>>>
>>>>>>>>> Comparing fprof results:
>>>>>>>>>
>>>>>>>>> ```
>>>>>>>>> RDF.IRI.scheme/1    1176473    30615.618    2354.355
>>>>>>>>> ---
>>>>>>>>> RDF.IRI.scheme/1    1176473     3531.955    2353.905
>>>>>>>>> ```
>>>>>>>>>
>>>>>>>>> I found this thread in the Google group, which actually talks about the reasoning for fetching the version and proposes an alternative:
>>>>>>>>>
>>>>>>>>> https://groups.google.com/g/elixir-lang-core/c/CgFdxIONvGg/m/HN9ryeVXAwAJ?pli=1
>>>>>>>>>
>>>>>>>>> Especially:
>>>>>>>>>
>>>>>>>>> ```
>>>>>>>>> Taking a further look at the code, the issue with recompiling regexes on the fly is that it makes executing the regexes more expensive, as we need to compute the version on every execution. We could store the version in ETS but that would have performance issues. Storing in a persistent_term would be great, but at the moment we support Erlang/OTP 20+. Thoughts?
>>>>>>>>> ```
>>>>>>>>>
>>>>>>>>> Since this has a fairly noticeable impact, at least in all the tests I've run, I wanted to start a discussion on whether this could be implemented/improved now.
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> You received this message because you are subscribed to the Google Groups "elixir-lang-core" group.
>>>>>>>>> To unsubscribe from this group and stop receiving emails from it, send an email to elixir-lang-co...@googlegroups.com.
>>>>>>>>> To view this discussion on the web visit https://groups.google.com/d/msgid/elixir-lang-core/44d498c7-82a4-46d2-89be-7919400e0297n%40googlegroups.com
>>>>>>>>> <https://groups.google.com/d/msgid/elixir-lang-core/44d498c7-82a4-46d2-89be-7919400e0297n%40googlegroups.com?utm_medium=email&utm_source=footer>.