I quickly checked how an implementation that caches the version in a 
persistent term would compare, and it turned out to perform almost 
identically. It seems the :re.version and :erlang.system_info(:endian) 
values are already cached.

```elixir
defmodule RegexPersistent do
  def version do
    case :persistent_term.get(__MODULE__, nil) do
      nil ->
        version = {:re.version(), :erlang.system_info(:endian)}
        :persistent_term.put(__MODULE__, version)
        version

      version ->
        version
    end
  end

  defp safe_run(
         %Regex{re_pattern: compiled, source: source, re_version: version, opts: compile_opts},
         string,
         options
       ) do
    case version() do
      ^version -> :re.run(string, compiled, options)
      _ -> :re.run(string, source, translate_options(compile_opts, options))
    end
  end

  def run(%Regex{} = regex, string, options \\ []) when is_binary(string) do
    return = Keyword.get(options, :return, :binary)
    captures = Keyword.get(options, :capture, :all)
    offset = Keyword.get(options, :offset, 0)

    case safe_run(regex, string, [{:capture, captures, return}, {:offset, offset}]) do
      :nomatch -> nil
      :match -> []
      {:match, results} -> results
    end
  end

  defp translate_options(<<?u, t::binary>>, acc), do: translate_options(t, [:unicode, :ucp | acc])
  defp translate_options(<<?i, t::binary>>, acc), do: translate_options(t, [:caseless | acc])
  defp translate_options(<<?x, t::binary>>, acc), do: translate_options(t, [:extended | acc])
  defp translate_options(<<?f, t::binary>>, acc), do: translate_options(t, [:firstline | acc])
  defp translate_options(<<?U, t::binary>>, acc), do: translate_options(t, [:ungreedy | acc])

  defp translate_options(<<?s, t::binary>>, acc),
    do: translate_options(t, [:dotall, {:newline, :anycrlf} | acc])

  defp translate_options(<<?m, t::binary>>, acc), do: translate_options(t, [:multiline | acc])

  defp translate_options(<<?r, t::binary>>, acc) do
    IO.warn("the /r modifier in regular expressions is deprecated, please use /U instead")
    translate_options(t, [:ungreedy | acc])
  end

  defp translate_options(<<>>, acc), do: acc
  defp translate_options(rest, _acc), do: {:error, rest}
end

regex = ~r/^([a-z][a-z0-9\+\-\.]*):/i
re_regex = regex.re_pattern

Benchee.run(%{
  "Regex.run/2" => fn -> Regex.run(regex, "foo") end,
  "RegexPersistent.run/2" => fn -> RegexPersistent.run(regex, "foo") end,
  ":re.run/3" => fn -> :re.run("foo", re_regex, [{:capture, :all, :binary}]) end
})
```

Results:
```
Name                            ips        average  deviation         median         99th %
:re.run/3                    2.72 M      367.06 ns  ±3579.21%         333 ns         458 ns
Regex.run/2                  1.79 M      557.84 ns  ±5817.83%         417 ns         542 ns
RegexPersistent.run/2        1.79 M      558.06 ns  ±7018.25%         375 ns         541 ns

Comparison:
:re.run/3                    2.72 M
Regex.run/2                  1.79 M - 1.52x slower +190.77 ns
RegexPersistent.run/2        1.79 M - 1.52x slower +190.99 ns
```
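
To look at the cost of the version lookup on its own, apart from the regex 
match, the two lookups can also be timed directly. This is only a minimal 
sketch using :timer.tc from the standard library. The module 
VersionLookupTiming and its function names are made up for illustration and 
are not part of Elixir's Regex module. It builds the same tuple as 
RegexPersistent.version/0 above.

```elixir
defmodule VersionLookupTiming do
  # Hypothetical helper module, for illustration only.
  @key {__MODULE__, :version}

  # Builds the version tuple directly on every call.
  def direct, do: {:re.version(), :erlang.system_info(:endian)}

  # Reads the tuple from :persistent_term, storing it on first use.
  def cached do
    case :persistent_term.get(@key, nil) do
      nil ->
        version = direct()
        :persistent_term.put(@key, version)
        version

      version ->
        version
    end
  end

  # Average time per call in nanoseconds over `n` calls of `fun`.
  def avg_ns(fun, n \\ 1_000_000) do
    {micros, :ok} = :timer.tc(fn -> loop(fun, n) end)
    micros * 1000 / n
  end

  defp loop(_fun, 0), do: :ok

  defp loop(fun, n) do
    fun.()
    loop(fun, n - 1)
  end
end

IO.puts("direct: #{VersionLookupTiming.avg_ns(&VersionLookupTiming.direct/0)} ns/call")
IO.puts("cached: #{VersionLookupTiming.avg_ns(&VersionLookupTiming.cached/0)} ns/call")
```

A plain loop like this includes the overhead of the anonymous function call 
and is far less careful than Benchee, so the numbers are only a rough 
indication of whether the two lookups differ at all.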

> On Fri, Mar 15, 2024 at 12:23 PM 'marcel...@googlemail.com' via 
> elixir-lang-core <elixir-l...@googlegroups.com> wrote:
>
>> The benchmark results I'm getting are indeed not as dramatic as the fprof 
>> results, but on the other hand also more than the 5% mentioned in the PR 
>> which introduced the check: 
>> https://github.com/elixir-lang/elixir/pull/9040
>>
>> ```elixir
>> regex = ~r/^([a-z][a-z0-9\+\-\.]*):/i
>> re_pattern = regex.re_pattern
>>
>> Benchee.run(%{
>>   "Regex.run/2" => fn -> Regex.run(regex, "foo") end,
>>   ":re.run/3" => fn -> :re.run("foo", re_pattern, [{:capture, :all, :binary}]) end
>> })
>> ```
>>
>> ```
>> Name                  ips        average  deviation         median         99th %
>> :re.run/3          2.88 M      346.90 ns  ±3623.51%         333 ns         417 ns
>> Regex.run/2        1.98 M      504.74 ns  ±5851.21%         416 ns         542 ns
>>
>> Comparison:
>> :re.run/3          2.88 M
>> Regex.run/2        1.98 M - 1.46x slower +157.84 ns
>> ```
>> On Friday 15 March 2024 at 07:20:11 UTC+1 jan.k...@gmail.com wrote:
>>
>>> The difference was definitely measurable just in pure running time of 
>>> the code, setting aside fprof. I'll post what I have after work today.
>>>
>>> On Thursday, March 14, 2024 at 10:21:25 PM UTC+1 José Valim wrote:
>>>
>>>> Do you have benchmarks or only the fprof results? fprof is not a 
>>>> benchmarking tool: comparing fprof results from different code may be 
>>>> misleading. Proper benchmarking is preferable. I am benchmarking locally 
>>>> and I cannot measure any relevant difference even with the whole version 
>>>> checking removed.
>>>>
>>>> On Thu, Mar 14, 2024 at 6:01 PM Jan Krüger <jan.k...@gmail.com> wrote:
>>>>
>>>>> Thanks a lot. I'm also happy to share our case and my fprof results, 
>>>>> if that helps. I am very sure that my Erlang and Elixir versions match 
>>>>> on the machine where I've tested this. Replacing Regex.run with an 
>>>>> identical call to :re.run should show the performance improvement I've 
>>>>> mentioned. The regex we've tested this on is: 
>>>>>
>>>>> ~r/^([a-z][a-z0-9\+\-\.]*):/i
>>>>>
>>>>> On Thursday, March 14, 2024 at 5:55:47 PM UTC+1 
>>>>> marcel...@googlemail.com wrote:
>>>>>
>>>>>> I'm the maintainer of the RDF.ex library with the RDF.IRI module 
>>>>>> mentioned in the OP. I can confirm that this fix doesn't affect the 
>>>>>> problem, since we're actually not using `URI.parse/1` most of the time 
>>>>>> (we use it only when dealing with relative URIs). Even in this case the 
>>>>>> `Regex.version/0` call in `Regex.safe_run/3` (
>>>>>> https://github.com/elixir-lang/elixir/blob/b8fca42e58850b56f65d0fb8a2086f2636141f61/lib/elixir/lib/regex.ex#L533)
>>>>>> still performs the `:erlang.system_info/1` call. 
>>>>>>
>>>>>> On Thursday 14 March 2024 at 17:15:40 UTC+1 jan.k...@gmail.com wrote:
>>>>>>
>>>>>>> I read the commit, and I don't think it fixes what our actual problem 
>>>>>>> was. See my comment above. The problem is the actual call to 
>>>>>>> :re.version, not the recompilation of the regex.
>>>>>>>
>>>>>>> On Thursday, March 14, 2024 at 4:37:43 PM UTC+1 José Valim wrote:
>>>>>>>
>>>>>>>> I have pushed a fix to main. But also note we provide precompiled 
>>>>>>>> Elixir versions per OTP version. Using a matching version will always 
>>>>>>>> give you the best results and that's not only about regexes. :)
>>>>>>>>
>>>>>>>> On Thu, Mar 14, 2024 at 2:20 PM Jan Krüger <jan.k...@gmail.com> 
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I've recently had to work on a code base that parses largish RDF 
>>>>>>>>> XML files. Part of the code base does relatively simple regular 
>>>>>>>>> expression matches, but since the files are large, it makes quite a 
>>>>>>>>> lot of Regex.run calls. While profiling I've noticed that there are 
>>>>>>>>> callouts to :erlang.system_info, which fetches the PCRE version BEAM 
>>>>>>>>> was compiled against.
>>>>>>>>>
>>>>>>>>> An example regular expression from the code base in question 
>>>>>>>>> matches the scheme part of a URL. I've replaced Regex.run with 
>>>>>>>>> Erlang's :re.run for testing purposes, and at least for this case, 
>>>>>>>>> the performance gain is quite dramatic.
>>>>>>>>>
>>>>>>>>> Comparing fprof results:
>>>>>>>>>
>>>>>>>>> ```
>>>>>>>>> RDF.IRI.scheme/1                                               1176473   30615.618    2354.355
>>>>>>>>> ---
>>>>>>>>> RDF.IRI.scheme/1                                               1176473    3531.955    2353.905
>>>>>>>>> ```
>>>>>>>>>
>>>>>>>>> I found this thread in the Google group, which actually talks about 
>>>>>>>>> the reasoning for fetching the version, and proposes an alternative.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> https://groups.google.com/g/elixir-lang-core/c/CgFdxIONvGg/m/HN9ryeVXAwAJ?pli=1
>>>>>>>>>
>>>>>>>>> Especially
>>>>>>>>>
>>>>>>>>> ```
>>>>>>>>> Taking a further look at the code, the issue with recompiling 
>>>>>>>>> regexes on the fly is that it makes executing the regexes more 
>>>>>>>>> expensive, as we need to compute the version on every execution. We 
>>>>>>>>> could store the version in ETS but that would have performance 
>>>>>>>>> issues. Storing in a persistent_term would be great, but at the 
>>>>>>>>> moment we support Erlang/OTP 20+. Thoughts?
>>>>>>>>> ```
>>>>>>>>>
>>>>>>>>> Since this has a fairly noticeable impact, at least on all tests 
>>>>>>>>> I've run, I wanted to start a discussion on whether this could be 
>>>>>>>>> implemented/improved now.
>>>>>>>>>
>>>>>>>>> -- 
>>>>>>>>> You received this message because you are subscribed to the Google 
>>>>>>>>> Groups "elixir-lang-core" group.
>>>>>>>>> To unsubscribe from this group and stop receiving emails from it, 
>>>>>>>>> send an email to elixir-lang-co...@googlegroups.com.
>>>>>>>>> To view this discussion on the web visit 
>>>>>>>>> https://groups.google.com/d/msgid/elixir-lang-core/44d498c7-82a4-46d2-89be-7919400e0297n%40googlegroups.com
>>>>>>>>>  
>>>>>>>>> <https://groups.google.com/d/msgid/elixir-lang-core/44d498c7-82a4-46d2-89be-7919400e0297n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>>>> .
>>>>>>>>>
