The NIFs might explain why this shows up as a larger part of the execution 
time than it actually is. I hadn't considered that.

It probably makes sense for us to keep the :re.run in any event. I think 
the motivation for the thread was also just to give a heads up that there 
might be more of a performance issue here than you guys assumed when 
introducing this version check. If it turns out to be a mirage, then I 
guess that's just as well :)

On Friday, March 15, 2024 at 8:39:38 AM UTC+1 José Valim wrote:

fprof is great at telling you what in a given workflow is taking time, but 
comparing fprof results won't tell you by how much it got faster. For that 
you will have to benchmark it again. For tight loops, though, I can see how 
removing the version check, option handling and everything else improves 
performance. I think it is fine to go that route if you need to.

I am also not sure if fprof will consider the time spent on NIFs. I assume 
most time is spent on the regex engine but if that is not fully considered 
in fprof, that could affect measurements. But I am speculating here, I 
truly don't know. :)

On Fri, Mar 15, 2024 at 8:31 AM Jan Krüger <jan.k...@gmail.com> wrote:

Alright. If you can't see it, then it must have been something in my 
environment. What I did when working on this was run fprof to identify 
potential performance problems, and the version check showed up as a 
substantial part of the time spent in the regex code. Is that a valid use 
of fprof in your opinion? Since we're running this in a very tight loop, I 
also wanted to get rid of the Keyword.get calls when running regexes, so I 
swapped out Regex.run for :re.run, and that substantially improved the 
performance overall.
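
For readers who want to reproduce this kind of profile, a minimal sketch of 
an fprof run over a tight loop of Regex.run calls could look like the 
following. The module name `ProfileSketch`, the input string, the iteration 
count and the output file name are placeholders for illustration, not what 
was actually run here; only the regex is taken from this thread.

```elixir
defmodule ProfileSketch do
  # Hypothetical module: the loop stands in for the real RDF parsing code.
  def run(iterations \\ 1_000) do
    regex = ~r/^([a-z][a-z0-9\+\-\.]*):/i

    # Trace the loop; by default fprof writes the trace to "fprof.trace".
    :fprof.apply(
      fn -> Enum.each(1..iterations, fn _ -> Regex.run(regex, "http://example.com") end) end,
      []
    )

    # Turn the trace into profile data, then write a readable report.
    :ok = :fprof.profile()
    :ok = :fprof.analyse(dest: ~c"fprof.analysis")
  end
end
```

The report ends up in `fprof.analysis`. As discussed elsewhere in this 
thread, it shows where time goes within one run, not by how much a change 
speeds things up.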

I don't think I then went on to measure specifically whether removing the 
version check alone improves the performance. So all I have to back up the 
claim that the version check is the root cause is fprof.
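
One rough way to look at the version lookup in isolation, sketched here 
under assumptions, is to time it in a loop next to a full Regex.run call. 
The module `VersionCheckTiming` and its iteration count are made up for 
illustration, and the tuple of `:re.version/0` and 
`:erlang.system_info(:endian)` is an assumption about what the version check 
computes. This is a crude `:timer.tc/1` loop, not a proper benchmark.

```elixir
defmodule VersionCheckTiming do
  # Hypothetical helper for a rough, unscientific comparison.
  @iterations 100_000

  # Returns the average nanoseconds per call for each measured function.
  def run do
    regex = ~r/^([a-z][a-z0-9\+\-\.]*):/i

    %{
      # Assumed to approximate the per-call version lookup.
      version_lookup: time(fn -> {:re.version(), :erlang.system_info(:endian)} end),
      # Full Regex.run call, including option handling and result handling.
      regex_run: time(fn -> Regex.run(regex, "foo") end)
    }
  end

  defp time(fun) do
    {micros, :ok} =
      :timer.tc(fn -> Enum.each(1..@iterations, fn _ -> fun.() end) end)

    micros * 1000 / @iterations
  end
end
```

Calling `VersionCheckTiming.run()` returns a map with both averages. The 
numbers include loop overhead and will vary between machines and OTP 
versions, so they are only a first indication.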

On Friday, March 15, 2024 at 8:22:29 AM UTC+1 José Valim wrote:

The 5% also takes into account the option processing and result handling. 
The version check itself is a subset of that. I was not able to measure 
sensible gains after removing it.

On Fri, Mar 15, 2024 at 12:23 PM 'marcel...@googlemail.com' via 
elixir-lang-core <elixir-l...@googlegroups.com> wrote:

The benchmark results I'm getting are indeed not as dramatic as the fprof 
results, but on the other hand also more than the 5% mentioned in the PR 
which introduced the check: https://github.com/elixir-lang/elixir/pull/9040

```elixir
regex = ~r/^([a-z][a-z0-9\+\-\.]*):/i
re_pattern = regex.re_pattern

Benchee.run(%{
  "Regex.run/2" => fn -> Regex.run(regex, "foo") end,
  ":re.run/3" => fn -> :re.run("foo", re_pattern, [{:capture, :all, :binary}]) end
})
```

```
Name                  ips        average  deviation         median         99th %
:re.run/3          2.88 M      346.90 ns  ±3623.51%         333 ns         417 ns
Regex.run/2        1.98 M      504.74 ns  ±5851.21%         416 ns         542 ns

Comparison:
:re.run/3          2.88 M
Regex.run/2        1.98 M - 1.46x slower +157.84 ns
```
On Friday 15 March 2024 at 07:20:11 UTC+1 jan.k...@gmail.com wrote:

The difference was definitely measurable just in pure running time of the 
code, setting aside fprof. I'll post what I have after work today.

On Thursday, March 14, 2024 at 10:21:25 PM UTC+1 José Valim wrote:

Do you have benchmarks or only the fprof results? fprof is not a 
benchmarking tool: comparing fprof results from different code may be 
misleading. Proper benchmarking is preferable. I am benchmarking locally 
and I cannot measure any relevant difference even with the whole version 
checking removed.

On Thu, Mar 14, 2024 at 6:01 PM Jan Krüger <jan.k...@gmail.com> wrote:

Thanks a lot. I'm also happy to share our case and my fprof results, if 
that helps. I am very sure that my Erlang and Elixir versions match on the 
machine where I've tested this. Replacing Regex.run with an identical call 
to :re.run should show the performance improvement I've mentioned. The 
regex we've tested this on is:

~r/^([a-z][a-z0-9\+\-\.]*):/i

On Thursday, March 14, 2024 at 5:55:47 PM UTC+1 marcel...@googlemail.com 
wrote:

I'm the maintainer of the RDF.ex library with the RDF.IRI module mentioned 
in the OP. I can confirm that this fix doesn't affect the problem, since 
we're actually not using `URI.parse/1` most of the time (we use it only when 
dealing with relative URIs). Even in this case the `Regex.version/0` call 
in `Regex.safe_run/3` 
(https://github.com/elixir-lang/elixir/blob/b8fca42e58850b56f65d0fb8a2086f2636141f61/lib/elixir/lib/regex.ex#L533) 
still performs the `:erlang.system_info/1` call.

On Thursday 14 March 2024 at 17:15:40 UTC+1 jan.k...@gmail.com wrote:

I read the commit, and I don't think it fixes what our actual problem was. 
See my comment above. The problem is the actual call to :re.version, not the 
recompilation of the regex.

On Thursday, March 14, 2024 at 4:37:43 PM UTC+1 José Valim wrote:

I have pushed a fix to main. But also note we provide precompiled Elixir 
versions per OTP version. Using a matching version will always give you the 
best results and that's not only about regexes. :)

On Thu, Mar 14, 2024 at 2:20 PM Jan Krüger <jan.k...@gmail.com> wrote:

I've recently had to work on a code base that parses largish RDF XML files. 
Part of the code base does relatively simple regular expression matches, 
but since the files are large, it makes quite a lot of Regex.run calls. 
While profiling I noticed that there are calls to :erlang.system_info, 
which fetches the PCRE version BEAM was compiled against.

An example regular expression from the code base in question matches the 
scheme part of a URL. I've replaced Regex.run with Erlang's :re.run for 
testing purposes, and at least for this case, the performance gain is 
quite dramatic.

Comparing fprof results:

```
RDF.IRI.scheme/1                                               1176473   30615.618    2354.355
---
RDF.IRI.scheme/1                                               1176473    3531.955    2353.905
```

I found this thread in the Google group, which actually talks about the 
reasoning for fetching the version, and proposes an alternative.

https://groups.google.com/g/elixir-lang-core/c/CgFdxIONvGg/m/HN9ryeVXAwAJ?pli=1

Especially

```
Taking a further look at the code, the issue with recompiling regexes on 
the fly is that it makes executing the regexes more expensive, as we need 
to compute the version on every execution. We could store the version in 
ETS but that would have performance issues. Storing in a persistent_term 
would be great, but at the moment we support Erlang/OTP 20+. Thoughts?
```

Since this has a fairly noticeable impact, at least on all tests I've run, 
I wanted to start a discussion about whether this could be 
implemented/improved now.
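
To make the persistent_term idea from the quote concrete, here is a minimal 
sketch of caching the version on first use. The module `VersionCache` and 
its key are hypothetical and not part of Elixir's Regex module, and caching 
the tuple of `:re.version/0` and `:erlang.system_info(:endian)` is an 
assumption about what would need to be stored. It relies on 
`:persistent_term`, which needs a newer Erlang/OTP than the 20+ mentioned in 
the quote.

```elixir
defmodule VersionCache do
  # Hypothetical helper, not part of Elixir's Regex module.
  @key {__MODULE__, :regex_version}

  # Returns the cached version tuple, computing and storing it on first use.
  def version do
    case :persistent_term.get(@key, nil) do
      nil ->
        version = {:re.version(), :erlang.system_info(:endian)}
        # Writes to persistent_term are expensive, but this happens once.
        :persistent_term.put(@key, version)
        version

      version ->
        version
    end
  end
end
```

After the first call, `VersionCache.version()` is a plain persistent_term 
read. Whether that is measurably cheaper than computing the version on 
every execution would still have to be benchmarked.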

-- 
You received this message because you are subscribed to the Google Groups 
"elixir-lang-core" group.
To unsubscribe from this group and stop receiving emails from it, send an 
email to elixir-lang-co...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elixir-lang-core/44d498c7-82a4-46d2-89be-7919400e0297n%40googlegroups.com.
