The NIFs might explain why this shows up as a larger part of the execution time than it actually is. I hadn't considered that.
It probably makes sense for us to keep the :re.run in any event. I think the motivation for the thread was also just to give a heads-up that there might be more of a performance issue here than you guys assumed when introducing this version check. If it turns out to be a mirage, then I guess that's just as well :)

On Friday, March 15, 2024 at 8:39:38 AM UTC+1 José Valim wrote:

fprof is great at telling what in a given workflow is taking time, but comparing fprof results won't tell you by how much it got faster. For that you will have to benchmark it again.

For tight loops though, I can see how removing the version check, option handling and everything else speeds up performance. I think it is fine to go that route if you need to.

I am also not sure if fprof will consider the time spent on NIFs. I assume most time is spent on the regex engine but if that is not fully considered in fprof, that could affect measurements. But I am speculating here, I truly don't know. :)

On Fri, Mar 15, 2024 at 8:31 AM Jan Krüger <jan.k...@gmail.com> wrote:

Alright. If you can't see it, then it must have been something in my environment. What I did when working on this was run fprof to identify potential performance problems, and the version check showed up as a substantial part of the time spent in the regex code. Is that a valid use of fprof in your opinion?

Since we're running this in a very tight loop I actually also wanted to get rid of the Keyword.get calls when running regexes, and swapped out Regex.run with :re.run, and that substantially improved the performance overall. I didn't then go and profile specifically whether removing the version check alone improves the performance. So all I have to back up that the version check is the root cause is fprof.

On Friday, March 15, 2024 at 8:22:29 AM UTC+1 José Valim wrote:

The 5% also takes into account the option processing and result handling. The version check itself is a subset of that.
I was not able to measure sensible gains after removing it.

On Fri, Mar 15, 2024 at 12:23 PM 'marcel...@googlemail.com' via elixir-lang-core <elixir-l...@googlegroups.com> wrote:

The benchmark results I'm getting are indeed not as dramatic as the fprof results, but on the other hand they also show more than the 5% mentioned in the PR which introduced the check: https://github.com/elixir-lang/elixir/pull/9040

```elixir
regex = ~r/^([a-z][a-z0-9\+\-\.]*):/i
re_pattern = regex.re_pattern

Benchee.run(%{
  "Regex.run/2" => fn -> Regex.run(regex, "foo") end,
  ":re.run/3" => fn -> :re.run("foo", re_pattern, [{:capture, :all, :binary}]) end
})
```

```
Name                  ips        average  deviation         median         99th %
:re.run/3          2.88 M      346.90 ns  ±3623.51%         333 ns         417 ns
Regex.run/2        1.98 M      504.74 ns  ±5851.21%         416 ns         542 ns

Comparison:
:re.run/3          2.88 M
Regex.run/2        1.98 M - 1.46x slower +157.84 ns
```

On Friday 15 March 2024 at 07:20:11 UTC+1 jan.k...@gmail.com wrote:

The difference was definitely measurable just in pure running time of the code, setting aside fprof. I'll post what I have after work today.

On Thursday, March 14, 2024 at 10:21:25 PM UTC+1 José Valim wrote:

Do you have benchmarks or only the fprof results? fprof is not a benchmarking tool: comparing fprof results from different code may be misleading. Proper benchmarking is preferable.
I am benchmarking locally and I cannot measure any relevant difference even with the whole version checking removed.

On Thu, Mar 14, 2024 at 6:01 PM Jan Krüger <jan.k...@gmail.com> wrote:

Thanks a lot. I'm also happy to share our case and my fprof results, if that helps. I am very sure that my Erlang and Elixir versions match on the machine where I've tested this. Replacing Regex.run with an identical call to :re.run should show the performance improvement I've mentioned. The regex we've tested this on is:

~r/^([a-z][a-z0-9\+\-\.]*):/i

On Thursday, March 14, 2024 at 5:55:47 PM UTC+1 marcel...@googlemail.com wrote:

I'm the maintainer of the RDF.ex library with the RDF.IRI module mentioned in the OP. I can confirm that this fix doesn't affect the problem, since we're actually not using `URI.parse/1` most of the time (we use it only when dealing with relative URIs). Even in this case the `Regex.version/0` call in `Regex.safe_run/3` (https://github.com/elixir-lang/elixir/blob/b8fca42e58850b56f65d0fb8a2086f2636141f61/lib/elixir/lib/regex.ex#L533) still performs the `:erlang.system_info/1` call.

On Thursday 14 March 2024 at 17:15:40 UTC+1 jan.k...@gmail.com wrote:

I read the commit, and I don't think it fixes what our actual problem was. See my comment above. The problem is the actual call to :re.version, not the recompilation of the regex.

On Thursday, March 14, 2024 at 4:37:43 PM UTC+1 José Valim wrote:

I have pushed a fix to main. But also note we provide precompiled Elixir versions per OTP version. Using a matching version will always give you the best results and that's not only about regexes. :)

On Thu, Mar 14, 2024 at 2:20 PM Jan Krüger <jan.k...@gmail.com> wrote:

I've recently had to work on a code base that parses largish RDF XML files. Part of the code base does relatively simple regular expression matches, but since the files are large, it makes quite a lot of Regex.run calls.
While profiling I noticed that there are callouts to :erlang.system_info, which fetches the PCRE version BEAM was compiled against.

An example regular expression from the code base in question matches the scheme part of a URL. I've replaced Regex.run with Erlang's :re.run for testing purposes, and at least for this case, the performance gain is quite dramatic. Comparing fprof results:

```
RDF.IRI.scheme/1 1176473 30615.618 2354.355
---
RDF.IRI.scheme/1 1176473 3531.955 2353.905
```

I found this thread in the Google group, which actually talks about the reasoning for fetching the version and proposes an alternative.

https://groups.google.com/g/elixir-lang-core/c/CgFdxIONvGg/m/HN9ryeVXAwAJ?pli=1

Especially:

```
Taking a further look at the code, the issue with recompiling regexes on the fly is that it makes executing the regexes more expensive, as we need to compute the version on every execution. We could store the version in ETS but that would have performance issues. Storing in a persistent_term would be great, but at the moment we support Erlang/OTP 20+. Thoughts?
```

Since this has a fairly noticeable impact, at least in all tests I've run, I wanted to start a discussion on whether this could be implemented/improved now.
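
For readers following along, the `persistent_term` idea from the quoted message could look roughly like the sketch below. This is only an illustration under assumptions: `VersionCache` and its key are hypothetical names, the computed tuple is an assumption about what a version value might contain, and none of this is claimed to be how Elixir's `Regex` module works or what the fix pushed to main does. It relies on `:persistent_term.get/2`, which needs a newer Erlang/OTP than the OTP 20 mentioned in the quote.

```elixir
defmodule VersionCache do
  # Hypothetical module, for illustration only; not part of Elixir.
  @key {__MODULE__, :re_version}

  # Returns the cached version, computing and storing it on first use.
  # Later calls only do a cheap :persistent_term lookup.
  def version do
    case :persistent_term.get(@key, nil) do
      nil ->
        version = compute_version()
        :persistent_term.put(@key, version)
        version

      version ->
        version
    end
  end

  # Assumption: the value worth caching is the regex library version
  # plus the endianness of the running system.
  defp compute_version do
    {:re.version(), :erlang.system_info(:endian)}
  end
end

IO.inspect(VersionCache.version())
```

The value is written once and read many times, which is the access pattern `persistent_term` is meant for. Whether this actually helps in the tight-loop case discussed in this thread would still have to be confirmed with a benchmark rather than with fprof.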
-- You received this message because you are subscribed to the Google Groups "elixir-lang-core" group. To unsubscribe from this group and stop receiving emails from it, send an email to elixir-lang-core+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elixir-lang-core/57731fcc-ef97-4599-801f-764bc5e57755n%40googlegroups.com.