Alright. If you can't see it, then it must have been something in my environment. While working on this, I ran fprof to identify potential performance problems, and the version check showed up as a substantial part of the time spent in the regex code. Is that a valid use of fprof in your opinion? Since we're running this in a very tight loop, I also wanted to get rid of the Keyword.get calls when running regexes, so I swapped out Regex.run for :re.run, and that substantially improved the performance overall.
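To make the swap concrete, here is a minimal sketch of the two call paths, using the scheme regex quoted further down in this thread. The module name `SchemeMatch` and its function names are hypothetical and only for illustration; this is not the RDF.ex code. The direct `:re.run/3` path assumes the pattern was compiled by the OTP release that is running, since it skips the check Regex.run performs.

```elixir
defmodule SchemeMatch do
  # Hypothetical module, for illustration only.

  # Regular path: goes through Regex.run/2, including its option
  # processing and version check.
  def via_regex(string) do
    Regex.run(~r/^([a-z][a-z0-9\+\-\.]*):/i, string)
  end

  # Direct path: hands the compiled pattern straight to :re.run/3.
  # Assumption: the pattern was compiled by the OTP release we run on,
  # because nothing verifies that here.
  def via_re(string) do
    pattern = Regex.re_pattern(~r/^([a-z][a-z0-9\+\-\.]*):/i)

    case :re.run(string, pattern, [{:capture, :all, :binary}]) do
      {:match, captures} -> captures
      :nomatch -> nil
    end
  end
end
```

Both functions return the same shape (a list of captures, or `nil` when nothing matches), so they can be dropped into the same benchmark or tight loop for comparison.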
I don't think I then went on to profile specifically whether removing the version check alone improves the performance. So all I have to back up the claim that the version check is the root cause is fprof.

On Friday, March 15, 2024 at 8:22:29 AM UTC+1 José Valim wrote:

> The 5% also takes into account the option processing and result handling.
> The version check itself is a subset of that. I was not able to measure
> sensible gains after removing it.
>> On Fri, Mar 15, 2024 at 12:23 PM 'marcel...@googlemail.com' via
>> elixir-lang-core <elixir-l...@googlegroups.com> wrote:
>>
>>> The benchmark results I'm getting are indeed not as dramatic as the
>>> fprof results, but on the other hand they are also more than the 5%
>>> mentioned in the PR which introduced the check:
>>> https://github.com/elixir-lang/elixir/pull/9040
>>>
>>> ```elixir
>>> regex = ~r/^([a-z][a-z0-9\+\-\.]*):/i
>>> re_pattern = regex.re_pattern
>>>
>>> Benchee.run(%{
>>>   "Regex.run/2" => fn -> Regex.run(regex, "foo") end,
>>>   ":re.run/3" => fn -> :re.run("foo", re_pattern, [{:capture, :all, :binary}]) end
>>> })
>>> ```
>>>
>>> ```
>>> Name                  ips        average  deviation         median         99th %
>>> :re.run/3          2.88 M      346.90 ns  ±3623.51%         333 ns         417 ns
>>> Regex.run/2        1.98 M      504.74 ns  ±5851.21%         416 ns         542 ns
>>>
>>> Comparison:
>>> :re.run/3          2.88 M
>>> Regex.run/2        1.98 M - 1.46x slower +157.84 ns
>>> ```
>>>
>>> On Friday 15 March 2024 at 07:20:11 UTC+1 jan.k...@gmail.com wrote:
>>>
>>>> The difference was definitely measurable just in pure running time of
>>>> the code, setting aside fprof. I'll post what I have after work today.
>>>>
>>>> On Thursday, March 14, 2024 at 10:21:25 PM UTC+1 José Valim wrote:
>>>>
>>>>> Do you have benchmarks or only the fprof results? fprof is not a
>>>>> benchmarking tool: comparing fprof results from different code may be
>>>>> misleading. Proper benchmarking is preferable. I am benchmarking locally
>>>>> and I cannot measure any relevant difference even with the whole version
>>>>> checking removed.
>>>>>
>>>>> On Thu, Mar 14, 2024 at 6:01 PM Jan Krüger <jan.k...@gmail.com> wrote:
>>>>>
>>>>>> Thanks a lot. I'm also happy to share our case and my fprof results,
>>>>>> if that helps. I am very sure that my Erlang and Elixir versions match
>>>>>> on the machine where I've tested this. Replacing Regex.run with an
>>>>>> identical call to :re.run should show the performance improvement
>>>>>> I've mentioned.
>>>>>> The regex we've tested this on is:
>>>>>>
>>>>>> ~r/^([a-z][a-z0-9\+\-\.]*):/i
>>>>>>
>>>>>> On Thursday, March 14, 2024 at 5:55:47 PM UTC+1
>>>>>> marcel...@googlemail.com wrote:
>>>>>>
>>>>>>> I'm the maintainer of the RDF.ex library with the RDF.IRI module
>>>>>>> mentioned in the OP. I can confirm that this fix doesn't affect the
>>>>>>> problem, since we're actually not using `URI.parse/1` most of the time
>>>>>>> (we use it only when dealing with relative URIs). Even in this case the
>>>>>>> `Regex.version/0` call in `Regex.safe_run/3`
>>>>>>> (https://github.com/elixir-lang/elixir/blob/b8fca42e58850b56f65d0fb8a2086f2636141f61/lib/elixir/lib/regex.ex#L533)
>>>>>>> still performs the `:erlang.system_info/1` call.
>>>>>>>
>>>>>>> On Thursday 14 March 2024 at 17:15:40 UTC+1 jan.k...@gmail.com wrote:
>>>>>>>
>>>>>>>> I read the commit, and I don't think it fixes what our actual problem
>>>>>>>> was. See my comment above. The problem is the actual call to
>>>>>>>> :re.version, not the recompilation of the regex.
>>>>>>>>
>>>>>>>> On Thursday, March 14, 2024 at 4:37:43 PM UTC+1 José Valim wrote:
>>>>>>>>
>>>>>>>>> I have pushed a fix to main. But also note we provide precompiled
>>>>>>>>> Elixir versions per OTP version. Using a matching version will always
>>>>>>>>> give you the best results and that's not only about regexes. :)
>>>>>>>>>
>>>>>>>>> On Thu, Mar 14, 2024 at 2:20 PM Jan Krüger <jan.k...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> I've recently had to work on a code base that parses largish RDF
>>>>>>>>>> XML files. Part of the code base does relatively simple regular
>>>>>>>>>> expression matches, but since the files are large, it makes quite a
>>>>>>>>>> lot of Regex.run calls. While profiling I've noticed that there are
>>>>>>>>>> callouts to :erlang.system_info, which fetches the PCRE version BEAM
>>>>>>>>>> was compiled against.
>>>>>>>>>>
>>>>>>>>>> An example regular expression from the code base in question
>>>>>>>>>> matches the scheme part of a URL. I've replaced Regex.run with
>>>>>>>>>> Erlang's :re.run for testing purposes, and at least for this case,
>>>>>>>>>> the performance gain is quite dramatic.
>>>>>>>>>>
>>>>>>>>>> Comparing fprof results:
>>>>>>>>>>
>>>>>>>>>> ```
>>>>>>>>>> RDF.IRI.scheme/1    1176473    30615.618    2354.355
>>>>>>>>>> ---
>>>>>>>>>> RDF.IRI.scheme/1    1176473     3531.955    2353.905
>>>>>>>>>> ```
>>>>>>>>>>
>>>>>>>>>> I found this thread in the Google group, which actually talks
>>>>>>>>>> about the reasoning for fetching the version, and proposes an
>>>>>>>>>> alternative.
>>>>>>>>>>
>>>>>>>>>> https://groups.google.com/g/elixir-lang-core/c/CgFdxIONvGg/m/HN9ryeVXAwAJ?pli=1
>>>>>>>>>>
>>>>>>>>>> Especially
>>>>>>>>>>
>>>>>>>>>> ```
>>>>>>>>>> Taking a further look at the code, the issue with recompiling
>>>>>>>>>> regexes on the fly is that it makes executing the regexes more
>>>>>>>>>> expensive, as we need to compute the version on every execution.
>>>>>>>>>> We could store the version in ETS but that would have performance
>>>>>>>>>> issues. Storing in a persistent_term would be great, but at the
>>>>>>>>>> moment we support Erlang/OTP 20+. Thoughts?
>>>>>>>>>> ```
>>>>>>>>>>
>>>>>>>>>> Since this has a fairly noticeable impact, at least in all tests
>>>>>>>>>> I've run, I wanted to start a discussion on whether this could be
>>>>>>>>>> implemented/improved now.
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> You received this message because you are subscribed to the
>>>>>>>>>> Google Groups "elixir-lang-core" group.
>>>>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>>>>> send an email to elixir-lang-co...@googlegroups.com.
>>>>>>>>>> To view this discussion on the web visit
>>>>>>>>>> https://groups.google.com/d/msgid/elixir-lang-core/44d498c7-82a4-46d2-89be-7919400e0297n%40googlegroups.com.
--
You received this message because you are subscribed to the Google Groups "elixir-lang-core" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elixir-lang-core+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elixir-lang-core/ed6c6a7f-74f8-4a49-8c65-42b1ddd8a400n%40googlegroups.com.