The difference was definitely measurable just in pure running time of the code, setting aside fprof. I'll post what I have after work today.
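For anyone who wants to reproduce a plain wall-clock comparison in the meantime, here is a minimal sketch. It is not the benchmark referred to above. The module name `RegexTiming`, its function names, and the iteration count are made up for illustration; only `Regex.run/2`, `:re.run/3`, `:re.compile/2`, and `:timer.tc/1` are existing calls. It times the same pattern through both entry points, and a dedicated benchmarking library such as Benchee would give more trustworthy numbers.

```elixir
# Hypothetical helper module: a rough wall-clock comparison, not a rigorous benchmark.
defmodule RegexTiming do
  # Pattern source from the thread; ~S keeps the backslashes untouched.
  @source ~S"^([a-z][a-z0-9\+\-\.]*):"

  # Goes through Elixir's Regex module, including its version check.
  def via_regex(regex, iri), do: Regex.run(regex, iri)

  # Calls Erlang's :re directly and mirrors Regex.run's return shape.
  def via_re(mp, iri) do
    case :re.run(iri, mp, [{:capture, :all, :binary}]) do
      {:match, captures} -> captures
      :nomatch -> nil
    end
  end

  # Returns the elapsed microseconds for each variant over `iterations` calls.
  def compare(iri, iterations \\ 100_000) do
    regex = Regex.compile!(@source, "i")
    {:ok, mp} = :re.compile(@source, [:caseless])

    {regex_us, _} = :timer.tc(fn -> repeat(iterations, fn -> via_regex(regex, iri) end) end)
    {re_us, _} = :timer.tc(fn -> repeat(iterations, fn -> via_re(mp, iri) end) end)

    %{regex_run_us: regex_us, re_run_us: re_us}
  end

  # Runs `fun` n times and discards the results.
  defp repeat(0, _fun), do: :ok

  defp repeat(n, fun) do
    fun.()
    repeat(n - 1, fun)
  end
end
```

Calling `RegexTiming.compare("http://example.com/resource")` returns a map with both timings. The numbers depend on the machine and on the Elixir and OTP versions in use, so several runs are needed before drawing conclusions.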
On Thursday, March 14, 2024 at 10:21:25 PM UTC+1 José Valim wrote:

> Do you have benchmarks or only the fprof results? fprof is not a
> benchmarking tool: comparing fprof results from different code may be
> misleading. Proper benchmarking is preferable. I am benchmarking locally
> and I cannot measure any relevant difference even with the whole version
> checking removed.
>
> On Thu, Mar 14, 2024 at 6:01 PM Jan Krüger <jan.k...@gmail.com> wrote:
>
>> Thanks a lot. I'm also happy to share our case, and my fprof results, if
>> that helps. I am very sure that my Erlang and Elixir versions match on
>> the machine where I've tested this. Replacing Regex.run with an identical
>> call to :re.run should show the performance improvement I've mentioned.
>> The regex we've tested this on is:
>>
>> ~r/^([a-z][a-z0-9\+\-\.]*):/i
>>
>> On Thursday, March 14, 2024 at 5:55:47 PM UTC+1 marcel...@googlemail.com
>> wrote:
>>
>>> I'm the maintainer of the RDF.ex library with the RDF.IRI module
>>> mentioned in the OP. I can confirm that this fix doesn't affect the
>>> problem, since we're actually not using `URI.parse/1` most of the time
>>> (we use it only when dealing with relative URIs). Even in this case the
>>> `Regex.version/0` call in `Regex.safe_run/3` (
>>> https://github.com/elixir-lang/elixir/blob/b8fca42e58850b56f65d0fb8a2086f2636141f61/lib/elixir/lib/regex.ex#L533)
>>> still performs the `:erlang.system_info/1` call.
>>>
>>> On Thursday 14 March 2024 at 17:15:40 UTC+1 jan.k...@gmail.com wrote:
>>>
>>>> I read the commit, and I don't think it fixes what our actual problem
>>>> was. See my comment above. The problem is the actual call to
>>>> :re.version, not the recompilation of the regex.
>>>>
>>>> On Thursday, March 14, 2024 at 4:37:43 PM UTC+1 José Valim wrote:
>>>>
>>>>> I have pushed a fix to main. But also note we provide precompiled
>>>>> Elixir versions per OTP version. Using a matching version will always
>>>>> give you the best results and that's not only about regexes. :)
>>>>>
>>>>> On Thu, Mar 14, 2024 at 2:20 PM Jan Krüger <jan.k...@gmail.com> wrote:
>>>>>
>>>>>> I've recently had to work on a code base that parses largish RDF XML
>>>>>> files. Part of the code base does relatively simple regular
>>>>>> expression matches, but since the files are large, it makes quite a
>>>>>> lot of Regex.run calls. While profiling I've noticed that there are
>>>>>> callouts to :erlang.system_info, which fetches the PCRE version BEAM
>>>>>> was compiled against.
>>>>>>
>>>>>> An example regular expression from the code base in question matches
>>>>>> the scheme part of a URL. I've replaced Regex.run with Erlang's
>>>>>> :re.run for testing purposes, and at least for this case, the
>>>>>> performance gain is quite dramatic.
>>>>>>
>>>>>> Comparing fprof results:
>>>>>>
>>>>>> ```
>>>>>> RDF.IRI.scheme/1
>>>>>> 1176473 30615.618 2354.355
>>>>>> ---
>>>>>> RDF.IRI.scheme/1
>>>>>> 1176473 3531.955 2353.905
>>>>>> ```
>>>>>>
>>>>>> I found this thread in the Google group, which actually talks about
>>>>>> the reasoning for fetching the version, and proposes an alternative.
>>>>>>
>>>>>> https://groups.google.com/g/elixir-lang-core/c/CgFdxIONvGg/m/HN9ryeVXAwAJ?pli=1
>>>>>>
>>>>>> Especially
>>>>>>
>>>>>> ```
>>>>>> Taking a further look at the code, the issue with recompiling regexes
>>>>>> on the fly is that it makes executing the regexes more expensive, as we
>>>>>> need to compute the version on every execution. We could store the
>>>>>> version in ETS but that would have performance issues. Storing in a
>>>>>> persistent_term would be great, but at the moment we support
>>>>>> Erlang/OTP 20+. Thoughts?
>>>>>> ```
>>>>>>
>>>>>> Since this has a fairly noticeable impact, at least on all tests I've
>>>>>> run, I wanted to start a discussion on whether this could be
>>>>>> implemented/improved now.
--
You received this message because you are subscribed to the Google Groups "elixir-lang-core" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elixir-lang-core+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elixir-lang-core/28b66515-30cf-4f2a-bbc1-25c04fd45ef9n%40googlegroups.com.