Do you have benchmarks or only the fprof results? fprof is not a
benchmarking tool: comparing fprof results from different code may be
misleading. Proper benchmarking is preferable. I am benchmarking locally
and I cannot measure any relevant difference, even with the version check
removed entirely.
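
For reference, here is a minimal sketch of the kind of comparison meant here.
It is not the benchmark I ran. It assumes a plain .exs script, the regex
quoted further down in this thread, a made-up input string, and a
hypothetical `RegexBench.time/2` helper built on `:timer.tc/1`. A dedicated
benchmarking library would give more reliable numbers.

```elixir
# Sketch only: times Regex.run/2 against :re.run/3 on the same pattern.
# RegexBench, the input string and the iteration count are illustrative
# assumptions, not taken from the thread.
defmodule RegexBench do
  # Runs `fun` `iterations` times and returns the elapsed microseconds.
  def time(fun, iterations) do
    {micros, :ok} =
      :timer.tc(fn ->
        Enum.each(1..iterations, fn _ -> fun.() end)
      end)

    micros
  end
end

regex = ~r/^([a-z][a-z0-9\+\-\.]*):/i
input = "http://example.com/resource"
iterations = 100_000

# Compile the same source with :re directly so both calls match identically.
{:ok, compiled} = :re.compile(Regex.source(regex), [:caseless])
re_opts = [{:capture, :all, :binary}]

# Warm up both code paths before measuring.
Regex.run(regex, input)
:re.run(input, compiled, re_opts)

regex_us = RegexBench.time(fn -> Regex.run(regex, input) end, iterations)
re_us = RegexBench.time(fn -> :re.run(input, compiled, re_opts) end, iterations)

IO.puts("Regex.run: #{regex_us} us, :re.run: #{re_us} us (#{iterations} calls each)")
```

Run it several times and compare the spread between runs before drawing
conclusions from a single pair of numbers.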

On Thu, Mar 14, 2024 at 6:01 PM Jan Krüger <jan.krue...@gmail.com> wrote:

> Thanks a lot. I'm also happy to share our case and my fprof results, if
> that helps. I am very sure that my Erlang and Elixir versions match on
> the machine where I've tested this. Replacing Regex.run with an identical
> call to :re.run should show the performance improvement I've mentioned. The
> regex we've tested this on is:
>
> ~r/^([a-z][a-z0-9\+\-\.]*):/i
>
> On Thursday, March 14, 2024 at 5:55:47 PM UTC+1 marcel...@googlemail.com
> wrote:
>
>> I'm the maintainer of the RDF.ex library with the RDF.IRI module mentioned
>> in the OP. I can confirm that this fix doesn't affect the problem, since
>> we're actually not using `URI.parse/1` most of the time (we use it only when
>> dealing with relative URIs). Even in this case the `Regex.version/0` call
>> in `Regex.safe_run/3` (
>> https://github.com/elixir-lang/elixir/blob/b8fca42e58850b56f65d0fb8a2086f2636141f61/lib/elixir/lib/regex.ex#L533)
>> still performs the `:erlang.system_info/1` call.
>>
>> On Thursday 14 March 2024 at 17:15:40 UTC+1 jan.k...@gmail.com wrote:
>>
>>> I read the commit, and I don't think it fixes what our actual problem was.
>>> See my comment above. The problem is the actual call to :re.version, not
>>> the recompilation of the regex.
>>>
>>> On Thursday, March 14, 2024 at 4:37:43 PM UTC+1 José Valim wrote:
>>>
>>>> I have pushed a fix to main. But also note we provide precompiled
>>>> Elixir versions per OTP version. Using a matching version will always give
>>>> you the best results and that's not only about regexes. :)
>>>>
>>>> On Thu, Mar 14, 2024 at 2:20 PM Jan Krüger <jan.k...@gmail.com> wrote:
>>>>
>>>>> I've recently had to work on a code base that parses largish RDF XML
>>>>> files. Part of the code base performs relatively simple regular
>>>>> expression matches, but since the files are large, it makes quite a lot
>>>>> of Regex.run calls. While profiling, I noticed callouts to
>>>>> :erlang.system_info, which fetches the PCRE version the BEAM was
>>>>> compiled against.
>>>>>
>>>>> An example regular expression from the code base in question matches
>>>>> the scheme part of a URL. I've replaced Regex.run with Erlang's :re.run
>>>>> for testing purposes, and at least for this case, the performance gain
>>>>> is quite dramatic.
>>>>>
>>>>> Comparing fprof results:
>>>>>
>>>>> ```
>>>>> RDF.IRI.scheme/1    1176473    30615.618    2354.355
>>>>> ---
>>>>> RDF.IRI.scheme/1    1176473     3531.955    2353.905
>>>>> ```
>>>>>
>>>>> I found this thread in the Google group, which actually talks about the
>>>>> reasoning for fetching the version, and proposes an alternative.
>>>>>
>>>>>
>>>>> https://groups.google.com/g/elixir-lang-core/c/CgFdxIONvGg/m/HN9ryeVXAwAJ?pli=1
>>>>>
>>>>> Especially
>>>>>
>>>>> ```
>>>>> Taking a further look at the code, the issue with recompiling regexes
>>>>> on the fly is that it makes executing the regexes more expensive, as we
>>>>> need to compute the version on every execution. We could store the version
>>>>> in ETS but that would have performance issues. Storing in a
>>>>> persistent_term would be great, but at the moment we support
>>>>> Erlang/OTP 20+. Thoughts?
>>>>> ```
>>>>>
>>>>> Since this has a fairly noticeable impact, at least in all the tests
>>>>> I've run, I wanted to start a discussion on whether this could be
>>>>> implemented/improved now.
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "elixir-lang-core" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to elixir-lang-co...@googlegroups.com.
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/elixir-lang-core/44d498c7-82a4-46d2-89be-7919400e0297n%40googlegroups.com
>>>>> <https://groups.google.com/d/msgid/elixir-lang-core/44d498c7-82a4-46d2-89be-7919400e0297n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>> .
>>>>>
