I will benchmark, but I would be very surprised if :re.version() is the one to blame. It takes 3-4µs on my machine.
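For anyone who wants to reproduce that kind of number, a rough sketch along these lines should do. It is only a crude micro-benchmark, not a proper Benchee run: the iteration count is an arbitrary assumption, and the result will vary with the machine and the OTP build.

```elixir
# Rough micro-benchmark: average wall-clock cost of one :re.version/0 call.
# `iterations` is an arbitrary value chosen for illustration.
iterations = 100_000

# :timer.tc/1 returns {elapsed_microseconds, result_of_fun}.
{total_us, _result} =
  :timer.tc(fn ->
    Enum.each(1..iterations, fn _ -> :re.version() end)
  end)

# `/` always yields a float in Elixir.
avg_us = total_us / iterations
IO.puts("avg :re.version/0 call: #{Float.round(avg_us, 3)} µs")
```

The loop overhead of Enum.each is included in the measurement, so the figure is an upper bound on the cost of the call itself.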

On Thu, Mar 14, 2024 at 5:15 PM Jan Krüger <jan.krue...@gmail.com> wrote:

> I read the commit, and I don't think it fixes what our actual problem was.
> See my comment above. The problem is the actual call to :re.version, not
> the recompilation of the regex.
>
> On Thursday, March 14, 2024 at 4:37:43 PM UTC+1 José Valim wrote:
>
>> I have pushed a fix to main. But also note we provide precompiled Elixir
>> versions per OTP version. Using a matching version will always give you
>> the best results and that's not only about regexes. :)
>>
>> On Thu, Mar 14, 2024 at 2:20 PM Jan Krüger <jan.k...@gmail.com> wrote:
>>
>>> I've recently had to work on a code base that parses largish RDF XML
>>> files. Part of the code base does relatively simple regular expression
>>> matches but, since the files are large, makes quite a lot of Regex.run
>>> calls. While profiling I noticed that there are callouts to
>>> :erlang.system_info, which fetches the PCRE version BEAM was compiled
>>> against.
>>>
>>> An example regular expression from the code base in question matches
>>> the scheme part of a URL. I've replaced Regex.run with Erlang's :re.run
>>> for testing purposes, and at least for this case, the performance gain
>>> is quite dramatic.
>>>
>>> Comparing fprof results:
>>>
>>> ```
>>> RDF.IRI.scheme/1 1176473 30615.618 2354.355
>>> ---
>>> RDF.IRI.scheme/1 1176473 3531.955 2353.905
>>> ```
>>>
>>> I found this thread in the Google group, which actually talks about the
>>> reasoning for fetching the version, and proposes an alternative.
>>>
>>> https://groups.google.com/g/elixir-lang-core/c/CgFdxIONvGg/m/HN9ryeVXAwAJ?pli=1
>>>
>>> Especially
>>>
>>> ```
>>> Taking a further look at the code, the issue with recompiling regexes on
>>> the fly is that it makes executing the regexes more expensive, as we need
>>> to compute the version on every execution. We could store the version in
>>> ETS but that would have performance issues. Storing in a persistent_term
>>> would be great, but at the moment we support Erlang/OTP 20+. Thoughts?
>>> ```
>>>
>>> Since this has a fairly noticeable impact, at least on all tests I've
>>> run, I wanted to start a discussion on whether this could be
>>> implemented/improved now.
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "elixir-lang-core" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to elixir-lang-co...@googlegroups.com.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/elixir-lang-core/44d498c7-82a4-46d2-89be-7919400e0297n%40googlegroups.com
>>> <https://groups.google.com/d/msgid/elixir-lang-core/44d498c7-82a4-46d2-89be-7919400e0297n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .