Hi Derick, I saw you have landed the recent updates of timelib to PHP master. There are few ongoing improvements:
https://github.com/derickr/timelib/pull/103 - an obvious code motion, that avoid useless "year" calculation https://github.com/derickr/timelib/pull/104 - an adoption of my old idea, that prevents unnecessary 64-bit divisions, to new code-base https://github.com/derickr/timelib/pull/99 - this is the old and the most important one. It makes **170 times** speed-up on each very usual timelib_parse_zone() call. The real problem is in timelib_lookup_abbr(). Actually the algorithm of timelib_lookup_abbr() is highly inefficient and most probably improper, but this simple check just eliminates the need for the call to timelib_lookup_abbr() for many usual cases. Just this patch makes more than 3% improvement on the Symfony Demo app (that is not just a "hello world"). I hope the landing of these improvements won't take another month. Thanks. Dmitry, On Thu, Mar 4, 2021 at 4:13 PM Dmitry Stogov <dmitrysto...@gmail.com> wrote: > hi, > > On Thu, Mar 4, 2021 at 3:30 PM Derick Rethans <der...@php.net> wrote: > >> Hi, >> >> I saw the PRs coming in, I'll reply inline: >> >> On Thu, 4 Mar 2021, Dmitry Stogov wrote: >> >> > >> https://github.com/php/php-src/commit/b4e9b1846376f562e27a13572a137ec584c13f58 >> >> As Nikita already commented, this now seems to introduce flakeyness into >> tests. >> >> > And created 3 pull request for timelib: >> > >> > https://github.com/derickr/timelib/pull/98 >> >> That looks reasonable. I'm currently working on implementing >> https://github.com/derickr/timelib/issues/14 which will also touch that >> code, so I'll look at it as part of that work. >> >> > https://github.com/derickr/timelib/pull/99 >> >> Is incorrect, the tzfile 5 man page says: >> >> Time zone designations should consist of at least three (3) and >> no more >> than six (6) ASCII characters from the set of alphanumerics >> > > timelib (timezonedb.h and fallbackmap.h) doesn't have abbreviation longer > than 5 characters, but has single char abbreviations. > > >> >> I am curious to as to why this routine was called so often though, as >> looking for abbreviations isn't something that should have be be done >> often. >> > > Every time you parse a time zone name (including a date with timezone) you > always (except for UTS and GMT) perform a linear search through all entries > listed in timezonedb.h (1128 string comparisons + 1128*2 lowercasing + > 1128*2 calls to strlen()), and only then take a look for timezone name, > that may be already cached. > > >> >> > https://github.com/derickr/timelib/pull/100 >> >> This new routine is perhaps faster, but it is also extremely more >> complex, with little comments. It's unlikely I would want to incorporate >> it in its current form. >> > > What comments do you like? > We first guess the year dividing days to 365, then check if our guess is > right (number of days from the start of the guessed year fits into this > year), and then continue searching. > This is like a binary search, but with base 365 and the rule to check the > leap year. > > >> > Please, verify and merge the timelib patches into PHP. >> > Please, let me know if this will take time. >> >> It will certainly take time :-) >> >> > I fixed only the visible, most significant and obvious bottlenecks. >> > It's possible to improve timelib more... >> >> I'm sure it is! The code is 17 years old by now! >> > > yeah, this is the problem. Nobody likes to improve it. I'm interested only > because it significantly affects PHP performance. > > Thanks. Dmitry. > > >> >> cheers, >> Derick >> >