[PHP-DEV] Re: timelib inefficiency

Dmitry Stogov Wed, 07 Apr 2021 04:34:34 -0700

Hi Derick,

I saw you have landed the recent updates of timelib to PHP master.
There are few ongoing improvements:


https://github.com/derickr/timelib/pull/103 - an obvious code motion, that
avoid useless "year" calculation

https://github.com/derickr/timelib/pull/104 - an adoption of my old idea,
that prevents unnecessary 64-bit divisions, to new code-base

https://github.com/derickr/timelib/pull/99 - this is the old and the most
important one. It makes **170 times** speed-up on each very usual
timelib_parse_zone() call. The real problem is in timelib_lookup_abbr().
Actually the algorithm of timelib_lookup_abbr() is highly inefficient and
most probably improper, but this simple check just eliminates the need for
the call to timelib_lookup_abbr() for many usual cases. Just this patch
makes more than 3% improvement on the Symfony Demo app (that is not just a
"hello world").

I hope the landing of these improvements won't take another month.

Thanks. Dmitry,

On Thu, Mar 4, 2021 at 4:13 PM Dmitry Stogov <dmitrysto...@gmail.com> wrote:

> hi,
>
> On Thu, Mar 4, 2021 at 3:30 PM Derick Rethans <der...@php.net> wrote:
>
>> Hi,
>>
>> I saw the PRs coming in, I'll reply inline:
>>
>> On Thu, 4 Mar 2021, Dmitry Stogov wrote:
>>
>> >
>> https://github.com/php/php-src/commit/b4e9b1846376f562e27a13572a137ec584c13f58
>>
>> As Nikita already commented, this now seems to introduce flakeyness into
>> tests.
>>
>> > And created 3 pull request for timelib:
>> >
>> > https://github.com/derickr/timelib/pull/98
>>
>> That looks reasonable. I'm currently working on implementing
>> https://github.com/derickr/timelib/issues/14 which will also touch that
>> code, so I'll look at it as part of that work.
>>
>> > https://github.com/derickr/timelib/pull/99
>>
>> Is incorrect, the tzfile 5 man page says:
>>
>>         Time zone designations should consist of at least three (3) and
>> no more
>>         than six (6) ASCII characters from the set of alphanumerics
>>
>
> timelib (timezonedb.h and fallbackmap.h) doesn't have abbreviation longer
> than 5 characters, but has single char abbreviations.
>
>
>>
>> I am curious to as to why this routine was called so often though, as
>> looking for abbreviations isn't something that should have be be done
>> often.
>>
>
> Every time you parse a time zone name (including a date with timezone) you
> always (except for UTS and GMT) perform a linear search through all entries
> listed in timezonedb.h (1128 string comparisons + 1128*2 lowercasing +
> 1128*2 calls to strlen()), and only then take a look for timezone name,
> that may be already cached.
>
>
>>
>> > https://github.com/derickr/timelib/pull/100
>>
>> This new routine is perhaps faster, but it is also extremely more
>> complex, with little comments. It's unlikely I would want to incorporate
>> it in its current form.
>>
>
> What comments do you like?
> We first guess the year dividing days to 365, then check if our guess is
> right (number of days from the start of the guessed year fits into this
> year), and then continue searching.
> This is like a binary search, but with base 365 and the rule to check the
> leap year.
>
>
>> > Please, verify and merge the timelib patches into PHP.
>> > Please, let me know if this will take time.
>>
>> It will certainly take time :-)
>>
>> > I fixed only the visible, most significant and obvious bottlenecks.
>> > It's possible to improve timelib more...
>>
>> I'm sure it is! The code is 17 years old by now!
>>
>
> yeah, this is the problem. Nobody likes to improve it. I'm interested only
> because it significantly affects PHP performance.
>
> Thanks. Dmitry.
>
>
>>
>> cheers,
>> Derick
>>
>

[PHP-DEV] Re: timelib inefficiency

Reply via email to