[Libreoffice-bugs] [Bug 136615] Re-consider date/time parts calculation for functions and formatting

bugzilla-daemon Sat, 30 Sep 2023 05:13:11 -0700

https://bugs.documentfoundation.org/show_bug.cgi?id=136615


--- Comment #48 from Albrecht Müller <[email protected]> ---
(In reply to Regina Henschel from comment #40)
> ... 
> How would you then solve the request to have functions, that behave like
> Excel?

I fear that the answer to your question is: 

It is not possible to fix this problem just by adding a new parameter to some
functions.

It is in the nature of the problem that you cannot fix it without changing the
definition of some functions in a way that they have to return values that are
different of those required by the current specification.

As I see it a solution requires to rewrite substantial parts of the
specification that deal with date and time calculations. I don’t know if this
is feasible. Here are some of my ideas:

The use of algorithms or datatypes for specification purposes should be avoided
whenever possible. The reason is that algorithms and datatypes contain the
wrong information. They describe one way how the output values can be
calculated from the input values. This is an information that is important for
implementations but not for the specification. The specification should state
how the input values are related to the output values, what properties this
relation has, design decisions etc.

The problems with the SECOND function illustrate what can go wrong when
algorithms are used for specification purposes. Even if you fully understand
how the algorithm works you don’t know for what purpose it rounds its input
argument or why it should truncate it according Eike’s proposal. Algorithms may
also specify unintended properties such as the return value of 60 in the case
of the SECOND function, or the pseudo random results of the MINUTE function.

I think there is a need for a section that explains the general principles that
are valid for (almost) _all_ date and time calculations within this
specification including for example financial functions. The specifications of
individual functions should refer to this section. I hope that this way all
date and time related functions show a consistent behaviour and that the
specifications get simpler. The specification of the SECOND function could
simply state that this function returns the number of a second within a minute
as defined is this section.

Here are some ideas about what should go into this section:

(1) Date and time values are represented by IEE 754 binary64 (i.e. 64 bit
double) values.

(2) The number 1 represents one day. A day has 24 hours of equal length, an
hour has 60 minutes and a minute 60 seconds. The units within a day are counted
from 0 to 23 or from 0 to 59 respectively.

(3) Dates are represented by counting the days between some fixed start date
(usually 1899-12-30, for compatibility reasons other dates may be used) and the
represented date. Positive numbers represent dates after the start date. The
Gregorian calender is used as in ISO 8601.

(4) Unless there are sound and documented reasons to do otherwise any
calculation that deals with time quantities such as seconds, minutes, hours,
days, weeks, month, years etc. uses an exact arithmetic that is based on
integer multiples of seconds.

(5) Due to (1) and (4) there is a need to convert between the floating point
representation and exact representations. This conversion is done by either
rounding floating point values to the nearest exact second or by choosing the
floating point value that has the least distance to the value of the exact
second. Therefore date and time related functions round their arguments to the
nearest second unless there are sound and documented reasons to do otherwise.

(6) A suitable range of date and time calculations should be specified. Maybe
1.1.1583 to 31.12.9999 would be ok.


The section should also give the reasons and limitations that are connected
with these principles.

ad (1) Reason: You may get quite different results if you use a floating-point
arithmetic of different precision. In order to get consistent results between
different implementations it should be specified which arithmetic to use. 
Limitation: This excludes the full support of durations in the sense of ISO
8601 (https://en.wikipedia.org/wiki/ISO_8601#Durations) as you cannot represent
time quantities of undefined length (e.g. month) using floating-point numbers
alone.

ad (2) Reason: This corresponds to a common perception and allows easy
calculations. 
Limitations: This design decision excludes date and time system that that have
days, hours, minutes or seconds of varying length. Examples are systems that
support switching between daylight saving time and standard time, contain leap
seconds, or start days at sunset etc.

ad (4) Reason: As Eike pointed out many times floating point numbers are not
suited for calculations with date and time. One of the reasons is that there is
an inherent semantic ambiguity due to round off errors. Assume there is
floating point value corresponding to 23:59:59.999. Does this value belong to
the current day or does this represent 00:00:00.000 of the next day? This value
could be the measurement of a clock with a millisecond resolution. Thus is
value belongs to the current day. This value could also be the result of some
calculation that due to round off errors is a millisecond less than the correct
result of 00:00:00.000 of the next day. Therefore it belongs to the next day.
For similar reasons value 00:00:00.000 may belong to the current or the
previous day. I see no way how some algorithm could distinguish between these
situations. That’s why an arithmetic should be used that is not disturbed by
round off errors. This kind of arithmetic allows to truncate e.g. hours and
minutes safely, i.e. without showing the pseudo random behaviour that results
from applying truncation on raw floating point values.

ad (5) Reason: Provided that the round off errors stay somewhat below ½ second
this method guarantees that round trip conversion from exact to float to exact
yields the original exact value.
There may be certain functions that must work differently. If formatting some
floating point value to a format that contains fractional seconds is required
it makes no sense to round the input to the nearest second. In these cases the
specification should give the reasons why the principles are violated and
explain the consequences. In the formatting example one effect is that the year
contained in the result of the formatting may differ from the result of the
YEAR function. This corresponds to an effect seen in ordinary floating point
representations. The most significant digit may depend on the rounding applied
to the least significant digit: 9.99 becomes 10.0 if only one digit is used
after the decimal separator.
Limitations: Rounding to the nearest second means that a new day will start at
23:59:59.5 which is somewhat counter-intuitive.

ad (6) Reason: Suitable limits are necessary as the time resolution of floating
point values may vary by order of magnitudes. This depends e.g. from the
distance of some date and time value to the start date according to (3). Some
assumptions used in the design of the date and time calculations may not hold
when the values become too large, e.g. you cannot round to the nearest second
because the next floating point value is more than ½ second away. The lower
limit is the first full year after the Gregorian reform to avoid complications
caused by the transition between the Julian and Gregorian calendar. The upper
limit is chosen as ISO 8601 uses 4 digit years. I think (but did not check
this) that the precision according of the floating point arithmetic is enough
to support this range.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Libreoffice-bugs] [Bug 136615] Re-consider date/time parts calculation for functions and formatting

Reply via email to