Re: RFR: 4511638: Double.toString(double) sometimes produces incorrect results [v2]

Raffaello Giulietti Sat, 09 Oct 2021 04:18:39 -0700

On Fri, 16 Apr 2021 11:30:32 GMT, Raffaello Giulietti 
<github.com+70726043+rgiulie...@openjdk.org> wrote:


>> Hello,
>> 
>> here's a PR for a patch submitted on March 2020 
>> [1](https://cr.openjdk.java.net/~bpb/4511638/webrev.04/) when Mercurial was 
>> a thing.
>> 
>> The patch has been edited to adhere to OpenJDK code conventions about 
>> multi-line (block) comments. Nothing in the code proper has changed, except 
>> for the addition of redundant but clarifying parentheses in some expressions.
>> 
>> 
>> Greetings
>> Raffaello
>
> Raffaello Giulietti has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   4511638: Double.toString(double) sometimes produces incorrect results

(comment by Guy Steele)

Hi, Raffaello,

I will try to compose this message in plain ASCII and hope it does not get 
garbled.

I have now re-read the Schubfach paper, and with the extra information in your 
previous email, I think I understood it a lot better this time. I am very 
enthusiastic about the approach. This is exactly what JonL White was trying to 
puzzle out some five decades ago: just how many digits do you need in your 
powers of ten to get accurate print-read round-trips?

I am especially appreciative of the attention paid in the Schubfach paper to 
parameter M, which ensures that two-digit possibilities are considered in cases 
where it is required to print two significant digits anyway (scientific 
notation).

I have also read the two papers I could find about Ryu. Schubfach is clearly an 
improvement over Ryu because it avoids the iteration, among other reasons. But 
the second paper, about Ryu for printf, addresses the specific difficulties of 
generating digits for printf format specifiers, and raises the interesting 
question of whether it would be worthwhile also to design a version of 
Schubfach that handles printf requirements.

So I think it would be well worth incorporating the Schubfach algorithm into 
the JDK codebase. I am reassured that Schubfach has undergone extensive testing 
on trillions of randomly chosen values. But I would be further reassured by a 
statement that two important classes of values have been _exhaustively_ tested:

(1) All positive powers of 2 representable in the double format, plus the 
maximum representable value. (These are the values that are surrounded by 
irregular spacing.) Furthermore, out of caution one should also test every fp 
value within Z ulps of such a value; perhaps Z should be 64 or even 1024.

(2) All representable fp values that are the result of rounding-to-nearest some 
decimal value of the form y•10^n, where y is a member of, say, either D_1, D_2, 
D_3, or D_4. (These are values that may be especially susceptible to generation 
of too many digits by an insufficiently careful algorithm, and one reason might 
be insufficient precision or other algorithmic error in connection with a table 
of powers of ten.) Furthermore, out of caution one should also test every fp 
value within Y ulps of such a value; perhaps Y should be 64 or even 1024.

In every case, the testing should include (a) ensuring round-trip print-read 
behavior; (b) comparing to what is produced by the current JDK algorithm, and 
investigating any cases that differ.

And if other similar “edge cases” can be identified from the structure of the 
code, those would be worth focusing on as well.

If this testing has been or could be done, I would say, yes, certainly adopt 
Schubfach for Java. I would also recommend submitting the Schubfach paper to an 
appropriate conference or journal to get the benefit of further peer review of 
the algorithm and its write-up.

—Guy

----

(my reply)

Hi Guy,

for some reason your comments still appear garbled on the GitHub PR page and 
don't make it to the core-libs-dev mailing list at all. Luckily, they appear 
intelligible in my mailbox, so I'll keep going with prepending your comments in 
my replies: not ideal but good enough.

Thanks so much for re-reading my "paper".


printf()

There are some issues to consider when trying to apply Schubfach to printf(), 
the main one being that printf() allows to specify an arbitrary length for the 
resulting decimal. This means, for example, that unlimited precision arithmetic 
is unavoidable. But it might be worthwhile to investigate using Schubfach for 
lengths up to H (9 and 17 for float and double, resp.) and fall back to 
unlimited precision beyond that.
Before that, however, I would prefer to finally push Schubfach in the OpenJDK 
codebase for the toString() cases and close this PR.


Tests

Below, by "extensive test" I mean not only that the outcomes convert back 
without loss of information, but that they fully meet the spec about minimal 
number of digits, closeness, correct formatting (normal viz scientific), 
character set, etc.

All currently available tests are in the contributed code of this PR and will 
be part of the OpenJDK once integrated.

- All powers of 2 and the extreme values are already extensively tested.
- All values closest to powers of 10 are extensively tested.
- All values proposed by Paxson [1] are extensively tested.
- A configurable number of random values are tested at each round (currently 4 
* 1'000'000 random values). Should a value fail, there's enough diagnostic 
information for further investigation.

I'll add extensive tests for the values you propose in point (1) and (2), 
setting Z = Y = 1024.

As for comparison with the current JDK behavior, there are already a bunch of 
values for which extensive tests fail on the current JDK but pass with 
Schubfach.

It would be cumbersome, if possible at all, to have both the current JDK and 
Schubfach implementations in the same OpenJDK codebase to be able to compare 
the outcomes. I performed comparisons in a different constellation, with 
Schubfach as an external library, but this is hardly a possibility in the 
core-libs. Needless to say, Schubfach outcomes are either the same as in JDK or 
better (shorter or closest to the fp value).


Peer reviewed publication

Shortening my 27 pages writing and re-formating it to meet a journal standards 
for publication would require investing yet another substantial amount of time. 
I'm not sure I'm prepared for that, given that I've no personal interest in a 
journal publication: I'm not an academic, I'm not pursuing a career...
But I promise I'll think about reconsidering my position on this ;-)


Greetings
Raffaello

----

[1] Paxson V, "A Program for Testing IEEE Decimal-Binary Conversion"

-------------

PR: https://git.openjdk.java.net/jdk/pull/3402

Re: RFR: 4511638: Double.toString(double) sometimes produces incorrect results [v2]

Reply via email to