Re: [Numbers] Arrays of "Complex" objects and RAM

Gilles Sadowski Sat, 09 Nov 2019 04:24:41 -0800

Hello Alex.

>>> [...]
>>> I think at least the minimum implementation of the abstraction will be
>>> fairly easy. It can then be performance checked with a few variants.
>>>
>>> There currently is not a JMH module for numbers. Should I create one
>>> under a module included with an optional profile?
>>
>> What is going to be benchmarked?
>
> The operations in Complex applied via a ComplexList or a Complex[], or a
> specialisation that works direct on the numbers and does not use Complex. If
> one of the purposes of the ComplexList is efficient computation on large
> amounts of data then it would be a start to show the efficiency. I can do
> this on a branch and see if it is useful.


No need for a branch, as all expressed opinions agree on the usefulness
of "ComplexList", at least to save RAM.
Indeed, a JMH module would then hopefully show that no performance is
lost by the one additional creation of a "Complex" instance when the data
the abstraction is backed by "double[]" (wrt "Complex[]").

>>>
>>> I have spent a fair bit of time trying to fix coverage of Complex. It is
>>> now at 100% with 95% branch coverage. It currently tests all the edge
>>> cases for the C standard. However it doesn’t really test actual
>>> computations.
>>
>> Actual computations?
>> Coveralls seems OK:
>>
>> https://coveralls.io/builds/26869201/source?filename=commons-numbers-complex/src/main/java/org/apache/commons/numbers/complex/Complex.java
>> <https://coveralls.io/builds/26869201/source?filename=commons-numbers-complex/src/main/java/org/apache/commons/numbers/complex/Complex.java>
>
> The point is that you can get 100% coverage by calling all the methods. You
> don’t have to do assertions on the results. So for example sin() has no test
> on data that is not an edge case, for example:
>
> complex(3, 4).sin() = ???
>
> The only test is:
>
> assertComplex(z.sin(), negI.multiply(iz.sinh()));
>
> And the tests for sinh() tests the edge cases.
>
> This is why the following PR [1] exists that swaps the sign for the
> imaginary component in the sin() computation. There are no tests to show the
> functions working on “normal” computations. Essentially the most basic tests
> that compute the functions on data that should go down the common path are
> all missing, Only the computation edge cases are asserted. So I’ll create
> some using GNU g++ to create expected results.
>
> [1] https://github.com/apache/commons-numbers/pull/31/files
> <https://github.com/apache/commons-numbers/pull/31/files>

Got it.  Thanks.

>>
>>> I am going to continue on this to add more tests that the computations
>>> output the same values as a C reference implementation, and hit all the
>>> branches.
>>>
>>> When working on the class it raised a few questions:
>>>
>>> 1. HashCode
>>>
>>> This is current the same for any number with a NaN component. However the
>>> C standard states that NaN components are allowed in infinites. So
>>> perhaps the hashCode should be updated to at least distinguish infinites
>>> from NaN.
>>
>> What would be the purpose?
>
> For storing in a HashSet. I noted this when first working on the class but
> on reflection I admit that it won’t matter. The current equals method will
> distinguish any binary difference so these are not equal:
>
> Complex(inf, NaN)
> Complex(-inf, NaN)
> Complex(NaN, inf)
> Complex(NaN, -inf)
>
> They are all infinite but are different. They will be stored under the same
> hash along with any other Complex with NaN. It may be of use for searching
> to have any infinite stored under the same key but one that is different
> from a NaN.

Finally, I think that the safer option is to follow the JDK convention
and ensure that "hashCode is consistent with equals".

How about using "java.util.Arrays" to define both?

>>>
>>> 2. NaN definition
>>>
>>> What is not clear is if there exists a definition of NaN for a complex
>>> number. The C library has no way to test it with a single function call.
>>> You have to test both the real and imaginary components. On top of this
>>> you can do computation with NaNs. The GNU g++ compiler computes:
>>>
>>>        (1e300 + i 1e300) * (1e30 + i NAN) = inf + i inf
>>>
>>> Thus this is allowing some computations with NaN to produce results. I
>>> don’t know if this is a bug in GNU g++ or intentional. The C standard
>>> states:
>>>
>>> “If both operands of the * operator are complex or if the second operand
>>> of the / operator
>>> is complex, the operator raises floating-point exceptions if appropriate
>>> for the calculation
>>> of the parts of the result, and may raise spurious floating-point
>>> exceptions.”
>>>
>>> That is pretty vague. It says that you can do what is “appropriate”.
>>>
>>> So it seems that perhaps we should only call a complex a NaN is both
>>> fields are NaN. If only one is NaN then the complex is semi-NaN and
>>> results of computations may or may not make sense.
>>
>> Do you have an example of a NaN that makes sense?
>> What is semi-NaN?
>
> semiNaN = when only one of the two component is NaN.
>
> The example above is a case which does not make sense to me: (1e300 + i
> 1e300) * (1e30 + i NAN) = inf + I inf
>
> I wrote tests that asserted this should be a NaN (anything multiplied by a
> complex with a NaN component should have a NaN in the results). But my tests
> failed which is why I found this. So is this a bug in the reference
> implementation and in GNU g++ or a deliberate vagueness on working with NaN
> in complex numbers.
>
> I suppose you may do a computation where the Imaginary component ends up as
> NaN. But then you go on to only need the real component. This may not be NaN
> and you continue without knowing the NaN exists in the imaginary.

Yes.  I seem to remember that some computations should be allowed
to continue even if intermediate results could be NaN...

> I don’t like to state that every operation with a NaN in one of the
> components is going to be set to (NaN,NaN). So I will continue to see how
> GNU g++ works and go along with that. It’s been around for a while and so I
> assume it would have been fixed if this is not the way to do it.

+1

Then, should we decide to align on the g++ results for all computations?
[But not for "equals" if it would contradict the JDK.]

>>>
>>> 3. Multiplication
>>>
>>> I applied a change to the algorithm provided in the C standard when
>>> recovering (NaN,NaN) results of multiplying infinities. If you multiply
>>> infinity by zero this should be NaN.
>>>
>>> “if one operand is an infinity and the other operand is a nonzero finite
>>> number or an
>>> infinity, then the result of the * operator is an infinity;"
>>>
>>> Note the “non-zero” condition.
>>>
>>> But the algorithm provided does not do this and you can get an infinite
>>> result back.
>>>
>>> This is what GNU g++ does:
>>>
>>> multiply(0 + i 0, 0 + i inf) = -nan + i -nan
>>> multiply(-0 + i 0, 0 + i inf) = -nan + i -nan
>>> multiply(0 + i 0, -0 + i inf) = -nan + i -nan
>>> multiply(-0 + i 0, -0 + i inf) = -nan + i -nan
>>> multiply(0 + i 0, 1 + i -inf) = -nan + i -nan
>>> multiply(-0 + i 0, 1 + i -inf) = -nan + i -nan
>>>
>>> So they have implemented the multiply differently because here multiply
>>> infinity by zero is NaN. Note how printf distinguishes the sign of
>>> infinite values. I don’t think Java does this. It collates NaN into one
>>> representation. I will have to convert to raw long bits to get the sign
>>> for a cross reference to a C version.
>>
>> I don't follow.  Why would it be useful to distinguish -NaN from NaN or i
>> NaN?
>
> It was an observation. IEEE 754 allows multiple values for a NaN. IIUC Any
> non-zero mantissa with an exponent of 1024 is nan. If the mantissa is zero
> it is infinite. C will distinguish the sign of NaN. Java will not. I suppose
> the authors of Java decided to go away from this as it is not useful.
>
> My motivation was to build a printf equivalent that does output the sign of
> the nan. It would help when comparing the results of the java code to the c
> code.
>
> I should also try and track down the c code for GNU g++ for multiply and
> divide. Maybe it has been changed from the reference provided in the C99
> document. I’m not familiar with the source for g++ and it might be non
> trivial to do this.

If the issues only arise in edge cases that involve indeterminate forms,
I wouldn't worry too much, apart from stating the potential pitfalls in
the Javadoc.  If something there can be improved (e.g. after looking at
math references), it could be done in a subsequent minor release.

>>
>>>
>>> 4. Divide
>>>
>>> I have applied a change to this algorithm to check for overflow in the
>>> division. The reference text states:
>>>
>>> “if the first operand is a finite number and the second operand is an
>>> infinity, then the
>>> result of the / operator is a zero;”
>>>
>>> But for example GNU g++ does this:
>>>
>>> divide(1e+308 + i 1e+308, inf + i inf) = -nan + i 0
>>>
>>> This is what the reference algorithm does if it is not changed. Here
>>> overflow cause problems in the result (which should be zero). So I think
>>> my patch is correct and the result should be (0 + i 0).
>>
>> Not sure; I'd think that division by an imaginary infinity should yield
>> NaN.
>
> I read from the standard that divide by infinity is zero.
>
> This
>
> divide(1e+308 + i 1e+308, inf + i inf) = -nan + I 0
>
> Should be
>
> divide(1e+308 + i 1e+308, inf + i inf) = 0 + i 0
>
> The bug in the reference c code is overflow in the ‘finite / infinite'
> recovery logic that does this:
>
> else if (isinf(logbw) &&
>     isfinite(a) && isfinite(b)) {
>   c = copysign(isinf(c) ? 1.0 : 0.0, c);
>   d = copysign(isinf(d) ? 1.0 : 0.0, d);
>   x = 0.0 * ( a * c + b * d );
>   y = 0.0 * ( b * c - a * d );
> }
>
> Here
>
> C and D are converted to 1.
> Then:
> a * c + b * d
> 1e+308 * 1 + 1e+308 * 1 = Infinite (overflow!)
>
> So:
>
> x = 0 * inf == NaN.
>
> x should be zero.
>

Perhaps that under some conditions, it would be simpler to
perform the computation in polar form (and hopefully get
the expected result).

>>> [...]
>>>
>>> Numbers overall
>>>
>>> I’ve now updated the configs for checkstyle, PMD and spotbugs.
>>> Checkstyle:check
>>
>> How to disable "failOnViolation" from the command line?
>
> https://maven.apache.org/plugins/maven-checkstyle-plugin/check-mojo.html
> <https://maven.apache.org/plugins/maven-checkstyle-plugin/check-mojo.html>
>
> <failOnViolation>
>
> User property is: checkstyle.failOnViolation
>
>
> mvn checkstyle:check -Dcheckstyle.failOnViolation=false

I think that I tried
  $ mvn -Dcheckstyle.failOnViolation=false test

And still it wouldn't run the test (because I had introduced the
"System.out" forbidden construct).

> I would just run the report instead:
>
> mvn checkstyle:checkstyle
>
> (I use this in my .bashprofile: alias ch='mvn checkstyle:checkstyle’)
>
> We don’t have fail on violation set for the report so you can always
> generate it. It goes to ’target/site/checkstyle.html'
>>
>>> and spotbugs:check are enabled. Pmd:check is not. There are some loose
>>> ends to fix for that. Unfortunately the syntax for exclusions in the PMD
>>> config file are poor. I’ve added some but I’ve not looked at everything
>>> yet. A few decisions might need to be taken on whether to drops rules
>>> rather than fix them to at least get pmd:check active. It spots a lot of
>>> things so is useful for PRs.
>>
>> Maybe.  But how does one know what to fix when the build fails because
>> of a CheckStyle
>> or PMD violation?
>
> Generate and read the reports.

Well, of course but with yesterday's config, the build would just
fail, without having a report to read.

> You don’t have to build the entire site:

When the focus is on the report, I'd find it easier to generate all of
them and read them on the web site...

When the focus is on debugging/coverage, I prefer that checks are
not part of the build.
[But I agree that it's better to intercept problematic PRs...]

>
> mvn pmd:pmd checkstyle:checkstyle
>
> Currently these are set not to log failures to the console. It could be
> changed. I just used the settings from RNG.

Sure.
But, there, we are already much more confident in the overall
shape of the test suite.
Anyway, thanks to your involvement, we are on that path for
[Numbers] too. :-)

>>
>>> I’ve also not looked at SonarCloud [1]. There are a few things on there
>>> to fix too.
>>
>> Most (all?) are false positives.
>
> Maybe.

The likes of "method too long", "commented out code to be removed",
"complexity too high", ...

Regards,
Gilles

> If so I can configure them. I asked for Sonar admin access to RNG but
> I seem to have the admin access for Numbers too.
>
>>
>>> [1] https://sonarcloud.io/dashboard?id=commons-numbers
>>> <https://sonarcloud.io/dashboard?id=commons-numbers>
>>> <https://sonarcloud.io/dashboard?id=commons-numbers
>>> <https://sonarcloud.io/dashboard?id=commons-numbers>>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org

Re: [Numbers] Arrays of "Complex" objects and RAM

Reply via email to