Re: [Numbers] Arrays of "Complex" objects and RAM

Alex Herbert Sat, 09 Nov 2019 00:18:19 -0800


> On 9 Nov 2019, at 02:42, Gilles Sadowski <gillese...@gmail.com> wrote:
> 
> Hi.
> 
> Le sam. 9 nov. 2019 à 01:48, Alex Herbert <alex.d.herb...@gmail.com 
> <mailto:alex.d.herb...@gmail.com>> a écrit :
>> 
>> 
>> 
>>> On 7 Nov 2019, at 23:29, Eric Barnhill <ericbarnh...@gmail.com> wrote:
>>> 
>>> On Thu, Nov 7, 2019 at 3:25 PM Gilles Sadowski <gillese...@gmail.com> wrote:
>>> 
>>>> Le jeu. 7 nov. 2019 à 18:36, Eric Barnhill <ericbarnh...@gmail.com> a
>>>> écrit :
>>>>> 
>>>>> I should also add on this note, my use case for developing ComplexUtils
>>>> in
>>>>> the first place was compatibility with JTransforms and JOCL. In both
>>>> cases
>>>>> I wanted to convert Complex[] arrays into interleaved double[] arrays to
>>>>> feed into algorithms using those libraries.
>>>> 
>>>> Implicit in my remark below is the question: Where does the "Complex[]"
>>>> come from?  If it is never a good idea to create such an array, why provide
>>>> code to convert from it?  Do we agree that we should rather create the
>>>> "ComplexList" abstraction, including accessors that shape the data for
>>>> use with e.g. JTransforms?
>>>> 
>>>> 
>>> I completely agree this is a superior solution and look forward to its
>>> development.
>> 
>> I think at least the minimum implementation of the abstraction will be 
>> fairly easy. It can then be performance checked with a few variants.
>> 
>> There currently is not a JMH module for numbers. Should I create one under a 
>> module included with an optional profile?
> 
> What is going to be benchmarked?


The operations in Complex applied via a ComplexList or a Complex[], or a 
specialisation that works direct on the numbers and does not use Complex. If 
one of the purposes of the ComplexList is efficient computation on large 
amounts of data then it would be a start to show the efficiency. I can do this 
on a branch and see if it is useful.

> 
>> 
>> I have spent a fair bit of time trying to fix coverage of Complex. It is now 
>> at 100% with 95% branch coverage. It currently tests all the edge cases for 
>> the C standard. However it doesn’t really test actual computations.
> 
> Actual computations?
> Coveralls seems OK:
>    
> https://coveralls.io/builds/26869201/source?filename=commons-numbers-complex/src/main/java/org/apache/commons/numbers/complex/Complex.java
>  
> <https://coveralls.io/builds/26869201/source?filename=commons-numbers-complex/src/main/java/org/apache/commons/numbers/complex/Complex.java>

The point is that you can get 100% coverage by calling all the methods. You 
don’t have to do assertions on the results. So for example sin() has no test on 
data that is not an edge case, for example:

complex(3, 4).sin() = ???

The only test is:

assertComplex(z.sin(), negI.multiply(iz.sinh()));

And the tests for sinh() tests the edge cases.

This is why the following PR [1] exists that swaps the sign for the imaginary 
component in the sin() computation. There are no tests to show the functions 
working on “normal” computations. Essentially the most basic tests that compute 
the functions on data that should go down the common path are all missing, Only 
the computation edge cases are asserted. So I’ll create some using GNU g++ to 
create expected results.

[1] https://github.com/apache/commons-numbers/pull/31/files 
<https://github.com/apache/commons-numbers/pull/31/files>


> 
>> I am going to continue on this to add more tests that the computations 
>> output the same values as a C reference implementation, and hit all the 
>> branches.
>> 
>> When working on the class it raised a few questions:
>> 
>> 1. HashCode
>> 
>> This is current the same for any number with a NaN component. However the C 
>> standard states that NaN components are allowed in infinites. So perhaps the 
>> hashCode should be updated to at least distinguish infinites from NaN.
> 
> What would be the purpose?

For storing in a HashSet. I noted this when first working on the class but on 
reflection I admit that it won’t matter. The current equals method will 
distinguish any binary difference so these are not equal:

Complex(inf, NaN)
Complex(-inf, NaN)
Complex(NaN, inf)
Complex(NaN, -inf)

They are all infinite but are different. They will be stored under the same 
hash along with any other Complex with NaN. It may be of use for searching to 
have any infinite stored under the same key but one that is different from a 
NaN.

> 
>> 
>> 2. NaN definition
>> 
>> What is not clear is if there exists a definition of NaN for a complex 
>> number. The C library has no way to test it with a single function call. You 
>> have to test both the real and imaginary components. On top of this you can 
>> do computation with NaNs. The GNU g++ compiler computes:
>> 
>>        (1e300 + i 1e300) * (1e30 + i NAN) = inf + i inf
>> 
>> Thus this is allowing some computations with NaN to produce results. I don’t 
>> know if this is a bug in GNU g++ or intentional. The C standard states:
>> 
>> “If both operands of the * operator are complex or if the second operand of 
>> the / operator
>> is complex, the operator raises floating-point exceptions if appropriate for 
>> the calculation
>> of the parts of the result, and may raise spurious floating-point 
>> exceptions.”
>> 
>> That is pretty vague. It says that you can do what is “appropriate”.
>> 
>> So it seems that perhaps we should only call a complex a NaN is both fields 
>> are NaN. If only one is NaN then the complex is semi-NaN and results of 
>> computations may or may not make sense.
> 
> Do you have an example of a NaN that makes sense?
> What is semi-NaN?

semiNaN = when only one of the two component is NaN.

The example above is a case which does not make sense to me: (1e300 + i 1e300) 
* (1e30 + i NAN) = inf + I inf

I wrote tests that asserted this should be a NaN (anything multiplied by a 
complex with a NaN component should have a NaN in the results). But my tests 
failed which is why I found this. So is this a bug in the reference 
implementation and in GNU g++ or a deliberate vagueness on working with NaN in 
complex numbers.

I suppose you may do a computation where the Imaginary component ends up as 
NaN. But then you go on to only need the real component. This may not be NaN 
and you continue without knowing the NaN exists in the imaginary.

I don’t like to state that every operation with a NaN in one of the components 
is going to be set to (NaN,NaN). So I will continue to see how GNU g++ works 
and go along with that. It’s been around for a while and so I assume it would 
have been fixed if this is not the way to do it.

> 
>> 
>> 3. Multiplication
>> 
>> I applied a change to the algorithm provided in the C standard when 
>> recovering (NaN,NaN) results of multiplying infinities. If you multiply 
>> infinity by zero this should be NaN.
>> 
>> “if one operand is an infinity and the other operand is a nonzero finite 
>> number or an
>> infinity, then the result of the * operator is an infinity;"
>> 
>> Note the “non-zero” condition.
>> 
>> But the algorithm provided does not do this and you can get an infinite 
>> result back.
>> 
>> This is what GNU g++ does:
>> 
>> multiply(0 + i 0, 0 + i inf) = -nan + i -nan
>> multiply(-0 + i 0, 0 + i inf) = -nan + i -nan
>> multiply(0 + i 0, -0 + i inf) = -nan + i -nan
>> multiply(-0 + i 0, -0 + i inf) = -nan + i -nan
>> multiply(0 + i 0, 1 + i -inf) = -nan + i -nan
>> multiply(-0 + i 0, 1 + i -inf) = -nan + i -nan
>> 
>> So they have implemented the multiply differently because here multiply 
>> infinity by zero is NaN. Note how printf distinguishes the sign of infinite 
>> values. I don’t think Java does this. It collates NaN into one 
>> representation. I will have to convert to raw long bits to get the sign for 
>> a cross reference to a C version.
> 
> I don't follow.  Why would it be useful to distinguish -NaN from NaN or i NaN?

It was an observation. IEEE 754 allows multiple values for a NaN. IIUC Any 
non-zero mantissa with an exponent of 1024 is nan. If the mantissa is zero it 
is infinite. C will distinguish the sign of NaN. Java will not. I suppose the 
authors of Java decided to go away from this as it is not useful.

My motivation was to build a printf equivalent that does output the sign of the 
nan. It would help when comparing the results of the java code to the c code.

I should also try and track down the c code for GNU g++ for multiply and 
divide. Maybe it has been changed from the reference provided in the C99 
document. I’m not familiar with the source for g++ and it might be non trivial 
to do this.

> 
>> 
>> 4. Divide
>> 
>> I have applied a change to this algorithm to check for overflow in the 
>> division. The reference text states:
>> 
>> “if the first operand is a finite number and the second operand is an 
>> infinity, then the
>> result of the / operator is a zero;”
>> 
>> But for example GNU g++ does this:
>> 
>> divide(1e+308 + i 1e+308, inf + i inf) = -nan + i 0
>> 
>> This is what the reference algorithm does if it is not changed. Here 
>> overflow cause problems in the result (which should be zero). So I think my 
>> patch is correct and the result should be (0 + i 0).
> 
> Not sure; I'd think that division by an imaginary infinity should yield NaN.

I read from the standard that divide by infinity is zero.

This 

divide(1e+308 + i 1e+308, inf + i inf) = -nan + I 0

Should be

divide(1e+308 + i 1e+308, inf + i inf) = 0 + i 0

The bug in the reference c code is overflow in the ‘finite / infinite' recovery 
logic that does this:

else if (isinf(logbw) &&
    isfinite(a) && isfinite(b)) {
  c = copysign(isinf(c) ? 1.0 : 0.0, c);
  d = copysign(isinf(d) ? 1.0 : 0.0, d);
  x = 0.0 * ( a * c + b * d );
  y = 0.0 * ( b * c - a * d );
}

Here 

C and D are converted to 1.
Then:
a * c + b * d
1e+308 * 1 + 1e+308 * 1 = Infinite (overflow!)

So:

x = 0 * inf == NaN.

x should be zero.


> 
>> The current unit tests are enforcing this requirement.
>> 
>> 5. Edge cases
>> 
>> Some of the methods have a large amount of if statements to test edge cases. 
>> These are very verbose and hard to follow. They also leave the common path 
>> of a non infinite, non nan, non zero value at the end. So a rearrangement 
>> may allow better branch prediction for the common use case. I’ll carry on 
>> working on this as I add more tests.
>> 
>> 6. tanh
>> 
>> I have put a TODO: note in the code about this. It tests edge case for 
>> infinites. Then doubles the real and imaginary components. These could then 
>> overflow to infinite and so trigger the edge case that has just been 
>> skipped. I am going to compare the result to the GNU C version to decide how 
>> to handle overflow.
> 
> Thanks for the review!
> 
>> 
>> Numbers overall
>> 
>> I’ve now updated the configs for checkstyle, PMD and spotbugs. 
>> Checkstyle:check
> 
> How to disable "failOnViolation" from the command line?

https://maven.apache.org/plugins/maven-checkstyle-plugin/check-mojo.html 
<https://maven.apache.org/plugins/maven-checkstyle-plugin/check-mojo.html>

<failOnViolation>

User property is: checkstyle.failOnViolation


mvn checkstyle:check -Dcheckstyle.failOnViolation=false

I would just run the report instead:

mvn checkstyle:checkstyle

(I use this in my .bashprofile: alias ch='mvn checkstyle:checkstyle’)

We don’t have fail on violation set for the report so you can always generate 
it. It goes to ’target/site/checkstyle.html'


> 
>> and spotbugs:check are enabled. Pmd:check is not. There are some loose ends 
>> to fix for that. Unfortunately the syntax for exclusions in the PMD config 
>> file are poor. I’ve added some but I’ve not looked at everything yet. A few 
>> decisions might need to be taken on whether to drops rules rather than fix 
>> them to at least get pmd:check active. It spots a lot of things so is useful 
>> for PRs.
> 
> Maybe.  But how does one know what to fix when the build fails because
> of a CheckStyle
> or PMD violation?

Generate and read the reports. You don’t have to build the entire site:

mvn pmd:pmd checkstyle:checkstyle

Currently these are set not to log failures to the console. It could be 
changed. I just used the settings from RNG.

> 
>> I’ve also not looked at SonarCloud [1]. There are a few things on there to 
>> fix too.
> 
> Most (all?) are false positives.

Maybe. If so I can configure them. I asked for Sonar admin access to RNG but I 
seem to have the admin access for Numbers too.

> 
>> [1] https://sonarcloud.io/dashboard?id=commons-numbers 
>> <https://sonarcloud.io/dashboard?id=commons-numbers> 
>> <https://sonarcloud.io/dashboard?id=commons-numbers 
>> <https://sonarcloud.io/dashboard?id=commons-numbers>>
> 
> 
> Best,
> Gilles
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org 
> <mailto:dev-unsubscr...@commons.apache.org>
> For additional commands, e-mail: dev-h...@commons.apache.org 
> <mailto:dev-h...@commons.apache.org>

Re: [Numbers] Arrays of "Complex" objects and RAM

Reply via email to