Re: [Numbers] Arrays of "Complex" objects and RAM

Alex Herbert Tue, 05 Nov 2019 09:39:24 -0800

On 05/11/2019 00:09, Eric Barnhill wrote:

That's interesting. The JTransforms library performs Fourier transforms
that can take complex input, output, or both. They do this with interleaved
double[] arrays, which I suppose is much more space efficient, and the
status of a number as real or imaginary is implicit by its location being
odd or even.

The packing of multi-dimensional structures into a single array iscommon in image processing. It has computational efficiency whenoperating on the entire set of numbers.

It is used a lot in image processing to pack 2D data as a single array.In this case even random access to the (x,y) index as:


2D: [x][y]
1D: [y * maxx + x]

is comparable in speed as the later avoids the second array dereference.

A single array has advantages for a copy operation as it requires asingle .clone() method call. The disadvantage is that the ultimate sizeof the multi-dimension array is limited by the size of an array.

In the case of complex numbers the use of alternating indices to storereal and imaginary allows the array to be easily resized and extended.Operations that process all positions only need to cache in memory asub-section of the array.


[re1, im1, re2, im2, re3, im3]

The alternative is to store real and imaginary in a large block ineither 1 array or 2 arrays:


[re1, re2, re3, im1, im2, im3]

[re1, re2, re3]
[im1, im2, im3]

This has advantages for operations that require either the real or theimaginary components on their own. It is a bit more work to extend thedata as it requires two copies of the existing data and packing of thenew values as the appropriate position.

For small sizes the use of 2 arrays this would be less efficient memorywise. It would require some type of pointer to hold the pair. However itcould store double the amount of complex numbers.


The MultidimensionalCounter functionality you mention is for example known
in Matlab as ind2sub() and sub2ind() . It allows for 1d vectorizing of
certain functions which can improve performance. Some people swear by it. I
always found it an additional mental exercise that I didn't want to occupy
myself with unless I absolutely had to. So, maybe it makes sense to
"advertise" this approach like you say, but some users may just want the 3D
arrays for more rapid prototyping-ish applications.

I wonder if there isn't some streaming solution for this -- the numbers are
stored as an interleaved 1D array, but are streamed through a Complex
constructor before any needed operations are performed.

My initial thought is not as double values. A DoubleStream processessingle values. To collect the real and imaginary components wouldrequire that the stream can be processed two at a time.


This structure would work:

class ComplexArray {
    // Must be even in length. Packed as (real, imaginary)
    double[] data;
    int size;

    Stream<Complex> stream() {
        return IntStream.range(0, size / 2)
                        .mapToObj(i -> Complex.ofCartesian(data[i*2], 
data[i*2+1]));
    }

    void add(Complex c) {
        checkCapacity(2);
        data[size++] = c.getReal();
        data[size++] = c.getImaginary();
    }

    void combine(ComplexArray c) {
        checkCapacity(c.size);
        System.arraycopy(c.data, 0, data, size, c.size);
        size += c.size;
    }

    private void checkCapacity(int length) {
        final int minCapacity = size + length;
        final int oldCapacity = data.length;
        if (minCapacity > oldCapacity) {
            int newCapacity = ((oldCapacity * 3) >>> 1) + 1;
            if (newCapacity < minCapacity) {
                newCapacity = minCapacity;
            }
            data = Arrays.copyOf(data, newCapacity);
        }
    }
}

And I guess that leads to my last question -- suppose someone wants to call
a trig function on a series of Complex numbers. Now let's imagine the
primitives have been stored in some maximally efficient way. It seems like,
to use any of the functionality in Complex, these numbers would have to be
unpacked, cast as Complex, operated on, then cast back to how they are
being stored. I wonder if that would prove to be more efficient in the end.

Probably not more efficient to create all the Complex instances. Ideallyyou would operate on the primitive data directly.

Looking at what Complex is capable of doing it seems that there are alot of operations that would have to be duplicated to allow operationsto work on primitives. However you can create the API and implement itlazily using the above object, e.g.


public static ComplexArray apply(ComplexArray array, UnaryOperator<Complex> 
mapper) {
    return array.stream()
                .map(mapper)
                .collect(ComplexArray::new, ComplexArray::add, 
ComplexArray::combine);
}

public static ComplexArray log(ComplexArray array) {
    return apply(Complex::log);
}

This does create a new Complex object for each operation functionapplied to the data.

The other methods that accept another Complex as an argument would haveto be handled differently as you cannot start two streams (for eachComplexArray) in parallel. These are add, divide, multiply, subtract. Sothis would require the computation be moved inside ComplexArray:


ComplexArray apply(ComplexArray other, BiFunction<Complex, Complex, Complex> 
function) {
    // TODO: argument checks on size
    return IntStream.range(0, size / 2)
                    .mapToObj(i -> {
                        Complex c1 = Complex.ofCartesian(data[i*2], 
data[i*2+1]);
                        Complex c2 = Complex.ofCartesian(other.data[i*2], 
other.data[i*2+1]);
                        return function.apply(c1, c2);
                    }).collect(ComplexArray::new, ComplexArray::add, 
ComplexArray::combine);
}

ComplexArray a1 = ...;
ComplexArray a2 = ...;
ComplexArray a3 = a1.apply(a2, Complex::add);

Note: To avoid creating new objects for all computations would require arewrite of Complex so that the computations are done by static methodsthat accept a supplier of real and imaginary components, compute thecalculation and send the answer to a function that receives the newcomponents, for example:


interface ComplexData {
    double getReal();
    double getImaginary();
}

interface ToComplexData {
    ComplexData create(double real, double imaginary);
}

// For example
public static ComplexData conjugate(ComplexData in, ToComplexData out) {
    return out.create(in.getReal(), -in.getImaginary());
}

// This would make the existing API method
public Complex conjugate() {
    return conjugate(this, Complex::ofCartesian);
}

This would allow the ComplexArray to provide an object that implementsComplexData to read the current datapoint and an object that implementsToComplexData to write back to the equivalent datapoint in a resultadding something like this to the ComplexArray object using innerclasses to read and write to the data array:


class ComplexDataReader implements ComplexData {
 int pos;
 double getReal() { return data[pos*2]; }
 double getImaginary() { return data[pos*2+1]; }
}

class ComplexDataAppender implements ToComplexData {
    ComplexData create(double real, double imaginary) {
        data[size++] = real;
        data[size++] = imaginary;
        return null;
    }
}

ComplexDataReader getDataReader() { return new ComplexDataReader(); }
ComplexDataAppender getDataAppender() { return new ComplexDataAppender(); }

ComplexArray apply(BiFunction<ComplexData, Complex, Complex> function) {
   ComplexDataReader in = getDataReader();
   // Initialise the capacity
   ComplexArray result = new ComplexArray(size);
   ComplexDataAppender out = result.getDataAppender();
   while (in.pos < size) {
       function.apply(in, out);
       in.pos += 2;
   }
   return result;
}


// Apply the function from above:
// public static ComplexData conjugate(ComplexData in, ToComplexData out)

ComplexArray a1 = ...;
ComplexArray a2 = a1.apply(Complex::conjugate);

This allows a user to pass a function that can do anything to thecurrent data to create new real and imaginary components which arestored in a new array object. I do not know how this would effectperformance without a full comparison of the API implemented with bothconcepts.

If the storage of the array of complex data were abstracted it may leadto simplification of ComplexUtils. Each of the methods to convert frommulti-dimensional arrays of Complex can be replaced by a single methodto extract the required data from the ComplexArray.

I can see at least the simple abstraction as useful. Performanceimprovements using in-place reading of the real and imaginary componentsfor computations can be added later. It would be easier to do this ifthe ComplexArray is in the same package allowing package private methodsto be added to Complex to do computations.


Alex


On Sat, Nov 2, 2019 at 7:14 PM Gilles Sadowski <gillese...@gmail.com> wrote:

Hello.

The class "ComplexUtils" deal with multi-dimensional arrays that hold
instances of the "Complex" class.
I've recently encountered a use-case where it was pointed out that storing
many "Complex" instances (as seems the purpose of these utilities) is quite
inefficient memory-wise as each instance will take 32 bytes[1] while the
real and imaginary parts would only take 16 bytes as 2 primitive "double"s.
This is compounded by multi-dimensional array where each sub-dimensional
element is an array object (and thus takes another additional 16 bytes).
For example, a
     double[10][5][4]
would take
     16 * (1 + 10 * 5) + 10 * 5 * 4 * 8
   = 2416 bytes.
Assuming that in the above array, the last dimension holds 2 complex
numbers,
the same data can be represented as
     Complex[10][5][2]
that would take
     16 * ((1 + 10 * 5) + (10 * 5 * 2)) + 10 * 5 * 2 * 2 * 8
   = 4016 bytes.
In both cases, the payload (10 * 5 * 2 complex numbers) is
     10 * 5 * 2 * 2 * 8
   = 1600 bytes.
If stored in a one-dimensional array, the size in memory would be 1616
bytes.
Thus in the case of a data cube holding 100 complex numbers, a 3D array
takes 1.5 (primitives) or 2.5 ("Complex" objects) more memory than a 1D
array.
If this is correct, I wonder whether we should advertise such a
"ComplexUtils"
class.  It would perhaps be preferable to propose a data cube
abstraction.[2]

WDYT?

Regards,
Gilles

[1] https://www.baeldung.com/java-size-of-object
[2] Based on
http://commons.apache.org/proper/commons-math/apidocs/org/apache/commons/math4/util/MultidimensionalCounter.html

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org

Re: [Numbers] Arrays of "Complex" objects and RAM

Reply via email to