On 05/11/2019 00:09, Eric Barnhill wrote:
That's interesting. The JTransforms library performs Fourier transforms
that can take complex input, output, or both. They do this with interleaved
double[] arrays, which I suppose is much more space efficient, and the
status of a number as real or imaginary is implicit by its location being
odd or even.

The packing of multi-dimensional structures into a single array is common in image processing. It has computational efficiency when operating on the entire set of numbers.

It is used a lot in image processing to pack 2D data as a single array. In this case even random access to the (x,y) index as:

2D: [x][y]
1D: [y * maxx + x]

is comparable in speed as the later avoids the second array dereference.

A single array has advantages for a copy operation as it requires a single .clone() method call. The disadvantage is that the ultimate size of the multi-dimension array is limited by the size of an array.

In the case of complex numbers the use of alternating indices to store real and imaginary allows the array to be easily resized and extended. Operations that process all positions only need to cache in memory a sub-section of the array.

[re1, im1, re2, im2, re3, im3]

The alternative is to store real and imaginary in a large block in either 1 array or 2 arrays:

[re1, re2, re3, im1, im2, im3]

[re1, re2, re3]
[im1, im2, im3]

This has advantages for operations that require either the real or the imaginary components on their own. It is a bit more work to extend the data as it requires two copies of the existing data and packing of the new values as the appropriate position.

For small sizes the use of 2 arrays this would be less efficient memory wise. It would require some type of pointer to hold the pair. However it could store double the amount of complex numbers.

The MultidimensionalCounter functionality you mention is for example known
in Matlab as ind2sub() and sub2ind() . It allows for 1d vectorizing of
certain functions which can improve performance. Some people swear by it. I
always found it an additional mental exercise that I didn't want to occupy
myself with unless I absolutely had to. So, maybe it makes sense to
"advertise" this approach like you say, but some users may just want the 3D
arrays for more rapid prototyping-ish applications.

I wonder if there isn't some streaming solution for this -- the numbers are
stored as an interleaved 1D array, but are streamed through a Complex
constructor before any needed operations are performed.

My initial thought is not as double values. A DoubleStream processes single values. To collect the real and imaginary components would require that the stream can be processed two at a time.

This structure would work:

class ComplexArray {
    // Must be even in length. Packed as (real, imaginary)
    double[] data;
    int size;

    Stream<Complex> stream() {
        return IntStream.range(0, size / 2)
                        .mapToObj(i -> Complex.ofCartesian(data[i*2], 
data[i*2+1]));
    }

    void add(Complex c) {
        checkCapacity(2);
        data[size++] = c.getReal();
        data[size++] = c.getImaginary();
    }

    void combine(ComplexArray c) {
        checkCapacity(c.size);
        System.arraycopy(c.data, 0, data, size, c.size);
        size += c.size;
    }

    private void checkCapacity(int length) {
        final int minCapacity = size + length;
        final int oldCapacity = data.length;
        if (minCapacity > oldCapacity) {
            int newCapacity = ((oldCapacity * 3) >>> 1) + 1;
            if (newCapacity < minCapacity) {
                newCapacity = minCapacity;
            }
            data = Arrays.copyOf(data, newCapacity);
        }
    }
}


And I guess that leads to my last question -- suppose someone wants to call
a trig function on a series of Complex numbers. Now let's imagine the
primitives have been stored in some maximally efficient way. It seems like,
to use any of the functionality in Complex, these numbers would have to be
unpacked, cast as Complex, operated on, then cast back to how they are
being stored. I wonder if that would prove to be more efficient in the end.

Probably not more efficient to create all the Complex instances. Ideally you would operate on the primitive data directly.

Looking at what Complex is capable of doing it seems that there are a lot of operations that would have to be duplicated to allow operations to work on primitives. However you can create the API and implement it lazily using the above object, e.g.

public static ComplexArray apply(ComplexArray array, UnaryOperator<Complex> 
mapper) {
    return array.stream()
                .map(mapper)
                .collect(ComplexArray::new, ComplexArray::add, 
ComplexArray::combine);
}

public static ComplexArray log(ComplexArray array) {
    return apply(Complex::log);
}

This does create a new Complex object for each operation function applied to the data.

The other methods that accept another Complex as an argument would have to be handled differently as you cannot start two streams (for each ComplexArray) in parallel. These are add, divide, multiply, subtract. So this would require the computation be moved inside ComplexArray:

ComplexArray apply(ComplexArray other, BiFunction<Complex, Complex, Complex> 
function) {
    // TODO: argument checks on size
    return IntStream.range(0, size / 2)
                    .mapToObj(i -> {
                        Complex c1 = Complex.ofCartesian(data[i*2], 
data[i*2+1]);
                        Complex c2 = Complex.ofCartesian(other.data[i*2], 
other.data[i*2+1]);
                        return function.apply(c1, c2);
                    }).collect(ComplexArray::new, ComplexArray::add, 
ComplexArray::combine);
}

ComplexArray a1 = ...;
ComplexArray a2 = ...;
ComplexArray a3 = a1.apply(a2, Complex::add);
Note: To avoid creating new objects for all computations would require a rewrite of Complex so that the computations are done by static methods that accept a supplier of real and imaginary components, compute the calculation and send the answer to a function that receives the new components, for example:

interface ComplexData {
    double getReal();
    double getImaginary();
}

interface ToComplexData {
    ComplexData create(double real, double imaginary);
}

// For example
public static ComplexData conjugate(ComplexData in, ToComplexData out) {
    return out.create(in.getReal(), -in.getImaginary());
}

// This would make the existing API method
public Complex conjugate() {
    return conjugate(this, Complex::ofCartesian);
}

This would allow the ComplexArray to provide an object that implements ComplexData to read the current datapoint and an object that implements ToComplexData to write back to the equivalent datapoint in a result adding something like this to the ComplexArray object using inner classes to read and write to the data array:

class ComplexDataReader implements ComplexData {
 int pos;
 double getReal() { return data[pos*2]; }
 double getImaginary() { return data[pos*2+1]; }
}

class ComplexDataAppender implements ToComplexData {
    ComplexData create(double real, double imaginary) {
        data[size++] = real;
        data[size++] = imaginary;
        return null;
    }
}

ComplexDataReader getDataReader() { return new ComplexDataReader(); }
ComplexDataAppender getDataAppender() { return new ComplexDataAppender(); }

ComplexArray apply(BiFunction<ComplexData, Complex, Complex> function) {
   ComplexDataReader in = getDataReader();
   // Initialise the capacity
   ComplexArray result = new ComplexArray(size);
   ComplexDataAppender out = result.getDataAppender();
   while (in.pos < size) {
       function.apply(in, out);
       in.pos += 2;
   }
   return result;
}


// Apply the function from above:
// public static ComplexData conjugate(ComplexData in, ToComplexData out)

ComplexArray a1 = ...;
ComplexArray a2 = a1.apply(Complex::conjugate);

This allows a user to pass a function that can do anything to the current data to create new real and imaginary components which are stored in a new array object. I do not know how this would effect performance without a full comparison of the API implemented with both concepts.

If the storage of the array of complex data were abstracted it may lead to simplification of ComplexUtils. Each of the methods to convert from multi-dimensional arrays of Complex can be replaced by a single method to extract the required data from the ComplexArray.

I can see at least the simple abstraction as useful. Performance improvements using in-place reading of the real and imaginary components for computations can be added later. It would be easier to do this if the ComplexArray is in the same package allowing package private methods to be added to Complex to do computations.

Alex


On Sat, Nov 2, 2019 at 7:14 PM Gilles Sadowski <gillese...@gmail.com> wrote:

Hello.

The class "ComplexUtils" deal with multi-dimensional arrays that hold
instances of the "Complex" class.
I've recently encountered a use-case where it was pointed out that storing
many "Complex" instances (as seems the purpose of these utilities) is quite
inefficient memory-wise as each instance will take 32 bytes[1] while the
real and imaginary parts would only take 16 bytes as 2 primitive "double"s.
This is compounded by multi-dimensional array where each sub-dimensional
element is an array object (and thus takes another additional 16 bytes).
For example, a
     double[10][5][4]
would take
     16 * (1 + 10 * 5) + 10 * 5 * 4 * 8
   = 2416 bytes.
Assuming that in the above array, the last dimension holds 2 complex
numbers,
the same data can be represented as
     Complex[10][5][2]
that would take
     16 * ((1 + 10 * 5) + (10 * 5 * 2)) + 10 * 5 * 2 * 2 * 8
   = 4016 bytes.
In both cases, the payload (10 * 5 * 2 complex numbers) is
     10 * 5 * 2 * 2 * 8
   = 1600 bytes.
If stored in a one-dimensional array, the size in memory would be 1616
bytes.
Thus in the case of a data cube holding 100 complex numbers, a 3D array
takes 1.5 (primitives) or 2.5 ("Complex" objects) more memory than a 1D
array.
If this is correct, I wonder whether we should advertise such a
"ComplexUtils"
class.  It would perhaps be preferable to propose a data cube
abstraction.[2]

WDYT?

Regards,
Gilles

[1] https://www.baeldung.com/java-size-of-object
[2] Based on
http://commons.apache.org/proper/commons-math/apidocs/org/apache/commons/math4/util/MultidimensionalCounter.html

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org


Reply via email to