Re: [Numbers] Arrays of "Complex" objects and RAM

Alex Herbert Wed, 06 Nov 2019 09:02:28 -0800


On 06/11/2019 12:41, Gilles Sadowski wrote:

Hello.


Le mar. 5 nov. 2019 à 18:38, Alex Herbert <alex.d.herb...@gmail.com> a écrit :

On 05/11/2019 00:09, Eric Barnhill wrote:

That's interesting. The JTransforms library performs Fourier transforms
that can take complex input, output, or both. They do this with interleaved
double[] arrays, which I suppose is much more space efficient, and the
status of a number as real or imaginary is implicit by its location being
odd or even.

The packing of multi-dimensional structures into a single array is
common in image processing. It has computational efficiency when
operating on the entire set of numbers.

It is used a lot in image processing to pack 2D data as a single array.
In this case even random access to the (x,y) index as:

2D: [x][y]
1D: [y * maxx + x]

is comparable in speed as the later avoids the second array dereference.

I suspected that it might be the case.
If you have benchmark data that confirm it, then is there ever any
advantage to a multidimensional array?

A single array has advantages for a copy operation as it requires a
single .clone() method call. The disadvantage is that the ultimate size
of the multi-dimension array is limited by the size of an array.

That's ~17 GB, and more than 1 billion complex numbers.
And if there are use-cases that require very large data cubes, then the
abstraction would allow to add further indirections.

In the case of complex numbers the use of alternating indices to store
real and imaginary allows the array to be easily resized and extended.
Operations that process all positions only need to cache in memory a
sub-section of the array.

[re1, im1, re2, im2, re3, im3]

The alternative is to store real and imaginary in a large block in
either 1 array or 2 arrays:

[re1, re2, re3, im1, im2, im3]

[re1, re2, re3]
[im1, im2, im3]

This has advantages for operations that require either the real or the
imaginary components on their own. It is a bit more work to extend the
data as it requires two copies of the existing data and packing of the
new values as the appropriate position.

For small sizes the use of 2 arrays this would be less efficient memory
wise. It would require some type of pointer to hold the pair. However it
could store double the amount of complex numbers.

The MultidimensionalCounter functionality you mention is for example known
in Matlab as ind2sub() and sub2ind() . It allows for 1d vectorizing of
certain functions which can improve performance. Some people swear by it. I
always found it an additional mental exercise that I didn't want to occupy
myself with unless I absolutely had to. So, maybe it makes sense to
"advertise" this approach like you say, but some users may just want the 3D
arrays for more rapid prototyping-ish applications.

As Alex has shown, the 1D representation is abstracted away: The data is
accessed as a multi-dimensional array and the user never needs to remember
that it is interleaved.

I wonder if there isn't some streaming solution for this -- the numbers are
stored as an interleaved 1D array, but are streamed through a Complex
constructor before any needed operations are performed.

My initial thought is not as double values. A DoubleStream processes
single values. To collect the real and imaginary components would
require that the stream can be processed two at a time.

This structure would work:

class ComplexArray {
      // Must be even in length. Packed as (real, imaginary)
      double[] data;
      int size;

      Stream<Complex> stream() {
          return IntStream.range(0, size / 2)
                          .mapToObj(i -> Complex.ofCartesian(data[i*2], 
data[i*2+1]));
      }

      void add(Complex c) {
          checkCapacity(2);
          data[size++] = c.getReal();
          data[size++] = c.getImaginary();
      }

Perhaps "store" would be less confusing (?).

Final API names should have some consensus vote. At the moment this is aprototype for functionality.

      void combine(ComplexArray c) {
          checkCapacity(c.size);
          System.arraycopy(c.data, 0, data, size, c.size);
          size += c.size;
      }

      private void checkCapacity(int length) {
          final int minCapacity = size + length;
          final int oldCapacity = data.length;
          if (minCapacity > oldCapacity) {
              int newCapacity = ((oldCapacity * 3) >>> 1) + 1;
              if (newCapacity < minCapacity) {
                  newCapacity = minCapacity;
              }
              data = Arrays.copyOf(data, newCapacity);
          }
      }
}

To allow easy usage as a multidimensional array, the API should contain
the "MultidimensionalCounter"[1] or equivalent:

public class ComplexArray {
     private final MultidimensionalCounter counter;
     // ...

     public ComplexArray(int... dimensions) {
         counter = new MultidimensionalCounter(dimensions);
         data = new data[counter.getSize()];
     }

     /**
      * @param indices Storage location.
      * @return the complex number stored at the given location.
      */
     public Complex get(int... indices) {
         return data[counter.getCount(indices)];
     }
}

Or perhaps the data cube should be an additional abstraction on top of
"ComplexArray" (?).

I think adding the abstraction on top is easier.

Array can be renamed to Collection/List? Thus the count is the onlything that matters, and optionally a constant time access get(int index)method.

This can later be reshaped to a Matrix representation using theMultidimensionalCounter pattern.


We have two use cases:

1. You know the final count of Complex objects
2. You do not know the count of Complex objects

Use case 2 is used in streams where the ComplexList must be expandablefor use as a collector. This can be satisfied with a constructor with aninitial capacity. The ComplexList would thus offer a specialisation ofArrayList<Complex> by storing the Complex objects using primitive array(s).

Use case 2 rules out the possibility of immutability. This is the typeof functionality under discussion and could be served using an interfaceto allow different implementations:


public interface ComplexList {
    /**
     * The size of complex numbers contained in the list.
     *
     * @return the size
     */
    int size();

    /**
     * Gets the complex number at the specified index.
     *
     * @param index the index
     * @return the complex
     * @throws IndexOutOfBoundsException if the index is out of range
     */
    Complex get(int index);

    /**
     * Gets the real component of the complex number at the specified index.
     *
     * @param index the index
     * @return the real component
     * @throws IndexOutOfBoundsException if the index is out of range
     */
    double getReal(int index);

    /**
     * Gets the imaginary component of the complex number at the specified 
index.
     *
     * @param index the index
     * @return the imaginary component
     * @throws IndexOutOfBoundsException if the index is out of range
     */
    double getImaginary(int index);

    /**
     * Sets the complex number at the specified index.
     *
     * @param index the index
     * @param complex the complex
     * @throws IndexOutOfBoundsException if the index is out of range
     */
    void set(int index, Complex complex);

    /**
     * Sets the complex number at the specified index.
     *
     * @param index the index
     * @param real the real component
     * @param imaginary the imaginary component
     * @throws IndexOutOfBoundsException if the index is out of range
     */
    void set(int index, double real, double imaginary);

    /**
     * Adds the complex number to the list.
     *
     * @param complex the complex
     */
    void add(Complex complex);

    /**
     * Adds the complex number to the list.
     *
     * @param real the real component
     * @param imaginary the imaginary component
     */
    void add(double real, double imaginary);

    /**
     * Stream the complex numbers.
     *
     * @return the stream
     */
    Stream<Complex> stream();

    /**
     * Apply the function to the complex numbers in the list to create a new 
list.
     *
     * @param function the function
     * @return the result
     */
    ComplexList apply(UnaryOperator<Complex> function);

    /**
     * Apply the function to the paired complex numbers in both lists to create 
a new list.
     *
     * @param other the other list (must be the same size)
     * @param function the function
     * @return the result
     */
    ComplexList apply(ComplexList other, BiFunction<Complex, Complex, Complex> 
function);
}

This could be separated to put the set and add methods in asub-interface allowing the top level interface to be an immutable object.

However the apply functions currently are specified using Complex.Efficiency would be gained using primitive specialisations for operatingon 1 or 2 complex numbers and outputting the results to a provided consumer:


    public interface ComplexProvider {
        /**
         * Gets the real component of the complex number.
         *
         * @return the real component
         */
        double getReal();

        /**
         * Gets the imaginary component of the complex number.
         *
         * @return the imaginary component
         */
        double getImaginary();
    }

    public interface ComplexConsumer {
        /**
         * Accept the components of the complex number.
         * Optionally return a new ComplexProvider with the components.
         *
         * @param real the real
         * @param imaginary the imaginary
         * @return the complex provider (or null)
         */
        ComplexProvider accept(double real, double imaginary);
    }

    public interface ComplexFunction {
        /**
         * Apply a function to the components of the complex number. The 
results are sent
         * to the consumer and the output from the consumer is returned.
         *
         * @param complex the complex number
         * @param consumer the consumer
         * @return the result produced by the complex consumer
         */
        ComplexProvider apply(ComplexProvider complex, ComplexConsumer 
consumer);

        /**
         * Returns a composed function that first applies this function to
         * its input, and then applies the {@code after} function to the result.
         * The same consumer is used for both function invocations thus it must 
return
         * a ComplexProvider.
         *
         * @param after the function to apply after this function is applied
         * @return a composed function that first applies this function and then
         * applies the {@code after} function
         */
        default ComplexFunction andThen(ComplexFunction after) {
            return (t, u) -> after.apply(apply(t, u), u);
        }
    }

    public interface BiComplexFunction {
        /**
         * Apply a function to the components of two complex numbers. The 
results are sent
         * to the consumer and the output from the consumer is returned.
         *
         * @param complex1 the first complex number
         * @param complex2 the second complex number
         * @param consumer the consumer
         * @return the result produced by the complex consumer
         */
        ComplexProvider apply(ComplexProvider complex1,
                              ComplexProvider complex2, ComplexConsumer 
consumer);
    }

    /**
     * Copy the list.
     *
     * @return the complex list
     */
    ComplexList copy();

    /**
     * Apply the function to the complex numbers in the list to create a new 
list.
     *
     * @param function the function
     * @return the result
     */
    default ComplexList apply(ComplexFunction function) {
        return copy().map(function);
    }

    /**
     * Apply the function to the complex numbers in the list in-place.
     *
     * @param function the function
     * @return the list
     */
    ComplexList map(ComplexFunction function);

    /**
     * Apply the function to the paired complex numbers in both lists to create 
a new list.
     *
     * @param other the other list (must be the same size)
     * @param function the function
     * @return the result
     */
    default ComplexList apply(ComplexList other, BiComplexFunction function) {
        return copy().map(other, function);
    }

    /**
     * Apply the function to the paired complex numbers in both lists to modify 
this list in-place.
     *
     * @param other the other list (must be the same size)
     * @param function the function
     * @return the list
     */
    ComplexList map(ComplexList other, BiComplexFunction function);

If Complex implemented ComplexProvider it would allow all of Complex tobe rewritten to have static ComplexFunction implementations which arecalled from the instance:


public static ComplexProvider conjugate(ComplexProvider in, ComplexConsumer 
out) {
    return out.accept(in.getReal(), -in.getImaginary());
}

public Complex conjugate() {
    return (Complex) conjugate(this, Complex::ofCartesian);
}

You can manipulate the list using:

ComplexList list = ...;
list.map(Complex::conjugate);

ComplexFunction f = Complex::conjugate;
f = f.andThen(Complex::negate);
ComplexList list2 = list.apply(f);

This is an only an idea. Digging through the code for Complex it seemsthat some functions are non-trivial and would not work well with thisdesign.

Provision of a Stream<Complex> would require that the primitives areused to create objects. This could be changed to Stream<ComplexProvider>using a spliterator implementation to prevent object creation for eachComplex; only the spliterator would require the current index in theunderlying storage:


    /**
     * Creates a {@link Spliterator} over the elements in this list.
     *
     * @return the spliterator
     */
    Spliterator<ComplexProvider> spliterator();

    /**
     * Stream the complex numbers.
     *
     * @return the stream
     */
    default Stream<ComplexProvider> stream() {
        return StreamSupport.stream(spliterator(), false);
    }

Unfortunately processing the stream as encountered may result in out oforder elements if parallelisation occurs so the results may not becollected back in the same order.

Here there are solutions to different problems. So which are we tryingto solve:


1. Efficient storage of large amounts of complex numbers.

2. Efficient computation on large amounts of complex numbers.

3. Custom computation on complex numbers.

4. Computation on large amounts of complex numbers in-place.


For 1 we can use the abstraction to a collection (e.g. ComplexList).

For 2 we can duplicate all methods in Complex to work directly on theunderlying data.

For 2/3 we can rewrite computation of Complex to work on input providersand output consumers and allow functions to be applied to the collection.


For 4 we require different mutability of the collection.

I will continue to think about this as the solution to all issues is notstraightforward.


Opinions are welcomed.


By the way, I'd suggest to find a more descriptive name for "ComplexArray"
(since the 1D array is an "implementation detail").

And I guess that leads to my last question -- suppose someone wants to call
a trig function on a series of Complex numbers. Now let's imagine the
primitives have been stored in some maximally efficient way. It seems like,
to use any of the functionality in Complex, these numbers would have to be
unpacked, cast as Complex, operated on, then cast back to how they are
being stored. I wonder if that would prove to be more efficient in the end.

Anything will be more efficient than swapping when getting near to the max
memory limit.

Probably not more efficient to create all the Complex instances. Ideally
you would operate on the primitive data directly.

This could certainly go to the wish-list (with the use-case being the
"ComplexArray").

Looking at what Complex is capable of doing it seems that there are a
lot of operations that would have to be duplicated to allow operations
to work on primitives. However you can create the API and implement it
lazily using the above object, e.g.

public static ComplexArray apply(ComplexArray array, UnaryOperator<Complex> 
mapper) {
      return array.stream()
                  .map(mapper)
                  .collect(ComplexArray::new, ComplexArray::add, 
ComplexArray::combine);
}

public static ComplexArray log(ComplexArray array) {
      return apply(Complex::log);
}

This does create a new Complex object for each operation function
applied to the data.

The other methods that accept another Complex as an argument would have
to be handled differently as you cannot start two streams (for each
ComplexArray) in parallel. These are add, divide, multiply, subtract. So
this would require the computation be moved inside ComplexArray:

ComplexArray apply(ComplexArray other, BiFunction<Complex, Complex, Complex> 
function) {
      // TODO: argument checks on size
      return IntStream.range(0, size / 2)
                      .mapToObj(i -> {
                          Complex c1 = Complex.ofCartesian(data[i*2], 
data[i*2+1]);
                          Complex c2 = Complex.ofCartesian(other.data[i*2], 
other.data[i*2+1]);
                          return function.apply(c1, c2);
                      }).collect(ComplexArray::new, ComplexArray::add, 
ComplexArray::combine);
}

ComplexArray a1 = ...;
ComplexArray a2 = ...;
ComplexArray a3 = a1.apply(a2, Complex::add);


Note: To avoid creating new objects for all computations would require a
rewrite of Complex so that the computations are done by static methods
that accept a supplier of real and imaginary components, compute the
calculation and send the answer to a function that receives the new
components, for example:

interface ComplexData {
      double getReal();
      double getImaginary();
}

interface ToComplexData {
      ComplexData create(double real, double imaginary);
}

// For example
public static ComplexData conjugate(ComplexData in, ToComplexData out) {
      return out.create(in.getReal(), -in.getImaginary());
}

// This would make the existing API method
public Complex conjugate() {
      return conjugate(this, Complex::ofCartesian);
}

This would allow the ComplexArray to provide an object that implements
ComplexData to read the current datapoint and an object that implements
ToComplexData to write back to the equivalent datapoint in a result
adding something like this to the ComplexArray object using inner
classes to read and write to the data array:

class ComplexDataReader implements ComplexData {
   int pos;
   double getReal() { return data[pos*2]; }
   double getImaginary() { return data[pos*2+1]; }
}

class ComplexDataAppender implements ToComplexData {
      ComplexData create(double real, double imaginary) {
          data[size++] = real;
          data[size++] = imaginary;
          return null;
      }
}

ComplexDataReader getDataReader() { return new ComplexDataReader(); }
ComplexDataAppender getDataAppender() { return new ComplexDataAppender(); }

ComplexArray apply(BiFunction<ComplexData, Complex, Complex> function) {
     ComplexDataReader in = getDataReader();
     // Initialise the capacity
     ComplexArray result = new ComplexArray(size);
     ComplexDataAppender out = result.getDataAppender();
     while (in.pos < size) {
         function.apply(in, out);
         in.pos += 2;
     }
     return result;
}


// Apply the function from above:
// public static ComplexData conjugate(ComplexData in, ToComplexData out)

ComplexArray a1 = ...;
ComplexArray a2 = a1.apply(Complex::conjugate);

This allows a user to pass a function that can do anything to the
current data to create new real and imaginary components which are
stored in a new array object. I do not know how this would effect
performance without a full comparison of the API implemented with both
concepts.

If the storage of the array of complex data were abstracted it may lead
to simplification of ComplexUtils. Each of the methods to convert from
multi-dimensional arrays of Complex can be replaced by a single method
to extract the required data from the ComplexArray.

I can see at least the simple abstraction as useful.

+1

Performance
improvements using in-place reading of the real and imaginary components
for computations can be added later.

+1

It would be easier to do this if
the ComplexArray is in the same package allowing package private methods
to be added to Complex to do computations.

+1

Alex

Regards,
Gilles

[1] http://commons.apache.org/proper/commons-math/apidocs/index.html

On Sat, Nov 2, 2019 at 7:14 PM Gilles Sadowski <gillese...@gmail.com> wrote:

Hello.

The class "ComplexUtils" deal with multi-dimensional arrays that hold
instances of the "Complex" class.
I've recently encountered a use-case where it was pointed out that storing
many "Complex" instances (as seems the purpose of these utilities) is quite
inefficient memory-wise as each instance will take 32 bytes[1] while the
real and imaginary parts would only take 16 bytes as 2 primitive "double"s.
This is compounded by multi-dimensional array where each sub-dimensional
element is an array object (and thus takes another additional 16 bytes).
For example, a
      double[10][5][4]
would take
      16 * (1 + 10 * 5) + 10 * 5 * 4 * 8
    = 2416 bytes.
Assuming that in the above array, the last dimension holds 2 complex
numbers,
the same data can be represented as
      Complex[10][5][2]
that would take
      16 * ((1 + 10 * 5) + (10 * 5 * 2)) + 10 * 5 * 2 * 2 * 8
    = 4016 bytes.
In both cases, the payload (10 * 5 * 2 complex numbers) is
      10 * 5 * 2 * 2 * 8
    = 1600 bytes.
If stored in a one-dimensional array, the size in memory would be 1616
bytes.
Thus in the case of a data cube holding 100 complex numbers, a 3D array
takes 1.5 (primitives) or 2.5 ("Complex" objects) more memory than a 1D
array.
If this is correct, I wonder whether we should advertise such a
"ComplexUtils"
class.  It would perhaps be preferable to propose a data cube
abstraction.[2]

WDYT?

Regards,
Gilles

[1] https://www.baeldung.com/java-size-of-object
[2] Based on
http://commons.apache.org/proper/commons-math/apidocs/org/apache/commons/math4/util/MultidimensionalCounter.html

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org

Re: [Numbers] Arrays of "Complex" objects and RAM

Reply via email to