Re: [Numbers] Arrays of "Complex" objects and RAM

Gilles Sadowski Wed, 06 Nov 2019 04:42:41 -0800

Hello.

Le mar. 5 nov. 2019 à 18:38, Alex Herbert <alex.d.herb...@gmail.com> a écrit :
>
> On 05/11/2019 00:09, Eric Barnhill wrote:
> > That's interesting. The JTransforms library performs Fourier transforms
> > that can take complex input, output, or both. They do this with interleaved
> > double[] arrays, which I suppose is much more space efficient, and the
> > status of a number as real or imaginary is implicit by its location being
> > odd or even.
>
> The packing of multi-dimensional structures into a single array is
> common in image processing. It has computational efficiency when
> operating on the entire set of numbers.
>
> It is used a lot in image processing to pack 2D data as a single array.
> In this case even random access to the (x,y) index as:
>
> 2D: [x][y]
> 1D: [y * maxx + x]
>
> is comparable in speed as the later avoids the second array dereference.


I suspected that it might be the case.
If you have benchmark data that confirm it, then is there ever any
advantage to a multidimensional array?

> A single array has advantages for a copy operation as it requires a
> single .clone() method call. The disadvantage is that the ultimate size
> of the multi-dimension array is limited by the size of an array.

That's ~17 GB, and more than 1 billion complex numbers.
And if there are use-cases that require very large data cubes, then the
abstraction would allow to add further indirections.

> In the case of complex numbers the use of alternating indices to store
> real and imaginary allows the array to be easily resized and extended.
> Operations that process all positions only need to cache in memory a
> sub-section of the array.
>
> [re1, im1, re2, im2, re3, im3]
>
> The alternative is to store real and imaginary in a large block in
> either 1 array or 2 arrays:
>
> [re1, re2, re3, im1, im2, im3]
>
> [re1, re2, re3]
> [im1, im2, im3]
>
> This has advantages for operations that require either the real or the
> imaginary components on their own. It is a bit more work to extend the
> data as it requires two copies of the existing data and packing of the
> new values as the appropriate position.
>
> For small sizes the use of 2 arrays this would be less efficient memory
> wise. It would require some type of pointer to hold the pair. However it
> could store double the amount of complex numbers.
> >
> > The MultidimensionalCounter functionality you mention is for example known
> > in Matlab as ind2sub() and sub2ind() . It allows for 1d vectorizing of
> > certain functions which can improve performance. Some people swear by it. I
> > always found it an additional mental exercise that I didn't want to occupy
> > myself with unless I absolutely had to. So, maybe it makes sense to
> > "advertise" this approach like you say, but some users may just want the 3D
> > arrays for more rapid prototyping-ish applications.

As Alex has shown, the 1D representation is abstracted away: The data is
accessed as a multi-dimensional array and the user never needs to remember
that it is interleaved.

> >
> > I wonder if there isn't some streaming solution for this -- the numbers are
> > stored as an interleaved 1D array, but are streamed through a Complex
> > constructor before any needed operations are performed.
>
> My initial thought is not as double values. A DoubleStream processes
> single values. To collect the real and imaginary components would
> require that the stream can be processed two at a time.
>
> This structure would work:
>
> class ComplexArray {
>      // Must be even in length. Packed as (real, imaginary)
>      double[] data;
>      int size;
>
>      Stream<Complex> stream() {
>          return IntStream.range(0, size / 2)
>                          .mapToObj(i -> Complex.ofCartesian(data[i*2], 
> data[i*2+1]));
>      }
>
>      void add(Complex c) {
>          checkCapacity(2);
>          data[size++] = c.getReal();
>          data[size++] = c.getImaginary();
>      }

Perhaps "store" would be less confusing (?).

>
>      void combine(ComplexArray c) {
>          checkCapacity(c.size);
>          System.arraycopy(c.data, 0, data, size, c.size);
>          size += c.size;
>      }
>
>      private void checkCapacity(int length) {
>          final int minCapacity = size + length;
>          final int oldCapacity = data.length;
>          if (minCapacity > oldCapacity) {
>              int newCapacity = ((oldCapacity * 3) >>> 1) + 1;
>              if (newCapacity < minCapacity) {
>                  newCapacity = minCapacity;
>              }
>              data = Arrays.copyOf(data, newCapacity);
>          }
>      }
> }

To allow easy usage as a multidimensional array, the API should contain
the "MultidimensionalCounter"[1] or equivalent:

public class ComplexArray {
    private final MultidimensionalCounter counter;
    // ...

    public ComplexArray(int... dimensions) {
        counter = new MultidimensionalCounter(dimensions);
        data = new data[counter.getSize()];
    }

    /**
     * @param indices Storage location.
     * @return the complex number stored at the given location.
     */
    public Complex get(int... indices) {
        return data[counter.getCount(indices)];
    }
}

Or perhaps the data cube should be an additional abstraction on top of
"ComplexArray" (?).

By the way, I'd suggest to find a more descriptive name for "ComplexArray"
(since the 1D array is an "implementation detail").

> > And I guess that leads to my last question -- suppose someone wants to call
> > a trig function on a series of Complex numbers. Now let's imagine the
> > primitives have been stored in some maximally efficient way. It seems like,
> > to use any of the functionality in Complex, these numbers would have to be
> > unpacked, cast as Complex, operated on, then cast back to how they are
> > being stored. I wonder if that would prove to be more efficient in the end.

Anything will be more efficient than swapping when getting near to the max
memory limit.

> Probably not more efficient to create all the Complex instances. Ideally
> you would operate on the primitive data directly.

This could certainly go to the wish-list (with the use-case being the
"ComplexArray").

> Looking at what Complex is capable of doing it seems that there are a
> lot of operations that would have to be duplicated to allow operations
> to work on primitives. However you can create the API and implement it
> lazily using the above object, e.g.
>
> public static ComplexArray apply(ComplexArray array, UnaryOperator<Complex> 
> mapper) {
>      return array.stream()
>                  .map(mapper)
>                  .collect(ComplexArray::new, ComplexArray::add, 
> ComplexArray::combine);
> }
>
> public static ComplexArray log(ComplexArray array) {
>      return apply(Complex::log);
> }
>
> This does create a new Complex object for each operation function
> applied to the data.
>
> The other methods that accept another Complex as an argument would have
> to be handled differently as you cannot start two streams (for each
> ComplexArray) in parallel. These are add, divide, multiply, subtract. So
> this would require the computation be moved inside ComplexArray:
>
> ComplexArray apply(ComplexArray other, BiFunction<Complex, Complex, Complex> 
> function) {
>      // TODO: argument checks on size
>      return IntStream.range(0, size / 2)
>                      .mapToObj(i -> {
>                          Complex c1 = Complex.ofCartesian(data[i*2], 
> data[i*2+1]);
>                          Complex c2 = Complex.ofCartesian(other.data[i*2], 
> other.data[i*2+1]);
>                          return function.apply(c1, c2);
>                      }).collect(ComplexArray::new, ComplexArray::add, 
> ComplexArray::combine);
> }
>
> ComplexArray a1 = ...;
> ComplexArray a2 = ...;
> ComplexArray a3 = a1.apply(a2, Complex::add);
>
>
> Note: To avoid creating new objects for all computations would require a
> rewrite of Complex so that the computations are done by static methods
> that accept a supplier of real and imaginary components, compute the
> calculation and send the answer to a function that receives the new
> components, for example:
>
> interface ComplexData {
>      double getReal();
>      double getImaginary();
> }
>
> interface ToComplexData {
>      ComplexData create(double real, double imaginary);
> }
>
> // For example
> public static ComplexData conjugate(ComplexData in, ToComplexData out) {
>      return out.create(in.getReal(), -in.getImaginary());
> }
>
> // This would make the existing API method
> public Complex conjugate() {
>      return conjugate(this, Complex::ofCartesian);
> }
>
> This would allow the ComplexArray to provide an object that implements
> ComplexData to read the current datapoint and an object that implements
> ToComplexData to write back to the equivalent datapoint in a result
> adding something like this to the ComplexArray object using inner
> classes to read and write to the data array:
>
> class ComplexDataReader implements ComplexData {
>   int pos;
>   double getReal() { return data[pos*2]; }
>   double getImaginary() { return data[pos*2+1]; }
> }
>
> class ComplexDataAppender implements ToComplexData {
>      ComplexData create(double real, double imaginary) {
>          data[size++] = real;
>          data[size++] = imaginary;
>          return null;
>      }
> }
>
> ComplexDataReader getDataReader() { return new ComplexDataReader(); }
> ComplexDataAppender getDataAppender() { return new ComplexDataAppender(); }
>
> ComplexArray apply(BiFunction<ComplexData, Complex, Complex> function) {
>     ComplexDataReader in = getDataReader();
>     // Initialise the capacity
>     ComplexArray result = new ComplexArray(size);
>     ComplexDataAppender out = result.getDataAppender();
>     while (in.pos < size) {
>         function.apply(in, out);
>         in.pos += 2;
>     }
>     return result;
> }
>
>
> // Apply the function from above:
> // public static ComplexData conjugate(ComplexData in, ToComplexData out)
>
> ComplexArray a1 = ...;
> ComplexArray a2 = a1.apply(Complex::conjugate);
>
> This allows a user to pass a function that can do anything to the
> current data to create new real and imaginary components which are
> stored in a new array object. I do not know how this would effect
> performance without a full comparison of the API implemented with both
> concepts.
>
> If the storage of the array of complex data were abstracted it may lead
> to simplification of ComplexUtils. Each of the methods to convert from
> multi-dimensional arrays of Complex can be replaced by a single method
> to extract the required data from the ComplexArray.
>
> I can see at least the simple abstraction as useful.

+1

> Performance
> improvements using in-place reading of the real and imaginary components
> for computations can be added later.

+1

> It would be easier to do this if
> the ComplexArray is in the same package allowing package private methods
> to be added to Complex to do computations.

+1

> Alex

Regards,
Gilles

[1] http://commons.apache.org/proper/commons-math/apidocs/index.html

>
> >
> > On Sat, Nov 2, 2019 at 7:14 PM Gilles Sadowski <gillese...@gmail.com> wrote:
> >
> >> Hello.
> >>
> >> The class "ComplexUtils" deal with multi-dimensional arrays that hold
> >> instances of the "Complex" class.
> >> I've recently encountered a use-case where it was pointed out that storing
> >> many "Complex" instances (as seems the purpose of these utilities) is quite
> >> inefficient memory-wise as each instance will take 32 bytes[1] while the
> >> real and imaginary parts would only take 16 bytes as 2 primitive "double"s.
> >> This is compounded by multi-dimensional array where each sub-dimensional
> >> element is an array object (and thus takes another additional 16 bytes).
> >> For example, a
> >>      double[10][5][4]
> >> would take
> >>      16 * (1 + 10 * 5) + 10 * 5 * 4 * 8
> >>    = 2416 bytes.
> >> Assuming that in the above array, the last dimension holds 2 complex
> >> numbers,
> >> the same data can be represented as
> >>      Complex[10][5][2]
> >> that would take
> >>      16 * ((1 + 10 * 5) + (10 * 5 * 2)) + 10 * 5 * 2 * 2 * 8
> >>    = 4016 bytes.
> >> In both cases, the payload (10 * 5 * 2 complex numbers) is
> >>      10 * 5 * 2 * 2 * 8
> >>    = 1600 bytes.
> >> If stored in a one-dimensional array, the size in memory would be 1616
> >> bytes.
> >> Thus in the case of a data cube holding 100 complex numbers, a 3D array
> >> takes 1.5 (primitives) or 2.5 ("Complex" objects) more memory than a 1D
> >> array.
> >> If this is correct, I wonder whether we should advertise such a
> >> "ComplexUtils"
> >> class.  It would perhaps be preferable to propose a data cube
> >> abstraction.[2]
> >>
> >> WDYT?
> >>
> >> Regards,
> >> Gilles
> >>
> >> [1] https://www.baeldung.com/java-size-of-object
> >> [2] Based on
> >> http://commons.apache.org/proper/commons-math/apidocs/org/apache/commons/math4/util/MultidimensionalCounter.html
> >>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org

Re: [Numbers] Arrays of "Complex" objects and RAM

Reply via email to