Hello. Le mar. 5 nov. 2019 à 18:38, Alex Herbert <alex.d.herb...@gmail.com> a écrit : > > On 05/11/2019 00:09, Eric Barnhill wrote: > > That's interesting. The JTransforms library performs Fourier transforms > > that can take complex input, output, or both. They do this with interleaved > > double[] arrays, which I suppose is much more space efficient, and the > > status of a number as real or imaginary is implicit by its location being > > odd or even. > > The packing of multi-dimensional structures into a single array is > common in image processing. It has computational efficiency when > operating on the entire set of numbers. > > It is used a lot in image processing to pack 2D data as a single array. > In this case even random access to the (x,y) index as: > > 2D: [x][y] > 1D: [y * maxx + x] > > is comparable in speed as the later avoids the second array dereference.
I suspected that it might be the case. If you have benchmark data that confirm it, then is there ever any advantage to a multidimensional array? > A single array has advantages for a copy operation as it requires a > single .clone() method call. The disadvantage is that the ultimate size > of the multi-dimension array is limited by the size of an array. That's ~17 GB, and more than 1 billion complex numbers. And if there are use-cases that require very large data cubes, then the abstraction would allow to add further indirections. > In the case of complex numbers the use of alternating indices to store > real and imaginary allows the array to be easily resized and extended. > Operations that process all positions only need to cache in memory a > sub-section of the array. > > [re1, im1, re2, im2, re3, im3] > > The alternative is to store real and imaginary in a large block in > either 1 array or 2 arrays: > > [re1, re2, re3, im1, im2, im3] > > [re1, re2, re3] > [im1, im2, im3] > > This has advantages for operations that require either the real or the > imaginary components on their own. It is a bit more work to extend the > data as it requires two copies of the existing data and packing of the > new values as the appropriate position. > > For small sizes the use of 2 arrays this would be less efficient memory > wise. It would require some type of pointer to hold the pair. However it > could store double the amount of complex numbers. > > > > The MultidimensionalCounter functionality you mention is for example known > > in Matlab as ind2sub() and sub2ind() . It allows for 1d vectorizing of > > certain functions which can improve performance. Some people swear by it. I > > always found it an additional mental exercise that I didn't want to occupy > > myself with unless I absolutely had to. So, maybe it makes sense to > > "advertise" this approach like you say, but some users may just want the 3D > > arrays for more rapid prototyping-ish applications. As Alex has shown, the 1D representation is abstracted away: The data is accessed as a multi-dimensional array and the user never needs to remember that it is interleaved. > > > > I wonder if there isn't some streaming solution for this -- the numbers are > > stored as an interleaved 1D array, but are streamed through a Complex > > constructor before any needed operations are performed. > > My initial thought is not as double values. A DoubleStream processes > single values. To collect the real and imaginary components would > require that the stream can be processed two at a time. > > This structure would work: > > class ComplexArray { > // Must be even in length. Packed as (real, imaginary) > double[] data; > int size; > > Stream<Complex> stream() { > return IntStream.range(0, size / 2) > .mapToObj(i -> Complex.ofCartesian(data[i*2], > data[i*2+1])); > } > > void add(Complex c) { > checkCapacity(2); > data[size++] = c.getReal(); > data[size++] = c.getImaginary(); > } Perhaps "store" would be less confusing (?). > > void combine(ComplexArray c) { > checkCapacity(c.size); > System.arraycopy(c.data, 0, data, size, c.size); > size += c.size; > } > > private void checkCapacity(int length) { > final int minCapacity = size + length; > final int oldCapacity = data.length; > if (minCapacity > oldCapacity) { > int newCapacity = ((oldCapacity * 3) >>> 1) + 1; > if (newCapacity < minCapacity) { > newCapacity = minCapacity; > } > data = Arrays.copyOf(data, newCapacity); > } > } > } To allow easy usage as a multidimensional array, the API should contain the "MultidimensionalCounter"[1] or equivalent: public class ComplexArray { private final MultidimensionalCounter counter; // ... public ComplexArray(int... dimensions) { counter = new MultidimensionalCounter(dimensions); data = new data[counter.getSize()]; } /** * @param indices Storage location. * @return the complex number stored at the given location. */ public Complex get(int... indices) { return data[counter.getCount(indices)]; } } Or perhaps the data cube should be an additional abstraction on top of "ComplexArray" (?). By the way, I'd suggest to find a more descriptive name for "ComplexArray" (since the 1D array is an "implementation detail"). > > And I guess that leads to my last question -- suppose someone wants to call > > a trig function on a series of Complex numbers. Now let's imagine the > > primitives have been stored in some maximally efficient way. It seems like, > > to use any of the functionality in Complex, these numbers would have to be > > unpacked, cast as Complex, operated on, then cast back to how they are > > being stored. I wonder if that would prove to be more efficient in the end. Anything will be more efficient than swapping when getting near to the max memory limit. > Probably not more efficient to create all the Complex instances. Ideally > you would operate on the primitive data directly. This could certainly go to the wish-list (with the use-case being the "ComplexArray"). > Looking at what Complex is capable of doing it seems that there are a > lot of operations that would have to be duplicated to allow operations > to work on primitives. However you can create the API and implement it > lazily using the above object, e.g. > > public static ComplexArray apply(ComplexArray array, UnaryOperator<Complex> > mapper) { > return array.stream() > .map(mapper) > .collect(ComplexArray::new, ComplexArray::add, > ComplexArray::combine); > } > > public static ComplexArray log(ComplexArray array) { > return apply(Complex::log); > } > > This does create a new Complex object for each operation function > applied to the data. > > The other methods that accept another Complex as an argument would have > to be handled differently as you cannot start two streams (for each > ComplexArray) in parallel. These are add, divide, multiply, subtract. So > this would require the computation be moved inside ComplexArray: > > ComplexArray apply(ComplexArray other, BiFunction<Complex, Complex, Complex> > function) { > // TODO: argument checks on size > return IntStream.range(0, size / 2) > .mapToObj(i -> { > Complex c1 = Complex.ofCartesian(data[i*2], > data[i*2+1]); > Complex c2 = Complex.ofCartesian(other.data[i*2], > other.data[i*2+1]); > return function.apply(c1, c2); > }).collect(ComplexArray::new, ComplexArray::add, > ComplexArray::combine); > } > > ComplexArray a1 = ...; > ComplexArray a2 = ...; > ComplexArray a3 = a1.apply(a2, Complex::add); > > > Note: To avoid creating new objects for all computations would require a > rewrite of Complex so that the computations are done by static methods > that accept a supplier of real and imaginary components, compute the > calculation and send the answer to a function that receives the new > components, for example: > > interface ComplexData { > double getReal(); > double getImaginary(); > } > > interface ToComplexData { > ComplexData create(double real, double imaginary); > } > > // For example > public static ComplexData conjugate(ComplexData in, ToComplexData out) { > return out.create(in.getReal(), -in.getImaginary()); > } > > // This would make the existing API method > public Complex conjugate() { > return conjugate(this, Complex::ofCartesian); > } > > This would allow the ComplexArray to provide an object that implements > ComplexData to read the current datapoint and an object that implements > ToComplexData to write back to the equivalent datapoint in a result > adding something like this to the ComplexArray object using inner > classes to read and write to the data array: > > class ComplexDataReader implements ComplexData { > int pos; > double getReal() { return data[pos*2]; } > double getImaginary() { return data[pos*2+1]; } > } > > class ComplexDataAppender implements ToComplexData { > ComplexData create(double real, double imaginary) { > data[size++] = real; > data[size++] = imaginary; > return null; > } > } > > ComplexDataReader getDataReader() { return new ComplexDataReader(); } > ComplexDataAppender getDataAppender() { return new ComplexDataAppender(); } > > ComplexArray apply(BiFunction<ComplexData, Complex, Complex> function) { > ComplexDataReader in = getDataReader(); > // Initialise the capacity > ComplexArray result = new ComplexArray(size); > ComplexDataAppender out = result.getDataAppender(); > while (in.pos < size) { > function.apply(in, out); > in.pos += 2; > } > return result; > } > > > // Apply the function from above: > // public static ComplexData conjugate(ComplexData in, ToComplexData out) > > ComplexArray a1 = ...; > ComplexArray a2 = a1.apply(Complex::conjugate); > > This allows a user to pass a function that can do anything to the > current data to create new real and imaginary components which are > stored in a new array object. I do not know how this would effect > performance without a full comparison of the API implemented with both > concepts. > > If the storage of the array of complex data were abstracted it may lead > to simplification of ComplexUtils. Each of the methods to convert from > multi-dimensional arrays of Complex can be replaced by a single method > to extract the required data from the ComplexArray. > > I can see at least the simple abstraction as useful. +1 > Performance > improvements using in-place reading of the real and imaginary components > for computations can be added later. +1 > It would be easier to do this if > the ComplexArray is in the same package allowing package private methods > to be added to Complex to do computations. +1 > Alex Regards, Gilles [1] http://commons.apache.org/proper/commons-math/apidocs/index.html > > > > > On Sat, Nov 2, 2019 at 7:14 PM Gilles Sadowski <gillese...@gmail.com> wrote: > > > >> Hello. > >> > >> The class "ComplexUtils" deal with multi-dimensional arrays that hold > >> instances of the "Complex" class. > >> I've recently encountered a use-case where it was pointed out that storing > >> many "Complex" instances (as seems the purpose of these utilities) is quite > >> inefficient memory-wise as each instance will take 32 bytes[1] while the > >> real and imaginary parts would only take 16 bytes as 2 primitive "double"s. > >> This is compounded by multi-dimensional array where each sub-dimensional > >> element is an array object (and thus takes another additional 16 bytes). > >> For example, a > >> double[10][5][4] > >> would take > >> 16 * (1 + 10 * 5) + 10 * 5 * 4 * 8 > >> = 2416 bytes. > >> Assuming that in the above array, the last dimension holds 2 complex > >> numbers, > >> the same data can be represented as > >> Complex[10][5][2] > >> that would take > >> 16 * ((1 + 10 * 5) + (10 * 5 * 2)) + 10 * 5 * 2 * 2 * 8 > >> = 4016 bytes. > >> In both cases, the payload (10 * 5 * 2 complex numbers) is > >> 10 * 5 * 2 * 2 * 8 > >> = 1600 bytes. > >> If stored in a one-dimensional array, the size in memory would be 1616 > >> bytes. > >> Thus in the case of a data cube holding 100 complex numbers, a 3D array > >> takes 1.5 (primitives) or 2.5 ("Complex" objects) more memory than a 1D > >> array. > >> If this is correct, I wonder whether we should advertise such a > >> "ComplexUtils" > >> class. It would perhaps be preferable to propose a data cube > >> abstraction.[2] > >> > >> WDYT? > >> > >> Regards, > >> Gilles > >> > >> [1] https://www.baeldung.com/java-size-of-object > >> [2] Based on > >> http://commons.apache.org/proper/commons-math/apidocs/org/apache/commons/math4/util/MultidimensionalCounter.html > >> --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org