Hello,

first of all, I was the author of this very useful statement on factories... Very constructive indeed.

> However it also shows that the improvement is only ~13% instead of the ~30%
> reported by the benchmark in the paper...

Could it be that their "naive" implementation as a 2D array is very naive indeed? I notice in the listings provided in the paper that they constantly refer to a[i][j]. I think the strength of a row representation is that you can define a temporary variable ai = a[i] and access a[i][j] as ai[j]. That's what is done in CM anyway; maybe that explains why the gain is not so big in the end.

> I don't think that CM development should be focused on performance
> improvements that are so sensitive to the actual hardware (if it's indeed
> the varying amount of CPU cache that is responsible for this discrepancy).

That would apparently require fine tuning indeed, just like BLAS itself, which has, I believe, specific implementations for specific architectures. So it goes a bit against the philosophy of Java. I wonder how a JNI interface to BLAS would perform? That would leave the architecture-specific issues out of the Java code (which could even provide a basic implementation of the basic linear algebra operations for people who do not want to use native code).

> If there are (human) resources inclined to rewrite CM algorithms in order to
> boost performance, I'd suggest to also explore the multi-threading route, as
> I feel that the type of optimizations described in this paper are more in the
> realm of the JVM itself.

I would be very interested, but know nothing about multi-threading. I will need to explore multi-threading for work anyway, so maybe in the future? In the meantime, may I bring to your attention the JTransforms library? (http://sites.google.com/site/piotrwendykier/Home) It's a multi-threaded library for various FFT calculations. I've used it a lot, and have been involved in the correction of some bugs.
I've never benchmarked it against CM, but the site claims (if my memory does not fail me) greater performance. It can also handle non-power-of-two array dimensions. Plus, the author no longer seems to have time to spend on this library, and may be willing to share it with CM. That would be a first step into the multi-threading realm.
Beware, though: the basic code is a direct translation of C code and is sometimes difficult to read (thousands of lines, with loads of branching; code coverage analysis was simply a nightmare!).

Sébastien
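P.S. To make the row-access remark above concrete, here is a minimal sketch of the two access patterns for a matrix-vector product on a double[][]. The class and method names are hypothetical and chosen only for illustration; this is not code taken from the paper or from CM. The sketch only shows that both variants compute the same result. Whether a given JVM already hoists a[i] out of the inner loop by itself is not something it tells you; that would need to be benchmarked.

```java
/** Illustrative only: hypothetical names, not code from the paper or from CM. */
final class RowAccessSketch {

    private RowAccessSketch() {}

    /** Matrix-vector product that indexes a[i][j] on every access. */
    static double[] multiplyNaive(double[][] a, double[] x) {
        final double[] y = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            double sum = 0;
            for (int j = 0; j < x.length; j++) {
                sum += a[i][j] * x[j]; // row lookup, then column lookup
            }
            y[i] = sum;
        }
        return y;
    }

    /** Same product, with the row reference fetched once per row. */
    static double[] multiplyRowCached(double[][] a, double[] x) {
        final double[] y = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            final double[] ai = a[i]; // temporary variable for the current row
            double sum = 0;
            for (int j = 0; j < x.length; j++) {
                sum += ai[j] * x[j]; // column lookup only
            }
            y[i] = sum;
        }
        return y;
    }
}
```

Both methods assume that every row of a has at least x.length entries; neither checks dimensions.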