Hello,
First of all, I was the author of this very useful statement on
factories... Very constructive indeed.
>
> However it also shows that the improvement is only ~13% instead of the ~30%
> reported by the benchmark in the paper...
>
Could it be that their "naive" implementation as a 2D array is very
naive indeed? I notice in the listings provided in the paper that they
constantly refer to a[i][j]. I think the strength of a row
representation is that you can define a temporary variable ai = a[i] and
access a[i][j] as ai[j]. That's what is done in CM anyway, so maybe that
explains why the gain is not so big in the end.
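
To make the point concrete, here is a minimal sketch of the two access
patterns on a matrix-vector product. The class and method names are
mine, for illustration only; they are taken neither from the paper nor
from CM. Whether the second version is actually faster depends on the
JVM and the hardware, so it would have to be benchmarked.

```java
// Hypothetical illustration: names are invented, not from the paper or CM.
final class RowAccessSketch {
    private RowAccessSketch() {}

    /** Naive version: every element is reached through a[i][j]. */
    static double[] multiplyNaive(double[][] a, double[] x) {
        final int rows = a.length;
        final double[] y = new double[rows];
        for (int i = 0; i < rows; i++) {
            double sum = 0;
            for (int j = 0; j < a[i].length; j++) {
                sum += a[i][j] * x[j];
            }
            y[i] = sum;
        }
        return y;
    }

    /** Row-cached version: a[i] is fetched once per row, then indexed as ai[j]. */
    static double[] multiplyRowCached(double[][] a, double[] x) {
        final int rows = a.length;
        final double[] y = new double[rows];
        for (int i = 0; i < rows; i++) {
            final double[] ai = a[i]; // temporary reference to row i
            double sum = 0;
            for (int j = 0; j < ai.length; j++) {
                sum += ai[j] * x[j];
            }
            y[i] = sum;
        }
        return y;
    }
}
```

Both methods compute the same result; only the way the row is reached
differs.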
>
> I don't think that CM development should be focused on performance
> improvements that are so sensitive to the actual hardware (if it's indeed
> the varying amount of CPU cache that is responsible for this discrepancy).
>
That would apparently require fine tuning indeed, just like BLAS
itself, which has (I believe) specific implementations for specific
architectures. So it goes a bit against the philosophy of Java. I
wonder how a JNI interface to BLAS would perform? That would leave
the architecture-specific issues out of the Java code (which could
even provide a basic implementation of basic linear algebra operations
if people do not want to use native code).
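
Roughly what I have in mind, as a minimal sketch only: a small
interface with a pure Java implementation, and a native one that is
used only if the library can be loaded. Everything here is an
assumption made for illustration: the interface, the class names, the
nativeDot method and the "cmblas" library name do not exist in CM or
anywhere else that I know of.

```java
// Hypothetical sketch: all names below, including the "cmblas" library,
// are invented for illustration and do not exist in CM.
interface DotBackend {
    double dot(double[] x, double[] y);
}

/** Pure Java fallback, always available. */
final class JavaDotBackend implements DotBackend {
    @Override
    public double dot(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double sum = 0;
        for (int i = 0; i < x.length; i++) {
            sum += x[i] * y[i];
        }
        return sum;
    }
}

/** Native variant: the C side (not shown) would delegate to BLAS. */
final class NativeDotBackend implements DotBackend {
    // Declared only; the implementation would live in the hypothetical native library.
    private native double nativeDot(int n, double[] x, double[] y);

    @Override
    public double dot(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        return nativeDot(x.length, x, y);
    }
}

final class Backends {
    private Backends() {}

    /** Returns the native backend if its library loads, the Java one otherwise. */
    static DotBackend load() {
        try {
            System.loadLibrary("cmblas"); // hypothetical library name
            return new NativeDotBackend();
        } catch (UnsatisfiedLinkError | SecurityException e) {
            return new JavaDotBackend();
        }
    }
}
```

Calling code would only ever see DotBackend, so the
architecture-specific part stays outside the Java sources.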
>
> If there are (human) resources inclined to rewrite CM algorithms in order to
> boost performance, I'd suggest to also explore the multi-threading route, as
> I feel that the type of optimizations described in this paper are more in the
> realm of the JVM itself.
>
I would be very interested, but I know nothing about multi-threading. I
will need to explore multi-threading for work anyway, so maybe in the
future? In the meantime, may I bring to your attention the JTransforms
library? (http://sites.google.com/site/piotrwendykier/Home)
It's a multi-threaded library for various FFT calculations. I've used
it a lot, and have been involved in fixing some bugs. I've
never benchmarked it against CM, but the site claims (if my memory
does not fail me) greater performance. Also, it can handle
non-power-of-two array dimensions. Plus, the author seems to no
longer have time to spend on this library, and may be willing to share it
with CM. That would be a first step in the multi-threading realm.

Beware, though; the basic code is a direct translation of C code, and
is sometimes difficult to read (thousands of lines, with loads of
branching: code coverage analysis was simply a nightmare!).
Sébastien
