Hello.

> > [...]
>
> >> > I don't think that CM development should be focused on performance
> >> > improvements that are so sensitive to the actual hardware (if it's indeed
> >> > the varying amount of CPU cache that is responsible for this
> >> > discrepancy).
> >> >
> >> That would apparently require fine tuning indeed, just like BLAS
> >> itself, which has -I believe- specific implementations for specific
> >> architectures. So it's a bit going against the philosophy of Java. I
> >> wonder how a JNI interface to BLAS would perform? That would leave
> >> the architecture specific issues out of the Java code (which could
> >> even provide a basic implementation of basic linear algebra operations
> >> if people do not want to use native code).
> >
> > The author of the paper proposes to indeed clone the BLAS tuning
> > methodology.
> > However, I don't think that this should be a priority for CM (as a
> > general-purpose math toolbox).
> >
>
> I agree with you, and I don't think that what the author proposes is a
> viable solution. I personally would be more in favor of reusing a well
> established, low level library, together with a well-designed,
> high-level, Java interface. But that's just a vague feeling, and I'm
> certainly not saying that this should be considered (especially
> considering the definition of the CM project on the web site "Limited
> dependencies. No external dependencies beyond Commons components and
> the core Java platform"). I just feel that the optimizations the
> author is willing to implement require a deep knowledge of how your
> CPU works, while my understanding of the CM philosophy is to focus on
> more high level, more mathematical, hardware independent algorithms.
> Finally, since BLAS has several different optimized versions for
> different platforms, I can't see how a unique Java library could hope
> to be optimized for all platforms. So even pure Java implementations
> of the BLAS would require platform specific tuning, or am I wrong?

IIUC the "Future work" section, the author means exactly that ("platform"
comprising both CPU and JVM).

> Now, as a user of CM, I would like to say that speed is not always a
> concern. I know others have other requirements, but my simulations
> usually run for several days. So whether it takes 1 day or 1.6 days
> still requires appropriate organization of my workflow... Reliability
> of the code, ease of use are more a concern to me.

I share this view.

> >> >
> >> > If there are (human) resources inclined to rewrite CM algorithms in
> >> > order to boost performance, I'd suggest to also explore the
> >> > multi-threading route, as I feel that the type of optimizations
> >> > described in this paper are more in the realm of the JVM itself.
> >> >
> >> I would be very interested, but know nothing on multi-threading. I
> >> will need to explore multi-threading for work anyway, so maybe in the
> >> future?
> >
> > Yes, 3.1, 3.2, ... , 4.0, ... whatever.
> >
>
> Which is consistent with Phil's objection: let's focus on more
> specific issues...

In the perspective of 3.0, I agree. However, the move towards taking
advantage of multi-threading is of a general nature.

> >> In the meantime, may I bring to your attention the JTransforms
> >> library? (http://sites.google.com/site/piotrwendykier/Home)
> >> It's a multi-threaded library for various FFT calculations. I've used
> >> it a lot, and have been involved in the correction of some bugs. I've
> >> never benchmarked it against CM, but the site claims (if my memory
> >> does not fail me) greater performance.
> >
> > Yes, I did not perform benchmarks; however, Luc already pointed out that he
> > had not paid particular attention to the speed efficiency of the code in CM.
> > Also, there are other problems, cf. issue
> > https://issues.apache.org/jira/browse/MATH-677
> >
> >> Also it can handle
> >> non-power-of-two array dimensions. Plus, the author seems to no
> >> longer have time to spend on this library, and may be willing to share it
> >> with CM. That would be a first step in the multi-threading realm.
> >
> > Unfortunately, no; he doesn't want to donate his code.
> >
> >> Beware, though; the basic code is a direct translation of C code, and
> >> is sometimes difficult to read (thousands of lines, with loads of
> >> branching: code coverage analysis was simply a nightmare!).
> >
> > So, the above information is only half bad news! ;-)
> >
> I wasn't aware of this JIRA issue, and wasn't aware of its author not
> wanting to share it. Besides the non-power of two thingy, a feature I
> do like is the ability to perform real FFT (no need to have an array
> twice the size of the data, with zero imaginary part). I don't think
> the current CM implementation has that, it can be useful when you
> deal with large 3d FFTs of real data.

Maybe you could open a specific issue (or add comments to issue
MATH-677) for the usage shortcomings which you've spotted in the current
code. This would help keep track of all the things which a redesign should
address.

Thanks,
Gilles