2011/10/15 Gilles Sadowski <gil...@harfang.homelinux.org>:
> Hi.
>
>> first of all, I was the author of this very useful statement on
>> factories... Very constructive indeed.
>
> Liking something or not is an impression that could well be justified
> afterwards. It also pushes to look for arguments that ascertain the
> feeling. ;-)
>

The only problem is that I'm not sure I'm comfortable enough (if at all) to
state such an argument...
>> >
>> > However it also shows that the improvement is only ~13% instead of the
>> > ~30% reported by the benchmark in the paper...
>> >
>> could it be that their "naive" implementation as a 2D array is very
>> naive indeed? I notice in the listings provided in the paper that they
>> constantly refer to a[i][j]. I think the strength of having a row
>> representation is to define a temporary variable ai = a[i], and access
>> a[i][j] as ai[j]. That's what is done in CM anyway; maybe that
>> explains why the gain is not so big in the end.
>
> You are right; the "naïve" code repeatedly accesses a[i][j].
>
> But this alone doesn't make up for the difference (cf. table below).
>
> operate (calls per timed block: 10000, timed blocks: 100, time unit: ms)
> name             time/call       std error       total time  ratio       difference
> Commons Math     1.19770542e-01  2.85011660e-04  1.1977e+05  1.0000e+00   0.00000000e+00
> OpenGamma naive  1.23798907e-01  4.01495625e-04  1.2380e+05  1.0336e+00   4.02836495e+03
> OpenGamma 1D     1.04352827e-01  2.08970600e-04  1.0435e+05  8.7127e-01  -1.54177153e+04
> OpenGamma 2D     1.12666770e-01  3.50012912e-04  1.1267e+05  9.4069e-01  -7.10377213e+03
>
>
>> >
>> > I don't think that CM development should be focused on performance
>> > improvements that are so sensitive to the actual hardware (if it's indeed
>> > the varying amount of CPU cache that is responsible for this discrepancy).
>> >
>> That would apparently require fine tuning indeed, just like BLAS
>> itself, which has -I believe- specific implementations for specific
>> architectures. So it's a bit going against the philosophy of Java. I
>> wonder how a JNI interface to BLAS would perform? That would leave
>> the architecture-specific issues out of the Java code (which could
>> even provide a fallback implementation of the basic linear algebra
>> operations if people do not want to use native code).
>
> The author of the paper proposes to indeed clone the BLAS tuning
> methodology.
> However, I don't think that this should be a priority for CM (as a
> general-purpose math toolbox).
>

I agree with you, and I don't think that what the author proposes is a
viable solution. I would personally be more in favor of reusing a
well-established, low-level library, together with a well-designed,
high-level Java interface. But that's just a vague feeling, and I'm
certainly not saying that this should be considered (especially given the
definition of the CM project on the web site: "Limited dependencies. No
external dependencies beyond Commons components and the core Java
platform"). I just feel that the optimizations the author is willing to
implement require a deep knowledge of how your CPU works, while my
understanding of the CM philosophy is to focus on more high-level, more
mathematical, hardware-independent algorithms. Finally, since BLAS has
several different optimized versions for different platforms, I can't see
how a single Java library could hope to be optimized for all platforms. So
even pure Java implementations of the BLAS would require platform-specific
tuning, or am I wrong?

Now, as a user of CM, I would like to say that speed is not always a
concern. I know others have other requirements, but my simulations usually
run for several days, so whether they take 1 day or 1.6 days, I still have
to organize my workflow around them... Reliability of the code and ease of
use are more of a concern to me.
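To make the row-caching point above more concrete, here is roughly the kind
of difference I had in mind (just a sketch of the idea, not the actual CM or
OpenGamma code; the method and variable names are mine):

    // "Naive" matrix-vector product: every access goes through a[i][j],
    // so the a[i] indirection is repeated in the inner loop.
    public static double[] operateNaive(double[][] a, double[] x) {
        final int rows = a.length;
        final int cols = x.length;
        final double[] y = new double[rows];
        for (int i = 0; i < rows; i++) {
            double sum = 0;
            for (int j = 0; j < cols; j++) {
                sum += a[i][j] * x[j];
            }
            y[i] = sum;
        }
        return y;
    }

    // Same product with the row cached in a local variable: the a[i]
    // dereference is hoisted out of the inner loop.
    public static double[] operateRowCached(double[][] a, double[] x) {
        final int rows = a.length;
        final int cols = x.length;
        final double[] y = new double[rows];
        for (int i = 0; i < rows; i++) {
            final double[] ai = a[i]; // temporary row reference
            double sum = 0;
            for (int j = 0; j < cols; j++) {
                sum += ai[j] * x[j];
            }
            y[i] = sum;
        }
        return y;
    }

As far as I remember, the second form is essentially what the CM 2D-array
matrix implementation already does.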
>> >
>> > If there are (human) resources inclined to rewrite CM algorithms in
>> > order to boost performance, I'd suggest to also explore the
>> > multi-threading route, as I feel that the type of optimizations
>> > described in this paper are more in the realm of the JVM itself.
>> >
>> I would be very interested, but know nothing on multi-threading. I
>> will need to explore multi-threading for work anyway, so maybe in the
>> future?
>
> Yes, 3.1, 3.2, ... , 4.0, ... whatever.
>

Which is consistent with Phil's objection: let's focus on more specific
issues...

>> In the meantime, may I bring to your attention the JTransforms
>> library? (http://sites.google.com/site/piotrwendykier/Home)
>> It's a multi-threaded library for various FFT calculations. I've used
>> it a lot, and have been involved in the correction of some bugs. I've
>> never benchmarked it against CM, but the site claims (if my memory
>> does not fail me) greater performance.
>
> Yes, I did not perform benchmarks; however, Luc already pointed out that
> he had not paid particular attention to the speed efficiency of the code
> in CM.
> Also, there are other problems, cf. issue
> https://issues.apache.org/jira/browse/MATH-677
>
>> Also it can handle
>> non-power-of-two array dimensions. Plus, the author seems to no longer
>> have time to spend on this library, and may be willing to share it
>> with CM. That would be a first step into the multi-threading realm.
>
> Unfortunately, no; he doesn't want to donate his code.
>
>> Beware, though; the basic code is a direct translation of C code, and
>> is sometimes difficult to read (thousands of lines, with loads of
>> branching: code coverage analysis was simply a nightmare!).
>
> So, the above information is only half bad news! ;-)
>
>
> Best,
> Gilles

I wasn't aware of this JIRA issue, and wasn't aware of its author not
wanting to share it. Besides the non-power-of-two thingy, a feature I do
like is the ability to perform a real FFT (no need for an array twice the
size of the data, with zero imaginary parts). I don't think the current CM
implementation has that; it can be useful when you deal with large 3D FFTs
of real data.
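To illustrate what I mean by a "real FFT": with a complex-only API you have
to interleave your n real samples with zero imaginary parts into an array of
length 2*n, whereas JTransforms can transform the length-n array directly.
From memory, the calls look roughly like this (I may be off on the details,
e.g. the packed output layout of realForward, so please check the javadoc;
the class name RealFftDemo is just for the example):

    import edu.emory.mathcs.jtransforms.fft.DoubleFFT_1D;

    public class RealFftDemo {
        public static void main(String[] args) {
            final int n = 1024;
            final double[] signal = new double[n];
            // ... fill signal with real data ...

            // Complex transform: the input must be stored as n (re, im)
            // pairs, i.e. an array of length 2*n, with the imaginary parts
            // left at zero.
            final double[] packed = new double[2 * n];
            for (int i = 0; i < n; i++) {
                packed[2 * i] = signal[i];
            }
            new DoubleFFT_1D(n).complexForward(packed);

            // Real transform: works in place on the length-n array itself,
            // exploiting the Hermitian symmetry of the spectrum of real
            // data, so no doubled, half-zero array is needed.
            new DoubleFFT_1D(n).realForward(signal);
        }
    }

For large 3D data sets, that factor of two in memory is exactly what makes
the difference for me.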
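Coming back to the multi-threading route mentioned above: even before any
BLAS-style tuning, I imagine a first step could be as simple as splitting
the rows of an operation like operate() across a thread pool with plain
java.util.concurrent. A very rough sketch of what I have in mind (nothing
CM-specific; the class name, method signature and chunking strategy are
mine):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class ParallelOperate {
        // Matrix-vector product with the rows split across a fixed
        // thread pool; each task writes a disjoint range of y.
        public static double[] operate(final double[][] a, final double[] x,
                                       final int nThreads) throws Exception {
            final int rows = a.length;
            final double[] y = new double[rows];
            final ExecutorService pool = Executors.newFixedThreadPool(nThreads);
            final List<Future<?>> tasks = new ArrayList<Future<?>>();
            final int chunk = (rows + nThreads - 1) / nThreads;
            for (int t = 0; t < nThreads; t++) {
                final int start = t * chunk;
                final int end = Math.min(rows, start + chunk);
                tasks.add(pool.submit(new Runnable() {
                    public void run() {
                        for (int i = start; i < end; i++) {
                            final double[] ai = a[i]; // row caching again
                            double sum = 0;
                            for (int j = 0; j < x.length; j++) {
                                sum += ai[j] * x[j];
                            }
                            y[i] = sum;
                        }
                    }
                }));
            }
            for (Future<?> f : tasks) {
                f.get(); // wait for all chunks, propagate exceptions
            }
            pool.shutdown();
            return y;
        }
    }

Whether that actually pays off for realistic matrix sizes is something I
would have to benchmark; I only mean it as an illustration of what "the
multi-threading route" could look like.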