On Wed, Feb 6, 2013 at 12:03 PM, Gilles <gil...@harfang.homelinux.org> wrote:
> On Wed, 06 Feb 2013 07:19:47 -0800, Phil Steitz wrote:
>
>> On 2/5/13 6:08 AM, Gilles wrote:
>>
>>> Hi.
>>>
>>> In the thread about "static import", Stephen noted that decisions on a
>>> component's evolution are dependent on whether the future of the Java
>>> language is taken into account, or not.
>>> A question on the same theme also arose after the presentation of Commons
>>> Math at FOSDEM 2013.
>>>
>>> If we assume that efficiency is among the important qualities for Commons
>>> Math, the future is to allow usage of the tools provided by the standard
>>> Java library in order to ease the development of multi-threaded
>>> algorithms.
>>>
>>> Maintaining Java 1.5 source compatibility for the reason that we may need
>>> to support legacy applications will turn out to be self-defeating:
>>> 1. New users will not consider Commons Math's features that are notably
>>>    apt to parallel processing.
>>> 2. Current users might at some point simply switch to another library if
>>>    it proves more efficient (because it actually uses multi-threading).
>>> 3. New Java developers will be turned away because they will want to use
>>>    the more convenient features of the language in order to provide
>>>    potential contributions.
>>>
>>> If maintaining 1.5 source compatibility is kept as a requirement, the
>>> consequence is that Commons Math will _become_ a legacy library.
>>> In that perspective, implementing/improving algorithms for which a
>>> parallel version is known to be more efficient is plainly a waste of
>>> development and maintenance time.
>>>
>>> In order to mitigate the risks (both of upgrading and of not upgrading
>>> the source compatibility requirement), I would propose to create a new
>>> project (say, "Commons Math MT") where we could implement new features[1]
>>> without being encumbered with the 1.5 requirement.[2]
>>> The "Commons Math MT" would depend on "Commons Math" where we would
>>> continue developing single-thread (and thread-safe) "tasks", i.e.
>>> independent units of processing that could be used in algorithms
>>> located in "Commons Math MT".
>>>
>>> In summary:
>>> - Commons Math (as usual):
>>>   * single-thread (sequential) algorithms,
>>>   * (pure) Java 5,
>>>   * no dependencies.
>>> - Commons Math MT:
>>>   * multi-thread (parallel) algorithms,
>>>   * Java 7 and beyond,
>>>   * JNI allowed,
>>>   * dependencies allowed (jCuda).
>>>
>>> What do you think?
>>>
>>
>> There are several other possibilities to consider:
>>
>> 0) Implement multithreading using JDK 1.5 primitives
>> 1) Set things up within [math] to support parallel execution in JDK
>>    1.7, Hadoop or other frameworks
>> 2) Instead of a new project, start a 4.x branch targeting JDK 1.7
>>
>> I think we should maintain a version that has no dependencies and no
>> JNI in any case.
>>
>> Starting a branch and getting concrete about how to parallelize some
>> algorithms would be a good way to start. One thing I have not
>> really investigated and would be interested in details on is what
>> you actually get in efficiency gain (or loss?) using fork / join vs
>> just using 1.5+ concurrency for the kinds of problems we would end
>> up using this stuff for.
>>
>> Thinking about specific parallelization problem instances would also
>> help decide whether 1) makes sense (i.e., whether it makes sense as
>> you mention above to maintain a single-threaded library that
>> provides task execution for a multithreaded version or multithreaded
>> frameworks).
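Gilles's proposed split (sequential "tasks" in Commons Math, parallelization "glue" in Commons Math MT) can be sketched with nothing beyond Java 5's `java.util.concurrent`, which also corresponds to Phil's option 0). This is a minimal illustration under stated assumptions, not a proposed API: a brute-force grid search stands in for the GA's global search, and every name (`GridSearchSketch`, `Objective`, `ChunkSearch`, `minimize`, the `parallelizationLevel` argument) is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch of a "core task" / "MT glue" split; not Commons Math code. */
public class GridSearchSketch {

    /** Hypothetical objective-function type (not an existing Commons Math interface). */
    public interface Objective {
        double value(double x);
    }

    /**
     * "Commons Math" side: a single-threaded unit of work that scans grid
     * points [from, to) sequentially and returns {bestX, bestValue}.
     * It only reads final fields, so instances can run concurrently
     * provided the objective itself is thread-safe.
     */
    public static final class ChunkSearch implements Callable<double[]> {
        private final Objective f;
        private final double lower;
        private final double step;
        private final int from;
        private final int to;

        public ChunkSearch(Objective f, double lower, double step, int from, int to) {
            this.f = f;
            this.lower = lower;
            this.step = step;
            this.from = from;
            this.to = to;
        }

        public double[] call() {
            double bestX = Double.NaN;
            double bestValue = Double.POSITIVE_INFINITY;
            for (int i = from; i < to; i++) {
                double x = lower + i * step;
                double y = f.value(x);
                if (y < bestValue) { // strict '<' keeps the first minimum found
                    bestX = x;
                    bestValue = y;
                }
            }
            return new double[] { bestX, bestValue };
        }
    }

    /**
     * "Commons Math MT" side: splits a grid of 'points' (>= 2) values into at
     * most 'parallelizationLevel' chunks and runs them on a fixed-size pool,
     * using only Java 5 java.util.concurrent classes.
     */
    public static double[] minimize(Objective f, double lower, double upper,
                                    int points, int parallelizationLevel)
            throws InterruptedException, ExecutionException {
        double step = (upper - lower) / (points - 1);
        int chunk = (points + parallelizationLevel - 1) / parallelizationLevel;
        ExecutorService pool = Executors.newFixedThreadPool(parallelizationLevel);
        try {
            List<Future<double[]>> futures = new ArrayList<Future<double[]>>();
            for (int from = 0; from < points; from += chunk) {
                int to = Math.min(points, from + chunk);
                futures.add(pool.submit(new ChunkSearch(f, lower, step, from, to)));
            }
            // Combine partial results in chunk order, so ties resolve as in a sequential scan.
            double[] best = { Double.NaN, Double.POSITIVE_INFINITY };
            for (Future<double[]> future : futures) {
                double[] candidate = future.get(); // waits for that chunk to finish
                if (candidate[1] < best[1]) {
                    best = candidate;
                }
            }
            return best;
        } finally {
            pool.shutdown(); // release the worker threads
        }
    }

    public static void main(String[] args) throws Exception {
        Objective parabola = new Objective() {
            public double value(double x) { return (x - 1.5) * (x - 1.5); }
        };
        double[] sequential = minimize(parabola, -10.0, 10.0, 2001, 1);
        double[] parallel = minimize(parabola, -10.0, 10.0, 2001, 4);
        System.out.println("1 thread:  x = " + sequential[0]);
        System.out.println("4 threads: x = " + parallel[0]);
    }
}
```

Both calls in `main` should report the same grid point near x = 1.5, because partial results are combined in chunk order. Even with `parallelizationLevel = 1` the sketch still creates a one-thread pool; a component that must not spawn threads would call `ChunkSearch.call()` directly. The sketch says nothing about how this compares with a JDK 1.7 fork/join version; that is the benchmark question raised above.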
>>
>> One more thing to consider is that for at least some users of
>> [math], having the library internally spawn threads and/or peg
>> multiple processors may not be desirable. It is a little misleading
>> to say that multithreading is the way to get "efficiency." It is
>> really the way to *use* more compute resources and unless there are
>> real algorithmic improvements, the overall efficiency may actually
>> be less, due to task coordination overhead. What you get is faster
>> execution due to more greedy utilization of available cores. Actual
>> efficiency (how much overall compute resource it takes to complete a
>> job) partly depends on how efficiently the coordination itself is
>> done (which JDK 1.7 claims to do very well - I have just not seen
>> substantiation or any benchmarks demonstrating this) and how the
>> parallelization affects overall compute requirements. In any case,
>> for environments where library thread-spawning is not desirable, I
>> think we should maintain a single-threaded version.
>>
>>
> Unless I missed the point, those reasons are exactly why I propose to
> have 2 projects/components. One, "Commons-Math", does not fiddle with
> resources, while the other would provide a "parallelizationLevel"
> setting for the algorithms written to possibly take advantage of the
> Java 5+ "task framework".
>
> Yes, we could still be good by using only Java 5's concurrency features
> but the issue I raise is not only about concurrency but about
> evolution/progress/maintenance, all things that require raising interest
> from new contributors (unless it's fine that Commons Math be tagged as a
> "library of the past"...).
>
> But using concurrency features in "Commons Math" would also contradict
> your own point ("we should maintain a single-threaded version"): I agree,
> and that's why I proposed this other project...
>
> As for efficiency (or faster execution, if you want), I don't see the
> point in doubting that tasks like global search (e.g.
> in a genetic algorithm) will complete in less time when run in parallel...
>
> As I summarized previously, having a "Commons Math MT" would bring no
> inconvenience, contrary to either your points 0, 1, or 2. [No
> inconvenience to me, that is, but to people with requirements like
> "Java 5 compatible" or "no multi-threading".]
> As I indicated, the basic "task" could be defined in "Commons Math" and
> "Commons Math MT" would provide the parallelization "glue" (e.g. to divide
> the search space of the GA).
>

What about having the MT pieces in a .mt package (or .mt subpackages)?

WRT "divide the search space of the GA", I would think that having it
lumped in with the main project would help us release more often. In
[commons] in general it seems painful to make releases: there are so many
steps and bits, so more projects means more pain.

Gary

>
>
> Gilles
>

--
E-Mail: garydgreg...@gmail.com | ggreg...@apache.org
JUnit in Action, 2nd Ed: http://bit.ly/ECvg0
Spring Batch in Action: http://bit.ly/bqpbCK
Blog: http://garygregory.wordpress.com
Home: http://garygregory.com/
Tweet! http://twitter.com/GaryGregory