Hi Russell,
If I understand your workload correctly, and your "next generation" model (~5M tasks) would still require ~40 hours to process on a single machine, then your tasks average 28.8 ms each (40 hours / 5M tasks). Please correct me if I have misunderstood your workload characteristics. With task lengths in this range, you are looking at dispatch and execution rates of about 34.72 tasks/sec per node (1000 ms / 28.8 ms/task). With 100 nodes, you need 3472 tasks/sec of overall system throughput to keep all 100 nodes busy with 28.8 ms tasks.

Typical production LRMs (local resource managers) sustain throughput in the ~1 job/sec range, and development versions of these LRMs are pushing 10-20 jobs/sec. Our own work on the Falkon project (http://people.cs.uchicago.edu/~iraicu/research/Falkon/) works with existing LRMs and has achieved rates in the ~500 tasks/sec range. We have also scaled Falkon to 2M queued tasks with 1.5 GB of memory, and it should scale to your workload size of 5M tasks with a linear, proportional increase in memory. We are now working to improve the throughput further by parallelizing the Falkon architecture!

BTW, Falkon is implemented in Java and uses the Globus Toolkit 4. We have not tested it on Windows, but there is nothing inherent that would stop it from working in a Windows environment (with the possible exception of some scripts).
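The back-of-the-envelope numbers above can be reproduced with a short sketch. The task count, single-machine runtime, node count, and memory figures come from this thread; the class name and the linear memory-scaling assumption are just illustrative:

```java
// Throughput and memory estimates for the workload described in this thread.
// Inputs (5M tasks, 40 hours, 100 nodes, 1.5 GB per 2M queued tasks) are from
// the emails above; the linear memory scaling is an assumption, not a measurement.
public class ThroughputEstimate {
    public static void main(String[] args) {
        long tasks = 5_000_000L;          // ~5M independent tasks
        double totalSeconds = 40 * 3600;  // 40 hours on one workstation
        int nodes = 100;                  // office PCs available

        double msPerTask = totalSeconds * 1000 / tasks;          // 28.8 ms/task
        double tasksPerSecPerNode = 1000.0 / msPerTask;          // ~34.72 tasks/sec
        double requiredSystemRate = tasksPerSecPerNode * nodes;  // ~3472 tasks/sec

        // Falkon held 2M queued tasks in 1.5 GB; assume linear scaling to 5M.
        double estMemoryGB = 1.5 * tasks / 2_000_000.0;          // ~3.75 GB

        System.out.printf("%.1f ms/task, %.2f tasks/s/node, %.0f tasks/s system, ~%.2f GB queue memory%n",
                msPerTask, tasksPerSecPerNode, requiredSystemRate, estMemoryGB);
    }
}
```

The required system rate (~3472 tasks/sec) is what makes a ~1 job/sec LRM a non-starter here and motivates a lightweight dispatcher like Falkon.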
Feel free to write me off-list if you have more Falkon specific questions.

Ioan

Russell Miles wrote:

I am a Database Admin at a Metropolitan Planning Organization, so we process many complex, resource-intensive models focusing on things like transportation and air quality. We are planning the next generation of our modeling technology and wish to incorporate distributed computing, since the current models can take up to 40 hours to process on a single workstation/server. The "next generation" model will consist of around 5 million independent tasks whose results will be combined once all of the tasks are completed. We wish to spread this processing over the 100 or so PCs we have in the office, utilizing their idle CPU time.

I'm looking for some very specific advice, but all the information you can give would be much appreciated.

1) We're trying to decide what language to develop our models in so that they coexist most easily with grid computing code. My research has shown that Java and .NET are the two most widely used grid computing bases. Which do you recommend? Or is there some other technology you would suggest?

2) What third-party package, open source package, or other software would you recommend to implement this solution most efficiently, with a focus on performance? My research has shown that Digipede, Platform Computing, and Alchemi are some of the more popular grid computing platforms that run on Windows... what do you think about these? We are open to Linux/UNIX as well, but for ease of implementation, Windows is what we're currently running.

I appreciate any info you can provide and look forward to hearing back from you,

Russell


--
============================================
Ioan Raicu
Ph.D. Student
============================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
============================================
Email: [EMAIL PROTECTED]
Web:   http://www.cs.uchicago.edu/~iraicu
      http://dsl.cs.uchicago.edu/
============================================
