Hi Russell,
If I understand your workload correctly, and your "next generation"
model (~5M tasks) still requires ~40 hours to process on a single
machine, then your tasks average 28.8 ms each (40 hours / 5M tasks)!
Please correct me if I misunderstood your workload characteristics.
With task lengths in this range, you are looking at dispatch and
execution rates of about 34.72 tasks/sec per node (1000 ms / 28.8
ms/task). If you have 100 nodes, you need 3,472 tasks/sec of overall
system throughput to keep all 100 nodes busy with 28.8 ms tasks.
Typical production LRMs (local resource managers) have throughput in
the ~1 job/sec range, and development versions of these LRMs are
pushing 10~20 jobs/sec.
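As a quick sanity check, here is the arithmetic above in a few lines of Java (the constants simply restate the numbers from this thread):

```java
// Back-of-the-envelope check of the throughput numbers above.
public class ThroughputEstimate {
    public static void main(String[] args) {
        double totalMs = 40 * 3600 * 1000.0;     // 40 hours in milliseconds
        double tasks = 5_000_000.0;              // ~5M independent tasks
        double msPerTask = totalMs / tasks;      // 28.8 ms per task
        double perNodeRate = 1000.0 / msPerTask; // ~34.72 tasks/sec per node
        double systemRate = perNodeRate * 100;   // ~3,472 tasks/sec for 100 nodes
        System.out.printf("%.1f ms/task, %.2f tasks/sec/node, %.0f tasks/sec system%n",
                msPerTask, perNodeRate, systemRate);
    }
}
```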
Our own work on the Falkon project
(http://people.cs.uchicago.edu/~iraicu/research/Falkon/) works with
existing LRMs and has achieved rates in the ~500 tasks/sec range. We
have also scaled Falkon to 2M queued tasks with 1.5GB of memory, and
it should scale to your workload size of 5M tasks with a proportional
increase in memory (roughly 3.75GB). We are now working to improve
throughput further by parallelizing the Falkon architecture! BTW,
Falkon is implemented in Java and uses the Globus Toolkit 4. We have
not tested it on Windows, but there is nothing inherent that would
stop it from working in a Windows environment (with the exception of
some scripts, perhaps).
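To make the dispatch model concrete, here is a minimal in-process master/worker sketch in Java. To be clear, this is NOT Falkon code, just an illustration of the pattern: a pool of workers draining a stream of short tasks, the way a dispatcher would keep many nodes busy (the class and method names are my own, and the thread/task counts are scaled-down stand-ins):

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicLong;

// Minimal master/worker sketch (not Falkon's implementation): a shared
// worker pool drains many short tasks, analogous to a dispatcher
// keeping cluster nodes busy with fine-grained work.
public class MasterWorkerSketch {
    static long runTasks(int workers, long taskCount) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        AtomicLong completed = new AtomicLong();
        for (long i = 0; i < taskCount; i++) {
            // Each Runnable stands in for one ~29 ms model task.
            Runnable task = () -> completed.incrementAndGet();
            pool.submit(task);
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return completed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // 4 worker threads stand in for the ~100 office PCs;
        // 100,000 trivial tasks stand in for the 5M model tasks.
        System.out.println("completed " + runTasks(4, 100_000) + " tasks");
    }
}
```

The key point the sketch shows is that at 28.8 ms per task, dispatch overhead dominates unless the dispatcher itself is very fast, which is exactly why LRM throughput matters here.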
Feel free to write me off-list if you have more Falkon specific questions.
Ioan
Russell Miles wrote:
I am a Database Admin at a Metropolitan Planning Organization, so we
process many complex, resource-intensive models focusing on things
like transportation and air quality. We are planning the next
generation of our modeling technology and wish to incorporate
distributed computing into the mix, since the current models can take
up to 40 hours to process on a single workstation/server. The "next
generation" model will consist of around 5 million independent tasks
whose results will be combined once all of the tasks are completed. We
wish to spread this processing over the 100 or so PCs we have in the
office, utilizing their idle CPU time.
I'm looking for some very specific advice, but all the information you
can give would be much appreciated.
1) We're trying to decide which language to develop our models in so
that they most easily coexist with grid computing code. My research
has shown that Java and .NET are the two most widely used grid
computing bases. Which do you recommend, or is there some other
technology you would suggest?
2) What third-party package, open-source package, or other software
would you recommend to most efficiently implement this solution,
focusing on performance? My research has shown that Digipede, Platform
Computing, and Alchemi are some of the more popular grid computing
platforms that work on Windows... what do you think of these? We are
open to Linux/UNIX as well, but for ease of implementation, Windows is
what we're currently running.
I appreciate any info you can provide and look forward to hearing back
from you,
Russell
--
============================================
Ioan Raicu
Ph.D. Student
============================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
============================================
Email: [EMAIL PROTECTED]
Web: http://www.cs.uchicago.edu/~iraicu
http://dsl.cs.uchicago.edu/
============================================