I think James' comments are very good here.
Depending on how much data you're going to send to the remote process,
SSH has a couple of neat tricks.
Take for example the following command:
ssh [EMAIL PROTECTED] "remote_command" <local_infile.txt >local_outfile.txt
Data is read from local_infile.txt and passed to the standard input of
remote_command, and the output data is written to local_outfile.txt.
The trick is that remote_command needs to be able to process the data
from standard input and write the results to standard output.
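To make that concrete, here is a minimal sketch of such a filter. The
count_fields function and the user@host address are made up for
illustration; the function only stands in for whatever remote_command
really does.

```shell
#!/bin/sh
# Hypothetical stand-in for remote_command: reads records on standard
# input and writes one result per record on standard output.
# Here the "work" is just counting the fields on each line.
count_fields() {
    awk '{ print NF }'
}

# Local check, no network needed:
#   printf 'a b c\nd e\n' | count_fields
#
# The same filter driven over ssh (user@host is a placeholder):
#   ssh user@host 'awk "{ print NF }"' <local_infile.txt >local_outfile.txt
```

Anything that behaves like this, reading stdin and writing stdout
without prompting for input, should work the same way over ssh.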
As a rough rule of thumb, ssh can encrypt about 2 megabytes of data per
second, though the actual rate depends on the CPU and cipher. So if your
data set is larger than, say, a few megabytes, you'll want to do
something else to get your data into the process.
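For a quick feel of what that rate means, here is a small sketch that
turns a data size into an estimated transfer time. The est_seconds name
is made up, and the default of 2 MB/s is just the rule of thumb above,
not a measured figure.

```shell
#!/bin/sh
# Rough transfer-time estimate for data piped through ssh.
# Usage: est_seconds SIZE_MB [RATE_MB_PER_S]
# RATE defaults to 2 MB/s (the rule-of-thumb figure, an assumption).
est_seconds() {
    size_mb=$1
    rate=${2:-2}
    # Integer division, rounded up to the next whole second.
    echo $(( (size_mb + rate - 1) / rate ))
}

# Example: est_seconds 600   estimates a 600 MB data set at 2 MB/s
```

If the estimate is longer than you are willing to wait on every pass,
that is the point where moving the data some other way starts to pay
off.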
Interestingly enough, ssh also preserves standard error, giving you a
second output pipe for errors and/or status. This can be captured by
appending " 2>local_errfile.txt" to the command above (provided, of
course, that your remote process makes use of standard error for
something useful).
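A small sketch of how the two streams stay separate. The worker
function is a made-up stand-in for the remote process, and user@host is
a placeholder.

```shell
#!/bin/sh
# Hypothetical stand-in for the remote process: results go to standard
# output, status messages go to standard error.
worker() {
    while IFS= read -r line; do
        echo "result: $line"
        echo "processed: $line" >&2
    done
}

# Local check, no network needed:
#   worker <local_infile.txt >local_outfile.txt 2>local_errfile.txt
#
# Over ssh the redirections are applied on the local side:
#   ssh user@host "remote_command" <local_infile.txt \
#       >local_outfile.txt 2>local_errfile.txt
```

After a run, local_outfile.txt holds only the results and
local_errfile.txt holds only the status lines, so the two can be checked
independently.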
--R
On Mon, 2008-01-07 at 08:52 -0500, John McKelvey wrote:
>
>
> ---------- Forwarded message ----------
> From: John McKelvey <[EMAIL PROTECTED]>
> Date: Jan 7, 2008 8:41 AM
> Subject: Re: [fwlug] Linux clustering
> To: JAMES SCOTT <[EMAIL PROTECTED]>
>
>
> James,
>
> Many thanks for your comments! The one about crossmounting a
> directory "made some lights go on." Getting data back from other
> machines was going to be an issue. I think crossmounting reduces the
> problem to doing a remote procedure call to the other machine; I can
> have identical executables and run time libraries on each machine.
> Things are programmed entirely in fortran [ I know ... At one time I
> knew Algol.. :-) I started computing with that in 1965.
> Computational chemists do almost all CPU-intensive stuff in fortran,
> in the past often worrying about things like the impact of file
> block size and disk track length on IO performance. ]  I have
> found that the 'call system(" ")' command in fortran lets me do a lot
> of "command line" things easily.
>
> This all gets me to seeing better both the forest and the trees, but
> I'm sure I will need an additional suggestion or two.. please feel
> free to make other comments!
>
> Gratefully,
>
> John
>
>
>
> On Jan 7, 2008 12:17 AM, JAMES SCOTT <[EMAIL PROTECTED]> wrote:
> John,
>
> Rob's reply is a good starting point for what I think of as a
> command channel.  There is a logical data channel that you can
> set up to complement the command and make data collection
> easier: NFS or shared disk. Simply enable nfs on both
> machines and 'export' (share) a directory from one, then
> 'mount' (use) it from the other; they now have a single
> directory in common ( i.e. the data channel is established).
>
> With a data channel in place you could write 'bash' scripts to
> query a 'new-work' file and execute any found commands. Be
> sure to add some type of queuing or locking mechanism to
> prevent nodes from reading the work-file while the main wkstn
> is adding new commands to the file. Tell me more about the
> work steps and I might be interested in writing the scripts,
> or at least getting you started.
>
> As I'm sure you know there are options available for setting
> up a cluster. How much change are you willing to impose on
> the machines' current configurations?  I.e. a true
> (tightly-coupled) cluster configuration would limit these
> machines' general-purpose usage.  The suggested use of a data &
> command channel is comparable to a loosely-coupled
> cluster/grid. There might be a remote-job-submission program
> already available; search google for a 'how-to' on the
> subject.
>
> Although I think their tools are true GRID/cluster related,
> you might be interested in this site
> http://www.cse.scitech.ac.uk/about_us/index.shtml.
>
> Also, I'm available to help as are others; but how much help
> were you looking for?
>
> James,
>
>
> ----- Original Message ----
> From: John McKelvey <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Saturday, January 5, 2008 10:14:33 PM
> Subject: [fwlug] Linux clustering
>
>
> Hello!
>
>
> I am a retired chemist and a user and abuser of computers
> [i.e. for fun I do computational chemistry, and keep a
> dual-dual 4-processor AMD box running RHEL4 cranking 24/7. I
> have an additional box that is a dual-core Xeon that I would
> like to cluster with the AMD box. I run only _extremely_
> coarse grained parallel codes, and identical executables
> running on any linux box... I run a fitting procedure that
> runs a particular executable on hundreds of examples, one at a
> time, collects results, adjusts parameters, and does it all
> again, over and over, till finished. There is no
> communication between nodes.  Each node does a complete,
> separate, discrete task.  Node0 knows when a pass through the
> data has been completed, adjusts parameters, and farms out
> jobs, over and over] .. but I'm not much of a systems
> person.. I have this running OK on the SMP box... just need to
> know how to farm out some of the work to the Xeon box. There
> is very little data moved around so standard old ethernet
> through my Verizon router should be fine. [4 machines are
> cabled in, plus a wireless machine.]
>
> I need a bit of help and advice. Is there someone available
> for helping me get this going?
>
> Many thanks!
>
> John McKelvey
>
> _______________________________________________
> Fwlug mailing list
> [email protected]
> http://fortwaynelug.org/mailman/listinfo/fwlug_fortwaynelug.org