Re: [gt-user] Losing job states
Quick note on the last paragraph: we don't use a SEG because it proved to be unreliable.

On 24/09/13 8:26 AM, Markus Binsteiner wrote:
Hi Joe, thanks for your thoughts on this. I'll try to get more info from the logs, although the problem has now gone away almost entirely. We had problems with LoadLeveler over the last few days, but once we figured out what the problem was and worked around it, those grid-related issues went away too. It was just a bit strange that Globus seemed to lose jobs at a much higher rate than our 'normal' LoadLeveler users (of whom we have far more). My current working theory is that Globus checked job status more often than a normal user would and was therefore much more likely to find a job in a broken state. Once that happened, it considered the job gone and deleted its job files. Does that sound possible to you? Best, Markus

On Thu, 2013-09-19 at 11:51 -0400, Joseph Bester wrote:
That's normally not deleted until the job is completed and the two-phase commit is done. The other reason GRAM might delete it is job expiry (after it hits an end state and hasn't been touched for 4 hours). Is there a possibility of something else cleaning out that directory? Do those files exist? It's possible to increase the logging level as described here: http://www.globus.org/toolkit/docs/5.2/5.2.4/gram5/admin/#idp7912160 which might give some info about what the job manager thinks is going on. Joe

On Sep 18, 2013, at 3:33 PM, Markus Binsteiner m.binstei...@auckland.ac.nz wrote:
Hi. We are experiencing major problems with losing job states: after a while (an hour or so), every job we submit via Globus ends up in an unknown state.

I'm not quite sure where to start looking. The logs say:

ts=2013-09-18T19:20:31.006776Z id=14670 event=gram.state_file_read.end level=ERROR gramid=/16361930530915519966/6437524403105335712/ path=/var/lib/globus/gram_job_state/mbin029/16966e4/loadleveler/job.16361930530915519966.6437524403105335712 msg=Error checking file status status=-121 errno=2 reason=No such file or directory

every time another status is lost. We are using jglobus (1.8.x) and two-phase commit, and we poll the LRM (LoadLeveler; we are not using the scheduler event generator). Any idea what could cause those files to be deleted? Best, Markus

--
Martin Feller
Centre for eResearch, The University of Auckland
24 Symonds Street, Building 409, Room G21
e: m.fel...@auckland.ac.nz
p: +64 9 3737599 ext 82099
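Since Joe mentions the 4-hour expiry window, a quick way to see which state files are candidates for expiry is to look at their modification times. This is a diagnostic sketch of my own, not an official GRAM tool; the directory path is taken from the log line above and the helper name is made up, so adjust both for your site:

```shell
#!/bin/sh
# find_stale_job_states: list GRAM job state files untouched for more
# than the 4-hour expiry window mentioned in the thread.
find_stale_job_states() {
  # $1: a gram_job_state directory
  find "$1" -type f -name 'job.*' -mmin +240 2>/dev/null
}
# usage (path from the log line above):
# find_stale_job_states /var/lib/globus/gram_job_state/mbin029/16966e4/loadleveler
```

Comparing that list against the jobs GRAM reports as lost would show whether expiry, rather than external cleanup, is deleting the files.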
Re: [gt-user] problems with too many open files
Hi Brian,

I think I'd try to convince the IT group to authorize the upgrade to GT 5.2. According to http://www.globus.org/toolkit/docs/5.2/5.2.0/gram5/rn/#gram5-fixed the issue with accumulating open files (http://jira.globus.org/browse/GRAM-223) was fixed in the 5.2 series. We had the same problem with 5.0.4, and it works fine for us with 5.2. Increasing the limits will definitely help but, depending on how active your users are, may just delay the problem.

Martin

On 23/01/12 6:47 PM, Yuriy Halytskyy wrote:
Hi Brian, have a look at http://technical.bestgrid.org/index.php/Setup_GRAM5_on_CentOS_5#Increase_Open_Files_Limit
Cheers, Yuriy

On 23/01/12 18:45, Brian O'Connor wrote:
Hi, I've been using GRAM for a long time now and I'd like to push it into production, but I'm having issues with it. I submit workflows of hundreds of jobs each day through an automated submitter, so I need to be able to send jobs to a GRAM server and not have it end up in a bad state after x number of days. That's the goal, at least...
Anyway, the latest problem I've had is GRAM rejecting incoming requests because of "Too many open files". Here's the error:

globus-job-run server.domain.name/jobmanager-sge /bin/hostname
GRAM Job submission failed because Error opening proxy file for writing: /u/seqware/.globus/job/sqwprod.hpc.oicr.on.ca/16217884770066032596.5836665131371726474/x509_user_proxy: Too many open files (24) (error code 75)

I checked my proxy and it looks OK:

grid-proxy-info
subject  : /O=Grid/OU=GlobusTest/OU=simpleCA-sqwstage.hpc.oicr.on.ca/OU=hpc.oicr.on.ca/CN=Seq Ware/CN=1800547271
issuer   : /O=Grid/OU=GlobusTest/OU=simpleCA-sqwstage.hpc.oicr.on.ca/OU=hpc.oicr.on.ca/CN=Seq Ware
identity : /O=Grid/OU=GlobusTest/OU=simpleCA-sqwstage.hpc.oicr.on.ca/OU=hpc.oicr.on.ca/CN=Seq Ware
type     : RFC 3820 compliant impersonation proxy
strength : 512 bits
path     : /tmp/x509up_u1373
timeleft : 479:16:03  (20.0 days)

I then looked at the number of open files for this user:

/usr/sbin/lsof | grep seqware | wc -l
2084

Looking at the globus-job-manager, it's using up the majority:

ps aux | grep globus-job-man
seqware  175028 0.0 0.0  61200   768 pts/2 R+ 00:21 0:00 grep globus-job-man
seqware 4103600 0.1 0.4 116984 18628 ?     S  Jan22 1:26 globus-job-manager -conf /usr/local/globus/default/etc/globus-job-manager.conf -type sge
seqware 4103647 0.0 0.1  36548  7440 ?     S  Jan22 1:00 perl /usr/local/globus/5.0.2/libexec/globus-job-manager-script.pl -m sge -c interactive
seqware 4103649 0.0 0.1  36548  7456 ?     S  Jan22 0:59 perl /usr/local/globus/5.0.2/libexec/globus-job-manager-script.pl -m sge -c interactive
seqware 4103650 0.0 0.1  36548  7440 ?     S  Jan22 0:59 perl /usr/local/globus/5.0.2/libexec/globus-job-manager-script.pl -m sge -c interactive
seqware 4103651 0.0 0.1  36548  7444 ?     S  Jan22 0:59 perl /usr/local/globus/5.0.2/libexec/globus-job-manager-script.pl -m sge -c interactive
seqware 4103652 0.0 0.1  36544  7436 ?     S  Jan22 0:59 perl /usr/local/globus/5.0.2/libexec/globus-job-manager-script.pl -m sge -c interactive

/usr/sbin/lsof | grep seqware | grep 4103600 | wc -l
1069

However, if I look at this user's limits, it looks like they can open up to 32768 files, and I can perform other file operations just fine:

ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 69632
max locked memory       (kbytes, -l) 32
max memory size         (kbytes, -m) unlimited
open files                      (-n) 32768
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 69632
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Does anyone know why this is happening? To date I've been killing the globus-job-manager when things like this happen. Is there a guide somewhere that describes the right way to reset the daemons if something goes wrong? Is there a guide for avoiding common pitfalls and setting up GRAM (in particular) to work in a heavily used grid install? I want to be able to push thousands of jobs through the system, but so far it seems to barf on me every few days, which has caused a lot of disruption in our workflows. I'm currently using 5.0.2; I would like to upgrade, but that requires authorization from the IT group. Here's my configuration for the gatekeeper:

service gsigatekeeper
{
    socket_type = stream
    wait        = no
    user        = root
    server      = /usr/local/globus/default/sbin/globus-gatekeeper
    server_args = -conf /usr/local/globus/default/etc/globus-gatekeeper.conf
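The lsof-and-wc counting Brian does above can also be done per process straight from /proc, which avoids counting unrelated lines that merely mention the username. This is a small diagnostic sketch of mine (Linux /proc assumed; the helper name is made up):

```shell
#!/bin/sh
# Compare a process's open-fd count against the soft limit -- the same
# numbers that lsof and "ulimit -n" report in the thread above.
count_open_fds() {
  # $1: pid; each entry in /proc/<pid>/fd is one open descriptor
  ls "/proc/$1/fd" | wc -l
}
soft_limit=$(ulimit -n)
echo "this shell: $(count_open_fds $$) of $soft_limit fds in use"
# usage against a job manager: count_open_fds 4103600
```

Watching that number grow over a day of submissions is a quick way to tell a genuine fd leak (as in GRAM-223) from a limit that is simply set too low.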
Re: [gt-user] gram5 job manager and open files with jglobus
For the record: according to http://www.globus.org/toolkit/docs/5.2/5.2.0/gram5/rn/#gram5-fixed it's a bug, fixed in 5.2...

On 10/12/11 12:12 PM, Martin Feller wrote:
Hi Globus team,

GT version: 5.0.4
jglobus version: 1.8.0

When I submit a job using jglobus' GRAM API, the globus-job-manager processes seem to accumulate open files. It looks like, for each job, the job manager opens files for the stdout and stderr params in the job description (or /dev/null if not specified) and doesn't close them, even if the job is done, until the job manager shuts down. This becomes problematic if you have active users submitting a lot of jobs. Any ideas what would cause the job manager to keep these files open?

NOTE: The job manager does not accumulate open files if I submit a job using the C client globusrun.

I attached lsof output as an example, taken after submitting 5 sequential jobs. The stdout and stderr params of the jobs were /home/testuser/[out|err]_UUID. The jobs were simple /bin/hostname jobs and were done when I took the lsof snapshot.

Thanks! Martin
[gt-user] gram5 job manager and open files with jglobus
Hi Globus team,

GT version: 5.0.4
jglobus version: 1.8.0

When I submit a job using jglobus' GRAM API, the globus-job-manager processes seem to accumulate open files. It looks like, for each job, the job manager opens files for the stdout and stderr params in the job description (or /dev/null if not specified) and doesn't close them, even if the job is done, until the job manager shuts down. This becomes problematic if you have active users submitting a lot of jobs. Any ideas what would cause the job manager to keep these files open?

NOTE: The job manager does not accumulate open files if I submit a job using the C client globusrun.

I attached lsof output as an example, taken after submitting 5 sequential jobs. The stdout and stderr params of the jobs were /home/testuser/[out|err]_UUID. The jobs were simple /bin/hostname jobs and were done when I took the lsof snapshot.

Thanks! Martin

[root@er161-40 tmp]# lsof | grep testuser | grep home
globus-jo 23152 testuser  cwd  DIR  253,0           4096   76117 /home/testuser
globus-jo 23153 testuser  cwd  DIR  253,0           4096   76117 /home/testuser
globus-jo 23153 testuser   6u  unix 0x81000bc97980       5971386 /home/testuser/.globus/job/er161-40.ceres.auckland.ac.nz/fork.6bd321d6.sock
globus-jo 23153 testuser   8w  REG  253,0        1801895  846923 /home/testuser/gram_20111209.log
globus-jo 23153 testuser  11uW REG  253,0              0  585356 /home/testuser/.globus/job/er161-40.ceres.auckland.ac.nz/fork.6bd321d6.lock
globus-jo 23153 testuser  17w  REG  253,0             30  846966 /home/testuser/out_10f827cf-4f1b-4422-b745-40e8b200d524
globus-jo 23153 testuser  18w  REG  253,0              0  846967 /home/testuser/err_10f827cf-4f1b-4422-b745-40e8b200d524
globus-jo 23153 testuser  22w  REG  253,0             30  846968 /home/testuser/out_2cac8de0-2f2e-47df-8cab-eb26c9406334
globus-jo 23153 testuser  23w  REG  253,0              0  846969 /home/testuser/err_2cac8de0-2f2e-47df-8cab-eb26c9406334
globus-jo 23153 testuser  24w  REG  253,0             30  846971 /home/testuser/out_11f03168-5616-45b9-bd53-6436bc00b75e
globus-jo 23153 testuser  25w  REG  253,0              0  846972 /home/testuser/err_11f03168-5616-45b9-bd53-6436bc00b75e
globus-jo 23153 testuser  26w  REG  253,0             30  846973 /home/testuser/out_abb4c0d8-f1ea-4e04-88a9-b2ba1e7f5b05
globus-jo 23153 testuser  27w  REG  253,0              0  846974 /home/testuser/err_abb4c0d8-f1ea-4e04-88a9-b2ba1e7f5b05
globus-jo 23153 testuser  28w  REG  253,0             30  846976 /home/testuser/out_a8d9598b-7f2d-41f5-9fbb-929b62e302f3
globus-jo 23153 testuser  29w  REG  253,0              0  846977 /home/testuser/err_a8d9598b-7f2d-41f5-9fbb-929b62e302f3
perl      23155 testuser  cwd  DIR  253,0           4096   76117 /home/testuser
perl      23155 testuser   8w  REG  253,0        1801895  846923 /home/testuser/gram_20111209.log
perl      23155 testuser  17w  REG  253,0             30  846966 /home/testuser/out_10f827cf-4f1b-4422-b745-40e8b200d524
perl      23155 testuser  18w  REG  253,0              0  846967 /home/testuser/err_10f827cf-4f1b-4422-b745-40e8b200d524
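The symptom in the lsof listing (one write-mode fd per finished job, never released) is easy to reproduce in miniature. This sketch is not GRAM code, just an illustration of the leak pattern under Linux /proc:

```shell
#!/bin/sh
# A long-lived process that opens a per-"job" output file and never
# closes it accumulates one fd per job -- exactly the shape of the
# globus-job-manager entries in the lsof output above.
tmp=$(mktemp -d)
before=$(ls /proc/$$/fd | wc -l)
exec 3>"$tmp/out_job1"   # opened for a "job", never closed
exec 4>"$tmp/out_job2"
exec 5>"$tmp/out_job3"
after=$(ls /proc/$$/fd | wc -l)
leaked=$((after - before))
echo "fds held open: $leaked"
```

Against a leak like this, raising ulimit only postpones the failure; the fix in GT 5.2 (closing the fds when the job reaches a terminal state) is the real cure.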
Re: [gt-user] problem installing globus-4.2.1 -- the second node
Is my_echo a shell script? If so: I remember an issue with shell scripts that lack the interpreting shell in the first line, like

#!/bin/sh

If it's a shell script, make sure that line is there.

-Martin

On 4/11/11 1:21 PM, Christopher Kunz wrote:
On 11.04.2011 20:04, Prashanth Chengi wrote:
The good news is that the RSL file is OK. The bad news is that the problem lies elsewhere. Can you do a simple globus-url-copy successfully? Throw in the -dbg flag, too, to get additional info. Or check via non-grid means (SSH) on the other node whether ${GLOBUS_USER_HOME}/my_echo exists and is executable (mode 0755 or similar).
--ck
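A script without a shebang often works when launched from an interactive shell (which falls back to interpreting it itself) but fails when a service exec()s it directly, which is why the symptom only shows up through the grid stack. A quick check in the spirit of Martin's suggestion (the helper name is mine, not a Globus tool):

```shell
#!/bin/sh
# has_shebang: verify that an executable's first two bytes are "#!"
# before handing it to a job manager.
has_shebang() {
  head -c 2 "$1" | grep -q '^#!'
}
# usage:
# has_shebang "$GLOBUS_USER_HOME/my_echo" || echo "add #!/bin/sh as line 1"
```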
Re: [gt-user] Globus installation error
Is there a particular reason why you use such an old version of the GT (4.0.1)? If you have to use a version from the 4.0 series, I'd rather try 4.0.8.

-Martin

On 2/9/11 12:02 PM, kasim saeed wrote:
Yes, I did check all the prerequisites. I just checked again and g++ is installed. Regards, Kaasim Saeed.

On Wed, Feb 9, 2011 at 10:58 PM, Roy, Kevin (LNG-SEA) kevin@applieddiscovery.com wrote:
Did you run through the quickstart and verify that you had all the needed software? I needed to download g++ for my machines; it looks like that might be your problem.

From: gt-user-boun...@lists.globus.org [mailto:gt-user-boun...@lists.globus.org] On Behalf Of kasim saeed
Sent: Wednesday, February 09, 2011 9:54 AM
To: gt-user@lists.globus.org; Lukasz Lacinski
Cc: Dr. Farrukh Nadeem
Subject: [gt-user] Globus installation error

Hi all, I am new to Globus and need to install it for academic purposes. I am using http://globus.org/toolkit/docs/4.0/admin/docbook/quickstart.html for installation. The OS is Ubuntu 10.04 and the Globus version is 4.0.1. All went well except that when I gave the command

make | tee installer.log

the following error appeared:

/usr/local/globus4a//sbin/gpt-build -srcdir=source-trees-thr/core/source gcc32dbgpthr
sh: NOT: not found
/usr/local/globus4a//etc/gpt/globus_core-src.tar.gz could not be untarred:512
Died at /usr/local/globus4a//lib/perl/Grid/GPT/PkgMngmt/ExpandSource.pm line 42.
make: *** [globus_core-thr] Error 2

Please help. Regards, Kaasim Saeed.
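Kevin's "verify you have all the needed software" step can be scripted. The tool list below is my guess at typical build prerequisites for a source install, not an official list from the 4.0 quickstart, and the helper name is made up:

```shell
#!/bin/sh
# check_tools: print any of the named build tools that are missing
# from PATH before starting a source install.
check_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "missing: $tool"
  done
}
# usage: check_tools gcc g++ make perl tar
```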
Re: [gt-user] Stripe mode over multiple links between two servers
The CA itself should stay on one machine and should not be copied to multiple nodes in a grid. It's probably located only on the first machine in your case. Does it work if you copy the host certificate request from the second machine to the first machine, sign it there, and copy the generated certificate back to the second machine, where the corresponding private key of the host certificate lives?

Martin

Hoot Thompson wrote:
I'm back again. Can you point me to a good resource for setting up a SimpleCA for two test machines? Things go OK on the first machine, but I'm getting stuck trying to sign the host certificate on the second machine. I'm using the GT 5.0.2 SimpleCA Admin Guide as a reference. The error message is as follows:

[h...@i7test4 globus_simple_ca_264a619f_setup]$ $GLOBUS_LOCATION/bin/grid-ca-sign -in /me/hoot/wideband_tools/gridftp/globus/etc/hostcert_request.pem -out $GLOBUS_LOCATION/hostsigned.pem
ERROR: No simple CA directory found at /me/hoot/.globus/simpleCA/
Either specify a directory with -dir, or run setup-simple-ca to create a CA

-----Original Message-----
From: Chandin Wilson chandin.wil...@noaa.gov
To: h...@ptpnow.com
Cc: gt-user@lists.globus.org
Subject: Re: [gt-user] Stripe mode over multiple links between two servers
Date: Tue, 24 Aug 2010 14:48:48 -0500 (CDT)

From: Hoot Thompson h...@ptpnow.com
Subject: RE: [gt-user] Stripe mode over multiple links between two servers
Date: Tue, 24 Aug 2010 14:58:39 -0400

Ok. Just to repeat in my own words: two servers with two interfaces each can be striped if GSI is used.

Yes. I'd expect you'd end up running three GridFTP instances per server: one master and two data movers, each bound to a separate data interface. You might want to make sure your filesystem and backend I/O can keep up with and sustain 20 Gbit/sec.

--Chan

Hoot

-----Original Message-----
From: Chandin Wilson [mailto:chandin.wil...@noaa.gov]
Sent: Tuesday, August 24, 2010 2:48 PM
To: h...@ptpnow.com
Cc: gt-user@lists.globus.org
Subject: Re: [gt-user] Stripe mode over multiple links between two servers

From: Hoot Thompson h...@ptpnow.com
Subject: [gt-user] Stripe mode over multiple links between two servers
Date: Tue, 24 Aug 2010 14:03:39 -0400

I have two servers, each with two 10GigE links, and I would like to stripe a file across the two links. I'm currently authenticating using ssh. Can I do this using the gridftp server stripe mode, and if so, how do I set it up?

No, you cannot. You must use GSI authentication (and hence gsiftp:// style URLs) to do striped (data-mover) GridFTP transfers.

--Chan

Chandin Wilson, General Specialist, Information Technology
chandin.wil...@noaa.gov  +1-608-216-5689
OneNOAA RDHPCS Infrastructure

Thanks!
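Martin's request-goes-to-the-CA workflow can be simulated on a single host with plain openssl, since SimpleCA wraps openssl underneath (an assumption on my part: grid-ca-sign corresponds roughly to the `x509 -req` signing step, and all file names here are illustrative):

```shell
#!/bin/sh
tmp=$(mktemp -d); cd "$tmp"
# On the CA machine (machine 1): CA key and self-signed CA certificate
openssl req -x509 -newkey rsa:2048 -nodes -subj "/O=Grid/CN=SimpleCA" \
        -keyout cakey.pem -out cacert.pem -days 1 2>/dev/null
# On machine 2: host key plus certificate request;
# copy only hostreq.pem over to the CA machine
openssl req -newkey rsa:2048 -nodes -subj "/O=Grid/CN=host2.example.org" \
        -keyout hostkey.pem -out hostreq.pem 2>/dev/null
# Back on machine 1: sign the request, then copy hostcert.pem back to
# machine 2, where hostkey.pem never moved
openssl x509 -req -in hostreq.pem -CA cacert.pem -CAkey cakey.pem \
        -CAcreateserial -out hostcert.pem -days 1 2>/dev/null
openssl verify -CAfile cacert.pem hostcert.pem   # prints: hostcert.pem: OK
```

The key point, matching Martin's advice: the CA key and the host key never travel; only the request and the signed certificate do.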
Re: [gt-user] Stop a nuclear disaster
Tell him to upgrade to GRAM5. Maybe that'll change his mind.

jayakan...@gmail.com wrote:
Hi, our leaders are putting the nation at stake with the nuclear liability bill. The Standing Committee looking at the bill has submitted its recommendations to Parliament. In its current form the bill limits the liability of the operator of a nuclear facility in case of a nuclear accident; if the cost exceeds the limit, we will have to pay for it. The Standing Committee has ignored the demand for unlimited liability, which would have made the bill more competent. Our leaders have not learnt anything from the injustices of Bhopal. Prime Minister Manmohan Singh, eager to get this bill cleared, needs to know that we want unlimited liability. I have already sent him an email asking him to incorporate unlimited liability in the bill. A large number of emails demanding the same will make it difficult for him to ignore us. We have very little time to make this change. Can you also write to PM Manmohan Singh asking him to incorporate unlimited liability? http://www.greenpeace.org/india/unlimited-liability Thanks! jayakan...@gmail.com
Re: [gt-user] globus-ws with lsf does not work
Ok, that's odd. Right now I don't have an idea what might be going wrong. If you have full control over the GT server, and it's not a production system, please do this:

0. Uncomment the following line in $GLOBUS_LOCATION/container-log4j.properties:
   # log4j.category.org.globus=DEBUG
1. Shut down the server.
2. Remove the server logfile $GLOBUS_LOCATION/var/container.log.
3. Remove the persistence directory ~userWhoStartsTheContainer/.globus/persisted.
4. Restart the GT server as a daemon (globus-start-container-detached).
5. Submit a simple batch job. No staging, no fileCleanUp please; just something simple like globusrun-ws -submit -c /bin/date.
6. Save the server logfile $GLOBUS_LOCATION/var/container.log.

Please do steps 1-6 for both a Fork and an LSF job, and send both log files.

Martin

Löhnhardt, Benjamin wrote:
Hi Martin,

Ah, why take the easy route if there is a complicated one... Somehow I was focused on your statement "... new LSF ..." and thought it used to work with old LSF or Fork. So maybe this: it works fine with Fork. The old LSF is not installed anymore, so I cannot test it.

As both variants (Fork and LSF) use the same notification listener (I guess?), network configuration problems may not be the reason...

To verify that: submit a job in batch/non-interactive mode and store the EPR of the job. Then poll for status.

With LSF the job status remains unsubmitted:

-bash-3.1$ globusrun-ws -submit -b -o job.epr -F https://nimrod.med.uni-goettingen.de -Ft LSF -c /bin/date
Submitting job...Done.
Job ID: uuid:7e99a622-a061-11df-9d58-00215af48192
Termination time: 08/06/2010 07:17 GMT
-bash-3.1$ globusrun-ws -status -j job.epr
Current job state: Unsubmitted

...but with Fork it is done:

-bash-3.1$ globusrun-ws -submit -b -o job.epr -F https://nimrod.med.uni-goettingen.de -Ft Fork -c /bin/date
Submitting job...Done.
Job ID: uuid:8b85d1b2-a061-11df-9fb3-00215af48192
Termination time: 08/06/2010 07:17 GMT
-bash-3.1$ globusrun-ws -status -j job.epr
Current job state: Done

Do you have an explanation for that strange behavior?

Regards, Benjamin

--
Benjamin Löhnhardt
UNIVERSITÄTSMEDIZIN GÖTTINGEN, GEORG-AUGUST-UNIVERSITÄT
Abteilung Medizinische Informatik
Robert-Koch-Straße 40, 37075 Göttingen
Briefpost: 37099 Göttingen
Telefon: +49-551 / 39-22842
benjamin.loehnha...@med.uni-goettingen.de
www.mi.med.uni-goettingen.de
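Step 0 above (uncommenting the DEBUG line) can be done non-interactively, which helps when repeating the procedure on several machines. A small sketch of my own, assuming GNU sed and the GT4 file path from the thread; the helper name is made up:

```shell
#!/bin/sh
# enable_container_debug: uncomment "# log4j.category.org.globus=DEBUG"
# in a container-log4j.properties file, in place.
enable_container_debug() {
  sed -i 's/^#[[:space:]]*\(log4j\.category\.org\.globus=DEBUG\)/\1/' "$1"
}
# usage, then restart the container (steps 1-4 above):
# enable_container_debug "$GLOBUS_LOCATION/container-log4j.properties"
```

Remember to re-comment the line afterwards; a DEBUG container log grows quickly under load.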
Re: [gt-user] globus-ws with lsf does not work
Hm, it's very strange. In the logfile for the LSF job I can see that not a single message from the LSF SEG ever enters the Java code, which explains what we see, but I don't know why it happens. It looks like either the LSF SEG died (we should see an error in the log in that situation, though, and there is none), or the thread that runs the Java code communicating with the SEG (which is also named SchedulerEventGenerator) died (we should see an error in the log then too, but there is nothing).

(You can turn ws-gram debugging in the server off again.)

Please start the server and submit an LSF job. Then please paste information about the processes of the GT server and the SEGs (ps -ef | grep -i globus | grep -v grep should give you that). Please send me a thread dump of the GT server process (kill -QUIT server-pid; the output is stored in the server logfile). Please also send the output of

ldd $GLOBUS_LOCATION/libexec/globus-scheduler-event-generator

I hope this will tell me more.

Just to make sure: when you started the SEG manually and saw it printing output, you used the SEG from the same $GLOBUS_LOCATION that is used by the GT4 server we are talking about, right? There is not accidentally another GLOBUS_LOCATION around that might cause some confusion? I vaguely remember a situation with two Globus installations where the SEG didn't report anything, but I don't remember any details... Maybe it's worth trying a clean re-install; that should be relatively quick with a binary installer.

Thanks, Martin

Löhnhardt, Benjamin wrote:
Hi Martin,

I think I should not send 5 MB to the mailing list, so just for you: the two resulting container.log files.

Regards, Benjamin

--
Benjamin Löhnhardt
UNIVERSITÄTSMEDIZIN GÖTTINGEN, GEORG-AUGUST-UNIVERSITÄT
Abteilung Medizinische Informatik
Robert-Koch-Straße 40, 37075 Göttingen
Briefpost: 37099 Göttingen
Telefon: +49-551 / 39-22842
benjamin.loehnha...@med.uni-goettingen.de
www.mi.med.uni-goettingen.de

-----Original Message-----
From: Martin Feller [mailto:fel...@mcs.anl.gov]
Sent: Thursday, 5 August 2010 14:39
To: Löhnhardt, Benjamin
Cc: gt-u...@globus.org
Subject: Re: AW: [gt-user] globus-ws with lsf does not work

Ok, that's odd. Right now I don't have an idea what might be going wrong. If you have full control over the GT server, and it's not a production system, please do this:

0. Uncomment the following line in $GLOBUS_LOCATION/container-log4j.properties:
   # log4j.category.org.globus=DEBUG
1. Shut down the server.
2. Remove the server logfile $GLOBUS_LOCATION/var/container.log.
3. Remove the persistence directory ~userWhoStartsTheContainer/.globus/persisted.
4. Restart the GT server as a daemon (globus-start-container-detached).
5. Submit a simple batch job. No staging, no fileCleanUp please; just something simple like globusrun-ws -submit -c /bin/date.
6. Save the server logfile $GLOBUS_LOCATION/var/container.log.

Please do steps 1-6 for both a Fork and an LSF job, and send both log files.

Martin

Löhnhardt, Benjamin wrote:
Hi Martin,

Ah, why take the easy route if there is a complicated one... Somehow I was focused on your statement "... new LSF ..." and thought it used to work with old LSF or Fork. So maybe this: it works fine with Fork. The old LSF is not installed anymore, so I cannot test it.

As both variants (Fork and LSF) use the same notification listener (I guess?), network configuration problems may not be the reason...

To verify that: submit a job in batch/non-interactive mode and store the EPR of the job. Then poll for status.

With LSF the job status remains unsubmitted:

-bash-3.1$ globusrun-ws -submit -b -o job.epr -F https://nimrod.med.uni-goettingen.de -Ft LSF -c /bin/date
Submitting job...Done.
Job ID: uuid:7e99a622-a061-11df-9d58-00215af48192
Termination time: 08/06/2010 07:17 GMT
-bash-3.1$ globusrun-ws -status -j job.epr
Current job state: Unsubmitted

...but with Fork it is done:

-bash-3.1$ globusrun-ws -submit -b -o job.epr -F https://nimrod.med.uni-goettingen.de -Ft Fork -c /bin/date
Submitting job...Done.
Job ID: uuid:8b85d1b2-a061-11df-9fb3-00215af48192
Termination time: 08/06/2010 07:17 GMT
-bash-3.1$ globusrun-ws -status -j job.epr
Current job state: Done

Do you have an explanation for that strange behavior?

Regards, Benjamin

--
Benjamin Löhnhardt
UNIVERSITÄTSMEDIZIN GÖTTINGEN, GEORG-AUGUST-UNIVERSITÄT
Abteilung Medizinische Informatik
Robert-Koch-Straße 40, 37075 Göttingen
Briefpost: 37099 Göttingen
Telefon: +49-551 / 39-22842
benjamin.loehnha...@med.uni-goettingen.de
www.mi.med.uni-goettingen.de
Re: [gt-user] globus-ws with lsf does not work
Löhnhardt, Benjamin wrote:
Hi Martin, thanks for your prompt response! I suppose you are right with your guess; maybe the SEG does not work correctly... In the meantime I have tested the following: after submitting a job via globus-ws, the job is executed by LSF. Afterwards there is an entry in the logfile of LSF (/opt/hptc/lsf/top/work/hptclsf/logdir/lsb.acct). This entry seems to be equal (or similar) to those in the old LSF logfile; mainly the version number of LSF is different. In /opt/globus/gt4/etc/globus-lsf.conf the right location of the LSF logfile is given: log_path=/opt/hptc/lsf/top/work/hptclsf/logdir.

Good point. If the log_path pointed to a wrong location, that would have been a good explanation...

How can I run the SEG manually? I have tried /opt/globus/gt4/libexec/globus-scheduler-event-generator -s LSF, but without success and with an error message:

globus_scheduler_event_generator: Unable to dlopen module /opt/globus/gt4/lib/libglobus_seg_LSF_gcc64dbg.la: file not found

Two things you can check:

1. Make sure /opt/globus/gt4/lib is in your library search path environment variable. (The name of this variable depends on the system, but in most cases it's $LD_LIBRARY_PATH.)
2. I think you need to type "lsf" lower-case. Try /opt/globus/gt4/libexec/globus-scheduler-event-generator -s lsf (the library is very probably named libglobus_seg_lsf_gcc64dbg.la, not libglobus_seg_LSF_gcc64dbg.la).

You might see a lot of output when you start that command. Wait until the output stops, then submit a job to LSF via WS-GRAM. You should then see something like

001;1280934384;d96c164e-9fd9-11df-a8d3-0013d4c3b957:26714;2;0
001;1280934384;d96c164e-9fd9-11df-a8d3-0013d4c3b957:26714;8;0

on the console (provided the SEG works properly). If the SEG doesn't spit out anything and the job is done in LSF, then something is wrong. In that case, submit a job to the scheduler 'fork' and see if the fork SEG works OK, just to make sure you did the right things with the LSF SEG. (Start the fork SEG with 'globus-scheduler-event-generator -s fork' and submit a 'fork' job via WS-GRAM.)

Martin

Btw: we use Globus 4.0.8 (I didn't mention it in the last post).

Regards, Benjamin

-----Original Message-----
From: Martin Feller [mailto:fel...@mcs.anl.gov]
Sent: Friday, 30 July 2010 14:50
To: Löhnhardt, Benjamin
Cc: gt-u...@globus.org
Subject: Re: [gt-user] globus-ws with lsf does not work

Hi,

Just an educated guess: I assume the problem is the scheduler event generator (SEG). In GRAM4, and I think also in GRAM5, the SEG is responsible for telling GRAM about the status of the jobs in the job manager. If the SEG doesn't tell GRAM about job status, the job doesn't make any progress from GRAM's perspective. I think the SEG works on the log files of the job managers to get the status information about the jobs. If something changed in the logging format of the job manager, the SEG may not be able to get the information anymore. To confirm this I would run the SEG by hand, submit a GRAM job to old/new LSF, and check whether the SEG actually spits out information on job status as it is processed in the job manager.

Martin

Löhnhardt, Benjamin wrote:
Hello, we have a problem with Globus and LSF as job manager. When we execute a Globus job via globus-ws on a client, the job is handled normally by the LSF job manager on the server. The job is even executed by LSF itself, so the test script ran on the server. However, the client does not notice that the script ran successfully, and waits. The output on the client:

-bash-3.1$ globusrun-ws -submit -s -F https://nimrod.med.uni-goettingen.de -Ft LSF -c /tmp/test.sh
Delegating user credentials...Done.
Submitting job...Done.
Job ID: uuid:c06227cc-9bc6-11df-80d4-00215af48192
Termination time: 07/31/2010 10:39 GMT

We have updated the LSF system on the server from 6.2 to 7.0. Does anybody have a hint why the client is waiting for a response from the server? How can we fix this issue?

Regards, Benjamin
Re: [gt-user] globus-ws with lsf does not work
Martin Feller wrote:
Löhnhardt, Benjamin wrote:

1. Make sure /opt/globus/gt4/lib is in your library search path environment variable.

/opt/globus/gt4/lib is set in $LD_LIBRARY_PATH.

2. I think you need to type "lsf" lower-case.

Lower case works :-) After submitting a job the output is:

001;1280936535;25852;1;0
001;1280936538;25852;2;0
001;1280936570;25852;8;0

Ok, this looks good.

Have you got any idea why the messages do not reach the client? Can network configuration problems (a firewall) cause this? Do you know which port(s) are used for the events sent to the client?

Ah, why take the easy route if there is a complicated one... Somehow I was focused on your statement "... new LSF ..." and thought it used to work with old LSF or Fork. So maybe this: if your client is behind a firewall, then it's pretty common for notification messages to be blocked. globusrun-ws, under the hood, starts a notification listener, which WS-GRAM uses to send the notification messages to. If messages to the port of that listener are blocked for some reason, globusrun-ws doesn't get any job status information from the server.

To verify that: submit a job in batch/non-interactive mode and store the EPR of the job. Then poll for status, like:

globusrun-ws -submit -b -o job.epr ...
globusrun-ws -status -f job.epr

globusrun-ws -status -j job.epr is correct (-j, not -f).

If you are able to get the job status this way, then the notification messages get stuck somewhere between the server and the client.

Martin

Regards, Benjamin

--
Benjamin Löhnhardt
UNIVERSITÄTSMEDIZIN GÖTTINGEN, GEORG-AUGUST-UNIVERSITÄT
Abteilung Medizinische Informatik
Robert-Koch-Straße 40, 37075 Göttingen
Briefpost: 37099 Göttingen
Telefon: +49-551 / 39-22842
benjamin.loehnha...@med.uni-goettingen.de
www.mi.med.uni-goettingen.de
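Martin's firewalled-client workaround (submit in batch mode, then poll instead of waiting for notifications) can be wrapped in a small loop. This is a generic helper of my own, not a Globus tool; only the globusrun-ws commands in the usage comment come from the thread:

```shell
#!/bin/sh
# poll_until_done: run a state-reporting command repeatedly until it
# prints Done or Failed, then echo that terminal state.
poll_until_done() {
  # $1: a command (or function) that prints the current job state
  while :; do
    state=$($1)
    case $state in
      Done|Failed) echo "$state"; return 0 ;;
    esac
    sleep "${POLL_INTERVAL:-10}"
  done
}
# usage with the commands from this thread (GT client install assumed;
# globusrun-ws prefixes the state with "Current job state: "):
# job_state() { globusrun-ws -status -j job.epr | sed 's/^Current job state: //'; }
# poll_until_done job_state
```

Polling trades a little latency and server load for working through firewalls that drop the notification traffic.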
Re: [gt-user] Threads and globusrun-ws/globus-credential-delegate.
Can you paste the exact commands you use in the system() calls, and the error you get in the concurrent scenario?

Martin

Belaid MOA wrote:
That's right, Martin. For each thread, I just call system("globus-credential-delegate ...") and use the EPR in system("globusrun-ws ..."). That's where I do not get any error. If, however, I call system("globusrun-ws ...") on each thread using a single EPR (created in the shell script before running the C program), then I start getting the RSL stagein element error. Thanks a lot, Martin, for looking at this. ~Belaid.

Date: Wed, 21 Jul 2010 09:55:16 -0500
From: fel...@mcs.anl.gov
CC: gt-user@lists.globus.org
Subject: Re: [gt-user] Threads and globusrun-ws/globus-credential-delegate.

Hi, I'm not sure I get this question right, and I'm also not a C guy anymore. Does it work if you run globus-credential-delegate and globusrun-ws sequentially as command-line tools? I.e.:

1. Call globus-credential-delegate and store the EPR somewhere.
2. Then use it for several globusrun-ws job submissions.

Martin

Belaid MOA wrote:
Hi everyone, just a quick question: I am using pthreads in C to run globusrun-ws and globus-credential-delegate concurrently on a GT4 PBS cluster. I noticed that using a single system() call to globus-credential-delegate when submitting a set of jobs produces an RSL stagein element error (the jobs use the same EPR produced by the single call to globus-credential-delegate). This does not happen when globus-credential-delegate is called for every job (each job has its own unique EPR). Does that mean that globusrun-ws/globus-credential-delegate are not thread-safe? Thanks a lot in advance. ~Belaid.
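The pattern Belaid reports as working (one delegated credential per job, created and used inside the same concurrent unit of work) can be sketched in plain shell. The two stub functions below are placeholders of mine for globus-credential-delegate and globusrun-ws -Jf, so the concurrency shape is testable without a grid install:

```shell
#!/bin/sh
# One EPR file per job, created by the same background task that submits
# the job -- the arrangement that avoided the RSL stagein error above.
cd "$(mktemp -d)"
delegate_stub() { echo "credential-for-$1" > "$1.epr"; }   # stands in for globus-credential-delegate
submit_stub()  { cat "$1.epr" > "$1.result"; }             # stands in for globusrun-ws ... -Jf "$1.epr"
for i in 1 2 3; do
  ( delegate_stub "job$i" && submit_stub "job$i" ) &
done
wait
```

With a single shared EPR file, concurrent submissions all read (and the delegation tooling may rewrite) the same file, which is one plausible, unconfirmed explanation for the failures only appearing in the shared-EPR scenario.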
Re: [gt-user] Threads and globusrun-ws/globus-credential-delegate.
I'm sorry, I might be a bit dense, but it's still not entirely clear to me: if you run the following in a plain shell script:

globus-credential-delegate -h scheduler eprFileName
globusrun-ws -submit -batch -F scheduler -Ft factory -S -Jf eprFileName -o JobIdFile -f jobDescFile
globusrun-ws -submit -batch -F scheduler -Ft factory -S -Jf eprFileName -o JobIdFile -f jobDescFile
globusrun-ws -submit -batch -F scheduler -Ft factory -S -Jf eprFileName -o JobIdFile -f jobDescFile

do the jobs succeed or fail? Martin
Belaid MOA wrote: Thanks a lot, Martin, for looking at this.
1. In the shell script, I run: globus-credential-delegate -h $scheduler $eprName
2. The command I call in each thread is:

string sysCommand = "globusrun-ws -submit -batch -F " + scheduler + " -Ft " + factory + " -S -Jf ";
sysCommand.append(eprName);
sysCommand.append(" -o JobIdFile");
sysCommand.append(" -f ");
sysCommand.append(jobDescFile);
// submit the request
system(sysCommand.c_str());

3. The error is:

$ globusrun-ws -status -j
JobId: 94110cc2-9376-11df-9044-0019d1a
Current job state: Failed
globusrun-ws: Job failed: Staging error for RSL element fileStageIn. Connection creation error [Caused by: java.io.EOFException] Connection creation error [Caused by: java.io.EOFException]

I do not have access to the GT4 container log on the PBS head node :(. ~Belaid.
Re: [gt-user] Threads and globusrun-ws/globus-credential-delegate.
You must not add & at the end of the globus-credential-delegate command, because the job submission commands require the delegation command to have finished. Otherwise there won't be an EPR of a delegated credential.
Ok, I think what I get out of this is: it works sequentially (that is what I wanted to confirm, so the usage of the commands is ok), but maybe not using pthreads. I don't know what the problem might be. Maybe Joe Bester, who wrote the command-line tools, can provide more input on this. Martin
Belaid MOA wrote: In the plain shell script as is, no error is thrown. But when we add & at the end of each line, we get an error similar to the one we got from using pthreads. ~Belaid.
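The point Martin makes here (the delegation command must have finished before any submission tries to use its EPR) can be demonstrated with stand-in shell functions. `delegate` and `submit` below are hypothetical placeholders for globus-credential-delegate and globusrun-ws, not real GT commands:

```shell
# 'delegate' stands in for globus-credential-delegate: it takes time and
# then writes the EPR file. 'submit' stands in for globusrun-ws, which
# needs that file to exist.
delegate() { sleep 0.3; echo "EPR" > cred.epr; }
submit()   { [ -f cred.epr ] && echo "submitted" || echo "failed: no EPR yet"; }

rm -f cred.epr
delegate &            # WRONG: backgrounded with '&', submission races ahead
first=$(submit)       # runs before the EPR file exists
wait

rm -f cred.epr
delegate              # RIGHT: let delegation finish before submitting
second=$(submit)

echo "$first / $second"
rm -f cred.epr
```

The backgrounded variant fails for the same reason the '&'-suffixed script lines do: the EPR file is not there yet when the submission starts.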
Re: [gt-user] Threads and globusrun-ws/globus-credential-delegate.
I'll have to test the situation when the jobs are submitted in the background. I don't have a GT available at the moment, so it might take a while. What are the GT server and client versions? Also: you said that the error when running the commands in a plain script is 'similar'. Can you paste it? Martin
Belaid MOA wrote: Thanks a lot, Lukasz. I completely agree. I was talking, however, at the service level, not at the client side. Since using fork (with &) and pthreads generates the same error, there is somehow a problem when a single credential EPR is shared between jobs simultaneously. ~Belaid.
Date: Wed, 21 Jul 2010 15:44:52 +0200 From: luk...@ci.uchicago.edu To: belaid_...@hotmail.com CC: gt-user@lists.globus.org Subject: Re: [gt-user] Threads and globusrun-ws/globus-credential-delegate.
Hi Belaid, thread safety has nothing to do with this. You create completely separate processes. Threads from one process do not interact with threads from another process. Every process uses a separate memory space allocated by the kernel. (Processes can use shared variables if a method of inter-process communication by shared memory is implemented, which, I am sure, is not the case here.) Regards, Lukasz
Belaid MOA wrote: Indeed, the delegation is done first (without &) and then the set of globusrun-ws commands are run with &. The following sentence from http://www.globus.org/toolkit/releasenotes/4.0.4/ may explain why: "The service engine and clients are not thread-safe". Does this mean that any client call is not thread-safe? ~Belaid.
Re: [gt-user] Installation problem in globus toolkit 5.0.1
Globus 5.x doesn't have web services support.
naveen wrote: I am installing Globus Toolkit 5.0.1 on Ubuntu 9. It is installed on my system, but I don't know how to install a web service container on Globus and how to start web services, because there is no help or guidance provided in the quick start file of Globus Toolkit 5.0.1 as there is in the quick start of Toolkit 4.0.*. I need WSRF for weka4ws. So please help me to install and use web services on Globus Toolkit 5.0.1.
Re: [gt-user] exec.StateMachine Error code: 201 user is not in the grid mapfile
Hi Marco,
All very strange... If you were not in the grid-mapfile you wouldn't get that far in the job submission. I see this:

2010-06-23 10:32:32,817 DEBUG authorization.GridMapAuthorization [ServiceThread-73,isPermitted:181] Peer /O=KGrid/CN=Marco Lackovic authorized as lackovic based on gridmap file /etc/grid-security/grid-mapfile
2010-06-23 10:32:32,831 DEBUG factory.ManagedJobFactoryService [ServiceThread-73,createManagedJob:96] Entering createManagedJob()

so the authorization check prior to the service call indicates you are mapped in /etc/grid-security/grid-mapfile. But later on, in the submission phase of the job:

2010-06-23 10:32:33,640 DEBUG exec.StateMachine [RunQueueThread_2,runScript:2898] running script submit
2010-06-23 10:32:33,640 DEBUG exec.JobManagerScript [RunQueueThread_2,run:199] Executing command: /usr/bin/sudo -H -u lackovic -S /usr/local/globus-4.0.8/libexec/globus-gridmap-and-execute -g /etc/grid-security/grid-mapfile /usr/local/globus-4.0.8/libexec/globus-job-manager-script.pl -m fork -f /usr/local/globus-4.0.8/tmp/gram_job_mgr2078824211274262568.tmp -c submit
2010-06-23 10:32:33,668 DEBUG exec.JobManagerScript [RunQueueThread_2,run:218] first line: null
2010-06-23 10:32:33,670 DEBUG exec.JobManagerScript [RunQueueThread_2,run:328] failure message: Script stderr: lackovic is not in the grid mapfile

(uses the same grid-mapfile) Can you please send me your entire grid-mapfile (maybe not to the list)? I want to check if I can replicate something like that. Martin
Marco Lackovic wrote: Hi Martin, On Tue, Jun 22, 2010 at 8:11 PM, Martin Feller fel...@mcs.anl.gov wrote: I would really like to see the entire log of a job, ideally in a format that is a bit easier to digest. Yes, you are right, sorry for that. I wasn't sure I could send attachments to the mailing-list. You can find the log attached to this message.
Re: [gt-user] exec.StateMachine Error code: 201 user is not in the grid mapfile
Marco Lackovic wrote: On Wed, Jun 23, 2010 at 2:12 PM, Martin Feller fel...@mcs.anl.gov wrote: You seem to run the GT server as a user other than lackovic, e.g. as user globus. If so, and if the grid-mapfile is readable by the user globus but not by user lackovic, then you would run into this situation: the first check for a mapping is done by the user who runs the server (globus in this example). globus has read privileges and things are ok, because lackovic is mapped in the grid-mapfile. Later, though, the job is submitted as user lackovic (sudo), and if lackovic does not have permission to read the grid-mapfile, then we get this error. That was it! Really well thought! Excellent! Thank you very much. I guess all users must have read permissions on the grid-mapfile. I solved it by adding the user to the globus group. Shouldn't it be mentioned somewhere in the guide? Glad it's working! Yeah, it should be in the docs. I wonder why we never ran into this issue more often. Martin
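The permission situation Martin diagnosed can be sketched on a scratch file. The path and DN below are examples from this thread; on a real host the file is /etc/grid-security/grid-mapfile, and the per-user check would be something like a `sudo -u lackovic test -r` run against it:

```shell
# Demonstrate the two permission modes on a scratch copy of a grid-mapfile.
GRIDMAP=$(mktemp)
echo '"/O=KGrid/CN=Marco Lackovic" lackovic' > "$GRIDMAP"

chmod 600 "$GRIDMAP"                     # readable only by the container user:
mode_broken=$(stat -c '%a' "$GRIDMAP")   # the sudo'ed job user then hits
                                         # "user is not in the grid mapfile"

chmod 640 "$GRIDMAP"                     # fix: group-readable, with mapped
mode_fixed=$(stat -c '%a' "$GRIDMAP")    # users added to the file's group

echo "broken=$mode_broken fixed=$mode_fixed"
rm -f "$GRIDMAP"
```

Mode 640 plus group membership (as Marco did with the globus group) is the narrower alternative to making the file world-readable.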
Re: [gt-user] exec.StateMachine Error code: 201 user is not in the grid mapfile
Marco Lackovic wrote: Hi Martin, On Mon, Jun 21, 2010 at 2:30 AM, Martin Feller fel...@mcs.anl.gov wrote: How do $GLOBUS_LOCATION/etc/globus_wsrf_core/global_security_descriptor.xml, $GLOBUS_LOCATION/etc/globus_wsrf_gram/managed-job-factory-security-config.xml, $GLOBUS_LOCATION/etc/globus_wsrf_gram/managed-job-security-config.xml look like?

- global_security_descriptor.xml:

<?xml version="1.0" encoding="UTF-8"?>
<securityConfig xmlns="http://www.globus.org">
  <credential>
    <key-file value="/etc/grid-security/containerkey.pem"/>
    <cert-file value="/etc/grid-security/containercert.pem"/>
  </credential>
  <gridmap value="/etc/grid-security/grid-mapfile"/>
</securityConfig>

The directory $GLOBUS_LOCATION/etc/globus_wsrf_gram does not exist. I guess that must have been the problem. I don't understand how that could happen: I have built GT from source and the build completed successfully. So the mapping of DNs to local accounts works now, or does it still fail? Do you have the env var GRIDMAP set in the environment of the user who runs the GT4 server? No, it is not set.
Re: [gt-user] exec.StateMachine Error code: 201 user is not in the grid mapfile
Marco Lackovic wrote: On Tue, Jun 22, 2010 at 1:23 PM, Martin Feller fel...@mcs.anl.gov wrote: The directory $GLOBUS_LOCATION/etc/globus_wsrf_gram does not exist. I guess that must have been the problem. I don't understand how could that happen: I have built GT from source and the build completed successfully. So the mapping of DNs to local accounts works now or does it still fail? Still fails because the directory etc/globus_wsrf_gram is still missing. I don't know how to fix it. How do you install it (4.0.8, right)?
Re: [gt-user] exec.StateMachine Error code: 201 user is not in the grid mapfile
Sorry, my bad. In 4.0.x the directory names are different compared to 4.2.x: I meant the following directories: $GLOBUS_LOCATION/etc/globus_wsrf_core/global_security_descriptor.xml, $GLOBUS_LOCATION/etc/gram-service/managed-job-factory-security-config.xml, $GLOBUS_LOCATION/etc/gram-service/managed-job-security-config.xml Um, but I see that you didn't run 'make install' after configure and make, which is a required step. What if you run 'make install' at the end and try it then? Martin Marco Lackovic wrote: On Tue, Jun 22, 2010 at 2:36 PM, Martin Feller fel...@mcs.anl.gov wrote: Still fails because the directory etc/globus_wsrf_gram is still missing. I don't know how to fix it. How do you install it (4.0.8, right)? It's right, 4.0.8. I downloaded the Full Toolkit Source Download from here: http://www.globus.org/toolkit/downloads/4.0.8/ then ran: ./configure --prefix=/usr/local/globus-4.0.8/ --with-iodbc=/usr/lib and then make | tee installer.log
Re: [gt-user] exec.StateMachine Error code: 201 user is not in the grid mapfile
Hmm, looks ok. The only reason I can see is that at some point another grid-mapfile is being used than /etc/grid-security/grid-mapfile. Please enable debug logging on the server-side in ws-gram and send the container logfile containing logs of a problematic job submission. Martin
Marco Lackovic wrote: On Tue, Jun 22, 2010 at 2:50 PM, Martin Feller fel...@mcs.anl.gov wrote:

$GLOBUS_LOCATION/etc/globus_wsrf_core/global_security_descriptor.xml:

<?xml version="1.0" encoding="UTF-8"?>
<securityConfig xmlns="http://www.globus.org">
  <credential>
    <key-file value="/etc/grid-security/containerkey.pem"/>
    <cert-file value="/etc/grid-security/containercert.pem"/>
  </credential>
  <gridmap value="/etc/grid-security/grid-mapfile"/>
</securityConfig>

$GLOBUS_LOCATION/etc/gram-service/managed-job-factory-security-config.xml:

<securityConfig xmlns="http://www.globus.org">
  <method name="createManagedJob">
    <auth-method>
      <GSITransport/>
      <GSISecureMessage/>
      <GSISecureConversation/>
    </auth-method>
  </method>
  <authz value="gridmap"/>
  <reject-limited-proxy value="true"/>
</securityConfig>

$GLOBUS_LOCATION/etc/gram-service/managed-job-security-config.xml:

<securityConfig xmlns="http://www.globus.org">
  <auth-method>
    <GSITransport/>
    <GSISecureMessage/>
    <GSISecureConversation/>
  </auth-method>
  <authz value="gridmap"/>
  <run-as>
    <resource-identity/>
  </run-as>
</securityConfig>

Um, but I see that you didn't run 'make install' after configure and make, which is a required step. What if you run 'make install' at the end and try it then? Sorry, I forgot to mention that, but I actually already did that too.
Re: [gt-user] exec.StateMachine Error code: 201 user is not in the grid mapfile
How do $GLOBUS_LOCATION/etc/globus_wsrf_core/global_security_descriptor.xml, $GLOBUS_LOCATION/etc/globus_wsrf_gram/managed-job-factory-security-config.xml, $GLOBUS_LOCATION/etc/globus_wsrf_gram/managed-job-security-config.xml look like? Do you have the env var GRIDMAP set in the environment of the user who runs the GT4 server? Martin
Marco Lackovic wrote: Hello, I am still having the problem mentioned below. Basically I get a "user is not in the grid mapfile" error message, while the user actually is in the grid-mapfile. Using GT 4.0.8 on Ubuntu 10.04. Any clue would be highly appreciated.
On Mon, May 3, 2010 at 6:36 PM, Marco Lackovic lacko...@si.deis.unical.it wrote: When I run the following command:

globusrun-ws -submit -c /bin/touch touched_it

I get the following error:

Submitting job...Done.
Job ID: uuid:aa88b8ce-56d6-11df-ac93-00248ce78cc1
Termination time: 05/04/2010 17:09 GMT
Current job state: Failed
Destroying job...Done.
globusrun-ws: Job failed: Error code: 201
Script stderr: john is not in the grid mapfile

and on the terminal, on the same machine, where the globus container of GT 4.0.8 is running:

2010-05-03 19:22:00,820 INFO exec.StateMachine [RunQueueThread_0,logJobAccepted:3424] Job 6237c130-56d8-11df-9b26-c844d3674bc1 accepted for local user 'john' for DN '/O=XGrid/OU=YGrid/CN=John Doe'
2010-05-03 19:22:00,911 WARN exec.StateMachine [RunQueueThread_2,createFaultFromErrorCode:3181] Unhandled fault code 201
2010-05-03 19:22:01,281 INFO exec.StateMachine [RunQueueThread_7,logJobFailed:3455] Job 6237c130-56d8-11df-9b26-c844d3674bc1 failed.
Description: Error code: 201
Cause: org.globus.exec.generated.FaultType: Error code: 201 caused by [0: org.oasis.wsrf.faults.BaseFaultType: Script stderr: john is not in the grid mapfile]

while john actually *is* in the grid-mapfile:

"/O=XGrid/OU=YGrid/CN=John Doe" john

Furthermore, when I run grid-mapfile-check-consistency I get the following output:

Checking /etc/grid-security/grid-mapfile grid mapfile
Verifying grid mapfile existence...OK
Checking for duplicate entries...OK
Checking for valid user names...OK

I have installed GT 4.0.x many times and this is the first time something like this happens. Does anybody know what the problem could be?
Re: [gt-user] Error while running container and job submission
Ankuj Gupta wrote: The RSL file has the following contents:

<job>
  <executable>my_echo</executable>
  <directory>${GLOBUS_USER_HOME}</directory>
  <argument>Hello</argument>
  <argument>World!</argument>
  <stdout>${GLOBUS_USER_HOME}/stdout</stdout>
  <stderr>${GLOBUS_USER_HOME}/stderr</stderr>
  <fileStageIn>
    <transfer>
      <sourceUrl>gsiftp://ashish.gridglobus.com:2811/bin/echo</sourceUrl>
      <destinationUrl>file:///${GLOBUS_USER_HOME}/my_echo</destinationUrl>
    </transfer>
  </fileStageIn>
  <fileCleanUp>
    <deletion>
      <file>file:///${GLOBUS_USER_HOME}/my_echo</file>
    </deletion>
  </fileCleanUp>
</job>

Here ashish.gridglobus.com is the user from which I am submitting the job.
User? Don't you mean host/machine? And the machine where GT4 is running and where you submit the job to is ankuj.gridglobus.com with the IP address 192.168.1.40? Ankuj
On Fri, Jun 4, 2010 at 9:17 AM, Martin Feller fel...@mcs.anl.gov wrote: What does your job description look like? J
Ankuj Gupta wrote: Hi!!
I am getting the following error while running the container:

2010-06-03 18:47:38,154 ERROR container.GSIServiceThread [ServiceThread-47,process:147] Error processing request
java.io.EOFException
  at org.globus.gsi.gssapi.net.impl.GSIGssInputStream.readHandshakeToken(GSIGssInputStream.java:56)
  at org.globus.gsi.gssapi.net.impl.GSIGssSocket.readToken(GSIGssSocket.java:60)
  at org.globus.gsi.gssapi.net.GssSocket.authenticateServer(GssSocket.java:122)
  at org.globus.gsi.gssapi.net.GssSocket.startHandshake(GssSocket.java:142)
  at org.globus.gsi.gssapi.net.GssSocket.getOutputStream(GssSocket.java:161)
  at org.globus.wsrf.container.GSIServiceThread.process(GSIServiceThread.java:99)
  at org.globus.wsrf.container.ServiceThread.run(ServiceThread.java:291)
2010-06-03 18:47:38,347 INFO impl.DefaultIndexService [ServiceThread-45,processConfigFile:107] Reading default registration configuration from file: /usr/local/globus-4.0.7/etc/globus_wsrf_mds_index/hierarchy.xml
Starting SOAP server at: https://192.168.1.40:8443/wsrf/services/ With the following services:

If I try to submit a job from a user node using an RSL file, I get the following error on the client:

globusrun-ws: Job failed: Staging error for RSL element fileStageIn. ; nested exception is: javax.xml.rpc.soap.SOAPFaultException: Host authorization failed: expected /CN=host/192.168.1.40, peer returned /O=Grid/OU=GlobusTest/OU=simpleCA-ankuj.gridglobus.com/CN=host/ankuj.gridglobus.com

Ankuj Gupta
Re: [gt-user] Error while running container and job submission
Looks like the GridFTP server on ashish.gridglobus.com expects the submitting machine (GT machine) to authorize with a credential id containing the IP (192.168.1.40) instead of ankuj.gridglobus.com. But I'm still a bit unsure. Please run the job again adding the debug option on the client side:

globusrun-ws -submit -dbg

and send the output to this list. Martin
Ankuj Gupta wrote: I am trying to submit a job from one machine, akhil.gridglobus.com with IP 192.168.1.50, to ankuj.gridglobus.com with IP 192.168.1.40, and that is where the container is running. Ankuj
Re: [gt-user] [Slightly OT] Handling repeated input files
Dougal, to the best of my knowledge Gram4.x/RFT does not have such a detection mechanism. I don't know if tools on top of Gram4 (Swift, Gridway, others?) provide mechanisms for your use-case. A general note: if you are not tied to web services, I'd check with the Gram5 folks whether they still support what Steve described, and maybe consider going with Gram5 (based on an improved Gram2) in the long term, because in the medium term Gram4.x won't be supported anymore, but Gram5 will (back to the future! :) ) Martin
Steven Timm wrote: Dougal, I am not sure what the GT4 equivalent is for file stage-in, but I know that the GT2 stage-in does detect that the same file has previously been staged in, and does not stage it in again. The cache on the far end has many hard links to the same file. Steve
On Mon, 31 May 2010, Dougal Ballantyne wrote: Dear GT, I have been working on a project for several months now, researching and developing a grid solution based on Globus Toolkit 4. Many thanks to the people who have helped me with previous issues. I have a slightly off-topic question related to how others handle a particular scenario. We have a job generation and control application to which we have added support for Globus through some Perl modules that call globusrun-ws. When a job is generated, the program pulls the associated input files from the job database and creates an XML file which lists the input files in StageIn and the requested results file in StageOut. This works great for a single job and for jobs that all use different input data. However, we often have a scenario where we generate several hundred jobs that all use the same input data. In our current setup we would StageIn the same input file several hundred times. I was wondering if there is a method or known best practice within the Globus Toolkit for handling this sort of scenario.
I am aware that we could modify the tool to stage the data first, run the jobs and then remove the input file BUT that would also be a change of workflow for the users. Your thoughts or comments greatly appreciated. Kind regards, Dougal Ballantyne
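For what it's worth, the client-side de-duplication Steve describes for GT2 could be approximated inside the job-generation tool itself: hash each input file and emit a StageIn transfer only the first time a given content hash appears in a batch. A minimal sketch, assuming the tool can see all jobs of a batch at once (the function and data shapes below are hypothetical, not any Globus API):

```python
import hashlib

def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def dedupe_stage_ins(jobs):
    """Given a batch of jobs (each a list of input file paths), return
    the list of files that actually need a StageIn transfer: each
    distinct file content is staged only once for the whole batch."""
    seen = set()
    unique = []
    for job in jobs:
        for path in job:
            d = file_digest(path)
            if d not in seen:
                seen.add(d)
                unique.append(path)
    return unique
```

Jobs whose files were skipped would then reference the already-staged remote copy, which mirrors what the GT2 cache achieves with hard links, without changing the users' workflow.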
Re: [gt-user] Error while staging a job
The error says Unable to connect to localhost:8443: you are submitting the job to localhost, and it seems no GT server is running there (or it isn't listening on port 8443). Either submit the job to a machine where a server is running (use the -F option; check globusrun-ws -help for its usage) or start a GT server on localhost. Martin

Ankuj Gupta wrote: Hi!! I am submitting the following RSL file:

<job>
  <executable>my_echo</executable>
  <directory>${GLOBUS_USER_HOME}</directory>
  <argument>Hello</argument>
  <argument>World!</argument>
  <stdout>${GLOBUS_USER_HOME}/stdout</stdout>
  <stderr>${GLOBUS_USER_HOME}/stderr</stderr>
  <fileStageIn>
    <transfer>
      <sourceUrl>gsiftp://akhil.gridglobus.com:2811/bin/echo</sourceUrl>
      <destinationUrl>file:///${GLOBUS_USER_HOME}/my_echo</destinationUrl>
    </transfer>
  </fileStageIn>
  <fileCleanUp>
    <deletion>
      <file>file:///${GLOBUS_USER_HOME}/my_echo</file>
    </deletion>
  </fileCleanUp>
</job>

But I am getting the following error:

[t...@akhil ~]$ globusrun-ws -submit -S -f a.rsl
Delegating user credentials...Failed.
globusrun-ws: Error trying to delegate
globus_xio: Unable to connect to localhost:8443
globus_xio: System error in connect: Connection refused
globus_xio: A system call failed: Connection refused

Ankuj Gupta
Re: [gt-user] GT4.2.1 deployment into Tomcat 5.5 - errors with persisted store
Dougal Ballantyne wrote: Martin, I have got it working. I ended up having to add -Dorg.globus.wsrf.container.persistence.dir=/where/ever/you/want/it/to/be directly into the RHEL tomcat daemon start/stop script. It did not seem to be picking up GLOBUS_OPTIONS, no matter where I exported them. Might be a RHEL thing or I need to dig a bit further but at least I am able to change it. Might also be a Globus thing. :) Glad that it's working, and thanks for the feedback. Martin Thank you. -Dougal On Mon, May 24, 2010 at 4:28 PM, Dougal Ballantyne dougal.li...@gmail.com wrote: Martin, I am kicking myself for not reading more... Sorry. Starting testing, didn't go in first time so tweaking with the rather adapted RHEL startup scripts for tomcat to get the environment variable exported. Thank you for the steer. -Dougal On Mon, May 24, 2010 at 4:12 PM, Martin Feller fel...@mcs.anl.gov wrote: Martin Feller wrote: Hi, I didn't try it, just an educated guess: Any chance you have the property -Dorg.globus.wsrf.container.persistence.dir set to /usr/share/tomcat5/.globus, e.g. via the environment variable GLOBUS_OPTIONS? (http://www.globus.org/toolkit/docs/4.0/common/javawscore/Java_WS_Core_Public_Interfaces.html#s-javawscore-Public_Interfaces-env) Oh, this was a 4.0 link, but it's the same in 4.2: http://www.globus.org/toolkit/docs/4.2/4.2.1/admin/install/ If not: Does it work if you explicitly set it, like export GLOBUS_OPTIONS=$GLOBUS_OPTIONS -Dorg.globus.wsrf.container.persistence.dir=/where/ever/you/want/it/to/be and restart tomcat? Martin Dougal Ballantyne wrote: Hi, I have been working on a GT4.2.1 deployment and for larger scale testing, I have been preparing for a deployment into the Tomcat 5.5 server. I am working on a RHEL 5.5 system and would like to use the provided tomcat5-* rpms. I have successfully deployed the application into the webapps folder and adjusted the locations of the BDB databases and temporary storage locations and it all works as expected. 
However there is one item I just cannot seem to get relocated: the persisted directory created under the user starting the container, in ~/.globus/persisted. I am getting the following errors in catalina.out:

Using CATALINA_BASE: /usr/share/tomcat5
Using CATALINA_HOME: /usr/share/tomcat5
Using CATALINA_TMPDIR: /usr/share/tomcat5/temp
Using JRE_HOME:
May 23, 2010 3:31:18 PM org.apache.catalina.core.ApplicationContext log
INFO: ContextListener: contextInitialized()
May 23, 2010 3:31:18 PM org.apache.catalina.core.ApplicationContext log
INFO: SessionListener: contextInitialized()
May 23, 2010 3:31:19 PM org.apache.commons.vfs.VfsLog info
INFO: Using /usr/share/tomcat5/temp/vfs_cache as temporary files store.
May 23, 2010 3:31:20 PM org.globus.exec.service.factory.ManagedJobFactoryResource$RecoveryTask run
WARNING: Recovery exception
org.globus.wsrf.ResourceException: Unabled to locate persisted resource properties directory. ; nested exception is:
java.io.IOException: [JWSCORE-205] Failed to create storage directory: '/usr/share/tomcat5/.globus/persisted/127.0.0.1-wsrf/ManagedExecutableJobResourceStateType'
 at org.globus.exec.service.exec.ManagedExecutableJobHome.recover(ManagedExecutableJobHome.java:176)
 at org.globus.exec.service.factory.ManagedJobFactoryResource$RecoveryTask.run(ManagedJobFactoryResource.java:388)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
 at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:207)
 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:619)
Caused by: java.io.IOException: [JWSCORE-205] Failed to create storage directory: '/usr/share/tomcat5/.globus/persisted/127.0.0.1-wsrf/ManagedExecutableJobResourceStateType'
 at org.globus.wsrf.utils.FilePersistenceHelper.createStorageDirectory(FilePersistenceHelper.java:123)
 at org.globus.wsrf.utils.FilePersistenceHelper.setStorageDirectory(FilePersistenceHelper.java:191)
 at org.globus.wsrf.utils.FilePersistenceHelper.init(FilePersistenceHelper.java:181)
 at org.globus.wsrf.utils.FilePersistenceHelper.init(FilePersistenceHelper.java:141)
 at org.globus.wsrf.utils.XmlPersistenceHelper.init(XmlPersistenceHelper.java:74)
 at org.globus.exec.service.exec.ManagedExecutableJobHome.recover(ManagedExecutableJobHome.java:171)
 ... 9 more
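For readers hitting the same issue: one way to make the property stick regardless of the caller's environment, as Dougal eventually did, is to append it to the JVM options in Tomcat's own configuration rather than relying on GLOBUS_OPTIONS being exported. A sketch, assuming RHEL's tomcat5 package reads /etc/tomcat5/tomcat5.conf (both the path and the target directory below are assumptions; adjust to your layout):

```
# /etc/tomcat5/tomcat5.conf (path is an assumption for RHEL's tomcat5 RPM)
# Append the Java WS Core persistence property to the JVM options so the
# container persists state in a writable location instead of ~/.globus/persisted:
JAVA_OPTS="$JAVA_OPTS -Dorg.globus.wsrf.container.persistence.dir=/var/lib/globus/persisted"
```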
Re: [gt-user] GT4.2.1 deployment into Tomcat 5.5 - errors with persisted store
Hi, I didn't try it, just an educated guess: Any chance you have the property -Dorg.globus.wsrf.container.persistence.dir set to /usr/share/tomcat5/.globus, e.g. via the environment variable GLOBUS_OPTIONS? (http://www.globus.org/toolkit/docs/4.0/common/javawscore/Java_WS_Core_Public_Interfaces.html#s-javawscore-Public_Interfaces-env) If not: Does it work if you explicitly set it, like export GLOBUS_OPTIONS=$GLOBUS_OPTIONS -Dorg.globus.wsrf.container.persistence.dir=/where/ever/you/want/it/to/be and restart tomcat? Martin Dougal Ballantyne wrote: Hi, I have been working on a GT4.2.1 deployment and for larger scale testing, I have been preparing for a deployment into the Tomcat 5.5 server. I am working on a RHEL 5.5 system and would like to use the provided tomcat5-* rpms. I have successfully deployed the application into the webapps folder and adjusted the locations of the BDB databases and temporary storage locations and it all works as expected. However there is one item I just cannot seem to get relocated, the persisted directory created under the user starting the container in ~/.globus/persisted. I am getting the following errors in catalina.out: Using CATALINA_BASE: /usr/share/tomcat5 Using CATALINA_HOME: /usr/share/tomcat5 Using CATALINA_TMPDIR: /usr/share/tomcat5/temp Using JRE_HOME: May 23, 2010 3:31:18 PM org.apache.catalina.core.ApplicationContext log INFO: ContextListener: contextInitialized() May 23, 2010 3:31:18 PM org.apache.catalina.core.ApplicationContext log INFO: SessionListener: contextInitialized() May 23, 2010 3:31:19 PM org.apache.commons.vfs.VfsLog info INFO: Using /usr/share/tomcat5/temp/vfs_cache as temporary files store. May 23, 2010 3:31:20 PM org.globus.exec.service.factory.ManagedJobFactoryResource$RecoveryTask run WARNING: Recovery exception org.globus.wsrf.ResourceException: Unabled to locate persisted resource properties directory. 
; nested exception is: java.io.IOException: [JWSCORE-205] Failed to create storage directory: '/usr/share/tomcat5/.globus/persisted/127.0.0.1-wsrf/ManagedExecutableJobResourceStateType' at org.globus.exec.service.exec.ManagedExecutableJobHome.recover(ManagedExecutableJobHome.java:176) at org.globus.exec.service.factory.ManagedJobFactoryResource$RecoveryTask.run(ManagedJobFactoryResource.java:388) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:207) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) Caused by: java.io.IOException: [JWSCORE-205] Failed to create storage directory: '/usr/share/tomcat5/.globus/persisted/127.0.0.1-wsrf/ManagedExecutableJobResourceStateType' at org.globus.wsrf.utils.FilePersistenceHelper.createStorageDirectory(FilePersistenceHelper.java:123) at org.globus.wsrf.utils.FilePersistenceHelper.setStorageDirectory(FilePersistenceHelper.java:191) at org.globus.wsrf.utils.FilePersistenceHelper.init(FilePersistenceHelper.java:181) at org.globus.wsrf.utils.FilePersistenceHelper.init(FilePersistenceHelper.java:141) at org.globus.wsrf.utils.XmlPersistenceHelper.init(XmlPersistenceHelper.java:74) at org.globus.exec.service.exec.ManagedExecutableJobHome.recover(ManagedExecutableJobHome.java:171) ... 9 more I have searched through the source and the deployed application but I can find no reference to where it might be getting this path from. 
[r...@globus-sge globus-4.2.1]# pwd
/opt/globus-4.2.1
[r...@globus-sge globus-4.2.1]# grep -r '/.globus/persisted/' .
grep: warning: ./etc/gpt/packages/packages: recursive directory loop
grep: warning: ./etc/globus_packages/packages: recursive directory loop
./SRC/gt4.2.1-all-source-installer/source-trees-thr/ws-gram/discovery/providers/setup/fork/globus-scheduler-provider-fork.in:my @persistence_files = glob(~/.globus/persisted/$host-$port/ManagedExecutableJobResourceStateType/*.xml);
./SRC/gt4.2.1-all-source-installer/source-trees-thr/ws-gram/discovery/providers/setup/pbs/globus-scheduler-provider-pbs.in:my @persistence_files = glob(~/.globus/persisted/$host-$port/ManagedExecutableJobResourceStateType/*.xml);
./SRC/gt4.2.1-all-source-installer/source-trees-thr/ws-gram/discovery/providers/setup/condor/globus-scheduler-provider-condor.in:my @persistence_files =
Re: [gt-user] GT4.2.1 deployment into Tomcat 5.5 - errors with persisted store
Martin Feller wrote: Hi, I didn't try it, just an educated guess: Any chance you have the property -Dorg.globus.wsrf.container.persistence.dir set to /usr/share/tomcat5/.globus, e.g. via the environment variable GLOBUS_OPTIONS? (http://www.globus.org/toolkit/docs/4.0/common/javawscore/Java_WS_Core_Public_Interfaces.html#s-javawscore-Public_Interfaces-env) Oh, this was a 4.0 link, but it's the same in 4.2: http://www.globus.org/toolkit/docs/4.2/4.2.1/admin/install/ If not: Does it work if you explicitly set it, like export GLOBUS_OPTIONS=$GLOBUS_OPTIONS -Dorg.globus.wsrf.container.persistence.dir=/where/ever/you/want/it/to/be and restart tomcat? Martin Dougal Ballantyne wrote: Hi, I have been working on a GT4.2.1 deployment and for larger scale testing, I have been preparing for a deployment into the Tomcat 5.5 server. I am working on a RHEL 5.5 system and would like to use the provided tomcat5-* rpms. I have successfully deployed the application into the webapps folder and adjusted the locations of the BDB databases and temporary storage locations and it all works as expected. However there is one item I just cannot seem to get relocated, the persisted directory created under the user starting the container in ~/.globus/persisted. I am getting the following errors in catalina.out: Using CATALINA_BASE: /usr/share/tomcat5 Using CATALINA_HOME: /usr/share/tomcat5 Using CATALINA_TMPDIR: /usr/share/tomcat5/temp Using JRE_HOME: May 23, 2010 3:31:18 PM org.apache.catalina.core.ApplicationContext log INFO: ContextListener: contextInitialized() May 23, 2010 3:31:18 PM org.apache.catalina.core.ApplicationContext log INFO: SessionListener: contextInitialized() May 23, 2010 3:31:19 PM org.apache.commons.vfs.VfsLog info INFO: Using /usr/share/tomcat5/temp/vfs_cache as temporary files store. 
May 23, 2010 3:31:20 PM org.globus.exec.service.factory.ManagedJobFactoryResource$RecoveryTask run WARNING: Recovery exception org.globus.wsrf.ResourceException: Unabled to locate persisted resource properties directory. ; nested exception is: java.io.IOException: [JWSCORE-205] Failed to create storage directory: '/usr/share/tomcat5/.globus/persisted/127.0.0.1-wsrf/ManagedExecutableJobResourceStateType' at org.globus.exec.service.exec.ManagedExecutableJobHome.recover(ManagedExecutableJobHome.java:176) at org.globus.exec.service.factory.ManagedJobFactoryResource$RecoveryTask.run(ManagedJobFactoryResource.java:388) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:207) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) Caused by: java.io.IOException: [JWSCORE-205] Failed to create storage directory: '/usr/share/tomcat5/.globus/persisted/127.0.0.1-wsrf/ManagedExecutableJobResourceStateType' at org.globus.wsrf.utils.FilePersistenceHelper.createStorageDirectory(FilePersistenceHelper.java:123) at org.globus.wsrf.utils.FilePersistenceHelper.setStorageDirectory(FilePersistenceHelper.java:191) at org.globus.wsrf.utils.FilePersistenceHelper.init(FilePersistenceHelper.java:181) at org.globus.wsrf.utils.FilePersistenceHelper.init(FilePersistenceHelper.java:141) at org.globus.wsrf.utils.XmlPersistenceHelper.init(XmlPersistenceHelper.java:74) at org.globus.exec.service.exec.ManagedExecutableJobHome.recover(ManagedExecutableJobHome.java:171) ... 
9 more I have searched through the source and the deployed application but I can find no reference to where it might be getting this path from. [r...@globus-sge globus-4.2.1]# pwd /opt/globus-4.2.1 [r...@globus-sge globus-4.2.1]# grep -r '/.globus/persisted/' . grep: warning: ./etc/gpt/packages/packages: recursive directory loop grep: warning: ./etc/globus_packages/packages: recursive directory loop ./SRC/gt4.2.1-all-source-installer/source-trees-thr/ws-gram/discovery/providers/setup/fork/globus-scheduler-provider-fork.in:my @persistence_files = glob(~/.globus/persisted/$host-$port/ManagedExecutableJobResourceStateType/*.xml); ./SRC/gt4.2.1-all-source-installer/source-trees-thr/ws-gram/discovery/providers/setup/pbs/globus-scheduler-provider-pbs.in:my @persistence_files = glob(~/.globus/persisted/$host-$port/ManagedExecutableJobResourceStateType/*.xml); ./SRC/gt4.2.1-all-source-installer/source-trees-thr
Re: [gt-user] globusrun-ws failed with Job-Failed: Invalid stdout element.
Hi, Very probably something is wrong with your file system mapping file. Some information about file system mapping: http://www.globus.org/toolkit/docs/4.2/4.2.1/execution/gram4/admin/#gram4-Interface_Config_Frag-filesysmap Does $GLOBUS_LOCATION/etc/globus_wsrf_gram/globus_gram_fs_map_config.xml exist? Did you modify it, and is it maybe broken? Martin

Jörg Lenhardt wrote: Hello! I built Globus Toolkit 4.2.1 on Solaris 10 (SPARC) and until now everything worked fine. But if I try to submit a WS GRAM job using a job definition file, the execution fails with the error message: Invalid stdout element. File map initialization failed.

Job definition:

<?xml version="1.0" encoding="UTF-8"?>
<job>
  <executable>/bin/echo</executable>
  <argument>Output</argument>
  <stdout>${GLOBUS_USER_HOME}/stdout</stdout>
  <stderr>${GLOBUS_USER_HOME}/stderr</stderr>
</job>

Job execution:

aus...@zone2:~ $ globusrun-ws -submit -f echo_job.xml
Submitting job...Done.
Job ID: uuid:4a9df2dc-6444-11df-ac26-01006cc8
Termination time: 05/20/3010 19:17 GMT
Current job state: Failed
Destroying job...Done.
globusrun-ws: Job failed: Invalid stdout element. File map initialization failed.

Globus container output:

2010-05-20T21:17:12.237+02:00 INFO PersistentManagedExecutableJobResource.4aca2640-6444-11df-80ff-8db553e71ea8 [ServiceThread-58,start:761] Job 4aca2640-6444-11df-80ff-8db553e71ea8 with client submission-id 4a9df2dc-6444-11df-ac26-01006cc8 accepted for local user 'auser1'
2010-05-20T21:17:13.752+02:00 INFO handler.SubmitStateHandler [pool-1-thread-5,process:172] Job 4aca2640-6444-11df-80ff-8db553e71ea8 submitted with local job ID '4bd68876-6444-11df-bc23-01007edd:6912'
2010-05-20T21:17:17.191+02:00 INFO handler.FinalizeTerminationStateHandler [pool-1-thread-3,handleFailedState:100] Job 4aca2640-6444-11df-80ff-8db553e71ea8 failed.
Fault #1: Description: Invalid stdout element. File map initialization failed. Cause: org.globus.exec.generated.ServiceLevelAgreementFaultType: Invalid stdout element. File map initialization failed.
caused by [0: org.oasis.wsrf.faults.BaseFaultType: File map initialization failed. ]

The files stdout and stderr ARE created and stdout contains Output. The following works fine without an error:

aus...@zone2:~ $ globusrun-ws -submit -c /bin/touch /tmp/file

I really do not know what's wrong. Some information about the environment:
- Using a Solaris Zone for Globus
- perl is installed in /usr/local/ with XML::Parser
- PATH is set to search /usr/local/bin before any other path (globally, in /etc/profile)
- sudo is configured:
globus ALL=(auser1) NOPASSWD: /usr/local/globus-4.2.1/libexec/globus-gridmap-and-execute -g /etc/grid-security/grid-mapfile /usr/local/globus-4.2.1/libexec/globus-job-manager-script.pl *
globus ALL=(auser1) NOPASSWD: /usr/local/globus-4.2.1/libexec/globus-gridmap-and-execute -g /etc/grid-security/grid-mapfile /usr/local/globus-4.2.1/libexec/globus-gram-local-proxy-tool *

I hope someone can guide me out of the darkness ... ;) Joerg Lenhardt
Re: [gt-user] Error in writing rsl file for globus 4.0.7
There might be two reasons: 1. You didn't specify a job manager. 2. Try adding namespaces to the factoryEndpoint element. Does the following work?

<job>
  <factoryEndpoint xmlns:gram="http://www.globus.org/namespaces/2004/10/gram/job"
                   xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/03/addressing">
    <wsa:Address>https://192.168.4.88:8443/wsrf/services/ManagedJobFactoryService</wsa:Address>
    <wsa:ReferenceProperties>
      <gram:ResourceID>Fork</gram:ResourceID>
    </wsa:ReferenceProperties>
  </factoryEndpoint>
  <executable>/bin/date</executable>
</job>

Martin

praveenesh kumar wrote: Hello everyone! I am using Globus 4.0.7 on 4 machines. My grid is configured properly and I am able to submit jobs to other grid nodes using the globusrun-ws command. Now I am trying to write jobs in RSL format, and inside that RSL file I want to use the ManagedJobFactory service of the other grid nodes, but I am not able to submit jobs using the ManagedJobFactory service of other nodes. The point is I do not want to use globusrun-ws -F (other grid node's IP address) -submit -c with my RSL file; I want to specify the other grid node's IP address in the RSL file itself. Can anyone suggest a simple example of how to do this? I am writing the following code for my RSL file, and it is giving me a parsing error:

<job>
  <factoryEndpoint>
    <Address>https://192.168.4.88:8443/wsrf/services/ManagedJobFactoryService</Address>
  </factoryEndpoint>
  <executable>/bin/date</executable>
</job>

Can someone correct the above code? I need this urgently. Thanks!
Re: [gt-user] WS_GRAM Stage-out problem
Marco Lackovic wrote: On Mon, May 10, 2010 at 1:24 AM, Martin Feller fel...@mcs.anl.gov wrote: Ok, my first guess is that the mapping in the grid-mapfile is not as you think it is, i.e. the DN of the user who submits the transfer request is mapped to another user and not to the user 'globus'. You mentioned that it is, but it may be worth double-checking. The procedure chosen for the system I am using is to have local accounts, on every grid node, for all grid users and then have them mapped to their own local accounts in the grid-mapfile. Users were then added to the globus group so that they could access common files located in the /home/globus directory. I suspect this might not be the proper way to do things and I am in a position to change it. Do you advise against that procedure? Is it customary to instead map all grid users to the user 'globus'? Can you suggest a good reference on this topic? This approach looks fine to me. We use the same approach in a project I work on: all users have individual accounts, but are members of various local unix groups. Each group reflects a project. Depending on group membership they can access project data owned by that group, or not. There is another approach where all users share a community credential which is augmented with user-specific information (attributes). Authorization decisions are then made by callouts which check the user-specific attributes. I don't know enough about it to give you detailed information, but if you are interested I could find documentation pointers or forward this to folks who know more about it. Or maybe there's even somebody on this list who can provide input on this! If you have control over the machine where the GT server is running and the system is not a production system: create a grid-mapfile with just one mapping of your DN (you can e.g. 
get your DN by running the command 'grid-proxy-info -identity' on the client machine) to the local user 'globus' and see if that works. In the end I have found out that the Can't do MLST on non-existing file/dir error message was actually a permission problem: the file permissions were 660 (rw-rw----) but the local user, to which the grid user was mapped, didn't belong to the globus group. Assigning the local user to the globus group fixed the problem on all the machines but one, on which I still got that error even though the user does belong to the globus group. Glad to hear that it works now on most machines. It's hard for me to tell why this one machine still causes problems. I hope you can figure it out. Martin
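For reference, the one-mapping grid-mapfile Martin suggests for the sanity check is a single line: a quoted DN followed by the local account name. The DN below is a placeholder; substitute the output of grid-proxy-info -identity run on the client machine:

```
"/O=Grid/OU=GlobusTest/OU=simpleCA-example/CN=Some User" globus
```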
Re: [gt-user] WS_GRAM Stage-out problem
Marco Lackovic wrote: On Sun, May 9, 2010 at 4:56 AM, Martin Feller fel...@mcs.anl.gov wrote: I'm sorry, I meant to say: Try to use the file /tmp/somefile.txt in your fileStageOut element and see if that fails too. /tmp/somefile.txt should have 777 as permissions for sanity check. I tried to change to 777 the permission of the original file I was trying to transfer /home/globus/pippo.xml but still got the same error. Hm, one goal of mine was to get out of /home/globus, to make sure it's not a permission problem on the directory. If that fails: Can you please paste your job description? I am sorry, can you be more specific? I am working on some code which was not written by me and is not documented so I have to figure out things. Do RFT file transfers have job descriptions too? I thought only Gram jobs had. The thread name indicates that this transfer is initialized by a ws-gram job. Is it not?
Re: [gt-user] WS_GRAM Stage-out problem
Ok, my first guess is, that the mapping in the grid-mapfile is not as you think it is, i.e. the DN of the user who submits the transfer request is mapped to another user and not to the user 'globus'. You mentioned that it is, but maybe worth double-checking. If you have control over the machine where the GT server is running and the system is not a production system: create a grid-mapfile with just one mapping of your DN (you can e.g. get your DN by running the command 'grid-proxy-info -identity' on the client-machine) to the local user 'globus' and see if that works. Maybe unlikely, but worth a check: is the directory /home/globus readable for the user globus? Martin Marco Lackovic wrote: On Sun, May 9, 2010 at 2:15 PM, Martin Feller fel...@mcs.anl.gov wrote: Hm, one goal of mine was to get out of /home/globus, to make sure it's not a permission problem on the directory. You were right on this. From the /tmp/ directory the file transferred successfully. What can I do to make it work from /home/globus too?
Re: [gt-user] WS_GRAM Stage-out problem
Marco Lackovic wrote: On Sat, May 8, 2010 at 6:14 AM, Martin Feller fel...@mcs.anl.gov wrote: That seems to be a different error message, if I remember correctly. Not sure if Helmut got a Permission denied. I think the system call that failed for him was a No such file or directory, but the Can't do MLST on non-existing file/dir error message was the same as mine. Is the DN of the caller really mapped to the local user globus in the grid-mapfile? Yes, it is. The grid-mapfile also passed the consistency check performed with the command grid-mapfile-check-consistency. What if you try to transfer /tmp/somefile.txt with permissions on /tmp/somefile.txt being 777 (rwxrwxrwx)? I tried to copy it from the machine where I got the error to the caller machine:
- with scp it copied and arrived at the destination as 777 (rwxrwxrwx);
- with globus-url-copy it copied but arrived at the destination as 644 (rw-r--r--).
I'm sorry, I meant to say: try to use the file /tmp/somefile.txt in your fileStageOut element and see if that fails too. /tmp/somefile.txt should have 777 as permissions for a sanity check. If that fails: can you please paste your job description?
Re: [gt-user] WS_GRAM Stage-out problem
That seems to be a different error message, if I remember correctly. Not sure if Helmut got a Permission denied. Is the DN of the caller really mapped to the local user globus in the grid-mapfile? What if you try to transfer /tmp/somefile.txt with permissions on /tmp/somefile.txt being 777 (rwxrwxrwx)? Martin Marco Lackovic wrote: On Thu, Aug 6, 2009 at 4:03 PM, Martin Feller fel...@mcs.anl.gov wrote: Uh, it's a while ago, but i think i remember this issue. I *thought* it was fixed in 4.0.8, but I created a jar from globus_4_0_branch. It's built using Java 1.4 and you can get it from here: http://www.mcs.anl.gov/~feller/heller/globus_wsrf_rft.jar Can you give it a try by dropping it into ${GLOBUS_LOCATION}/lib, and tell us if it works for you with that jar? I am using GT 4.0.8 and also having that Can't do MLST on non-existing file/dir error message on an actually existing file. I tried substituting the globus_wsrf_rft.jar file in ${GLOBUS_LOCATION}/lib as you suggested but that didn't fix it, I am still getting the same error: 2010-05-07 16:47:16,163 ERROR service.TransferWork [Thread-9,run:401] Terminal transfer error: Can't do MLST on non-existing file/dir /home/globus/pippo.xml on server pluto.paperino.com [Caused by: Server refused performing the request. Custom message: Server refused MLST command (error code 1) [Nested exception message: Custom message: Unexpected reply: 500-Command failed : System error in stat: Permission denied 500-A system call failed: Permission denied 500 End.]] Can't do MLST on non-existing file/dir /home/globus/pippo.xml on server pluto.paperino.com. Caused by org.globus.ftp.exception.ServerException: Server refused performing the request. Custom message: Server refused MLST command (error code 1) [Nested exception message: Custom message: Unexpected reply: 500-Command failed : System error in stat: Permission denied 500-A system call failed: Permission denied 500 End.]. 
Nested exception is org.globus.ftp.exception.UnexpectedReplyCodeException: Custom message: Unexpected reply: 500-Command failed : System error in stat: Permission denied 500-A system call failed: Permission denied 500 End.
 at org.globus.ftp.vanilla.FTPControlChannel.execute(FTPControlChannel.java:412)
 at org.globus.ftp.FTPClient.mlst(FTPClient.java:598)
 at org.globus.transfer.reliable.service.cache.SingleConnectionImpl.doMlst(SingleConnectionImpl.java:287)
 at org.globus.transfer.reliable.service.cache.ThirdPartyConnectionImpl.doMlstOnSource(ThirdPartyConnectionImpl.java:276)
 at org.globus.transfer.reliable.service.client.ThirdPartyTransferClient.doMlstOnSource(ThirdPartyTransferClient.java:163)
 at org.globus.transfer.reliable.service.client.ThirdPartyTransferClient.process(ThirdPartyTransferClient.java:101)
 at org.globus.transfer.reliable.service.TransferWork.run(TransferWork.java:379)
 at EDU.oswego.cs.dl.util.concurrent.PooledExecutor$Worker.run(Unknown Source)
 at java.lang.Thread.run(Thread.java:619)

The file /home/globus/pippo.xml actually exists. Its details are the following:

-rw-rw---- 1 globus globus 1316 May 7 10:29 pippo.xml
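As the thread later establishes, a "System error in stat: Permission denied" on a file that exists usually means the mapped local user is missing search (execute) permission on one of the parent directories, not read permission on the file itself. A small standard-library sketch one could run as the mapped user to narrow it down (a hypothetical helper, not part of Globus; expects an absolute path):

```python
import os

def diagnose_path_access(path):
    """Walk each prefix of an absolute path and report missing permissions:
    every intermediate directory needs search (+x) permission, and the
    final component needs read permission. Returns a list of problems
    (empty if the current user can stat and read the file)."""
    problems = []
    parts = path.strip("/").split("/")
    prefix = ""
    for i, part in enumerate(parts):
        prefix = prefix + "/" + part
        if i < len(parts) - 1:
            if not os.access(prefix, os.X_OK):
                problems.append("no search (+x) permission on " + prefix)
        else:
            if not os.access(prefix, os.R_OK):
                problems.append("no read permission on " + prefix)
    return problems
```

Running it on /home/globus/pippo.xml as the mapped user would point at whichever path component blocks the stat.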
Re: [gt-user] Which service can i use to get all the epr files?
Hi Raffaele, No, such a service does not exist. You have to write that functionality yourself for your services. It shouldn't be too hard to do. If you have a resource home that manages all resources of your service, this could be a place to start. Existing services like RFT and WS-GRAM do not offer that functionality. Martin Raffaele Forgione wrote: Hello everyone, is there a native service in the Globus container that can help me obtain all the EPRs related to a service? Thanks in advance
Re: [gt-user] error :gridftp, globus-url-copy
Ok, I'm running out of ideas, but I'd try the following: build the GT again with a debug flavor (gcc32dbg, gcc64dbg) on hermione if you didn't already do so. Then run grid-cert-diagnostics in gdb and send the output. This will hopefully tell us more about the segfault, which might be related to the gridftp error. Martin

Martin Feller wrote: Sunah, Can you send /etc/grid-security/certificates/45fb3f91.0 from both machines to me so that I can try it myself? If I knew another way to solve the problem I'd tell you. Maybe someone from the GridFTP or C security side has more ideas. Martin

Sunah Park wrote: Martin, Thanks for your help. I built it from sources on both machines, and I checked that the openssl versions on the two machines are the same:

[glo...@harry ~]$ openssl version
OpenSSL 0.9.8e-fips-rhel5 01 Jul 2008
[glo...@hermione ~]$ openssl version
OpenSSL 0.9.8e-fips-rhel5 01 Jul 2008

And /etc/grid-security/certificates/45fb3f91.0 is also the same on harry and hermione. It's very difficult to track down this problem. Is there another way to solve it? Sunah Park.

2010/4/14 Martin Feller fel...@mcs.anl.gov: Sunah Park, Hm, ok. How did you install the GT on these 2 machines: did you build it from sources or did you use binary installers? If you used binary installers I wonder if maybe the openssl version on hermione is not compatible. What are the openssl versions on these 2 machines? I remember one case where the installation from a binary installer worked fine and the gridftp server started ok, but transfers failed with security-related errors due to an incompatible openssl version. For sanity: can you double-check that /etc/grid-security/certificates/45fb3f91.0 is really the same on harry and hermione? Martin

박선아 wrote: Hi Martin, I'm Cinyoung's coworker and I saw the mails you sent her about solving the problem. 
Then I did the following steps from your email:
* Put all grid security stuff into /etc/grid-security on both machines
* Unset all Globus security-related environment variables on both machines for all users
* Since the content of harry:/etc/grid-security/certificates seems ok (at least grid-cert-diagnostics does not segfault there), copy the content of harry:/etc/grid-security/certificates into hermione:/etc/grid-security/certificates

But it didn't work. These are the outputs on harry and hermione:

## Harry: root ##
[r...@harry grid-security]# $GLOBUS_LOCATION/bin/grid-cert-diagnostics
Checking Environment Variables
Checking if HOME is set... /root
Checking if GLOBUS_LOCATION is set... /usr/local/globus-4.2.1.1
Checking if X509_CERT_DIR is set... no
Checking if X509_USER_CERT is set... no
Checking if X509_USER_KEY is set... no
Checking if X509_USER_PROXY is set... no
Checking if GRIDMAP is set... no
Checking Security Directories
Determining trusted cert path... /etc/grid-security/certificates
Checking for cog.properties... not found
Checking for default gridmap location... /etc/grid-security/grid-mapfile
Checking if default gridmap exists... yes
Checking trusted certificates...
Getting trusted certificate list...
Checking CA file /etc/grid-security/certificates/45fb3f91.0... ok
Checking that certificate hash matches filename... ok
Checking CA certificate name for 45fb3f91.0... ok (/O=Grid/OU=GlobusTest/OU=simpleCA-harry.sookmyung.ac.kr/CN=Globus Simple CA)
Checking if signing policy exists for 45fb3f91.0... ok
Verifying certificate chain for 45fb3f91.0... ok

## Harry: user (the user name is aero): ##
[a...@harry grid-security]$ $GLOBUS_LOCATION/bin/grid-cert-diagnostics
Checking Environment Variables
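Running the diagnostics tool under gdb, as suggested above, would look roughly like this (a sketch only; the exact prompts and backtrace depend on the local build):

```
gdb $GLOBUS_LOCATION/bin/grid-cert-diagnostics
(gdb) run
... reproduce the segfault ...
(gdb) backtrace
... send this output to the list ...
```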
Re: [gt-user] error :gridftp, globus-url-copy
Sunah, Can you send /etc/grid-security/certificates/45fb3f91.0 from both machines to me so that I can try it myself? If I knew another way to solve the problem I'd tell you. Maybe someone from the GridFTP or C security side has more ideas. Martin

Sunah Park wrote: Martin, Thanks for your help. I built it from sources on both machines, and I checked that the OpenSSL version is the same on both:

[glo...@harry ~]$ openssl version
OpenSSL 0.9.8e-fips-rhel5 01 Jul 2008
[glo...@hermione ~]$ openssl version
OpenSSL 0.9.8e-fips-rhel5 01 Jul 2008

And /etc/grid-security/certificates/45fb3f91.0 is also the same on harry and hermione. It's hard to pin down the problem. Is there another way to solve it? Sunah Park.

2010/4/14 Martin Feller fel...@mcs.anl.gov Sunah Park, Hm, ok. How did you install the GT on these two machines: did you build it from sources or did you use binary installers? If you used binary installers, I wonder if the OpenSSL version on hermione might not be compatible. What are the OpenSSL versions on these two machines? I remember one case where the installation from a binary installer worked fine and the gridftp server started ok, but transfers failed with security-related errors due to an incompatible OpenSSL version. For sanity: can you double-check that /etc/grid-security/certificates/45fb3f91.0 is really the same on harry and hermione? Martin

박선아 wrote: Hi Martin, I'm Cinyoung's coworker and I saw the mails you sent her about the problem. Then I did the following steps from your email:
* Put all grid security stuff into /etc/grid-security on both machines
* Unset all Globus security-related environment variables on both machines for all users
* Since the content of harry:/etc/grid-security/certificates seems ok (at least grid-cert-diagnostics does not segfault there), copy the content of harry:/etc/grid-security/certificates into hermione:/etc/grid-security/certificates

But it didn't work.
These are the outputs on harry and hermione:

## Harry: root ##
[r...@harry grid-security]# $GLOBUS_LOCATION/bin/grid-cert-diagnostics
Checking Environment Variables
Checking if HOME is set... /root
Checking if GLOBUS_LOCATION is set... /usr/local/globus-4.2.1.1
Checking if X509_CERT_DIR is set... no
Checking if X509_USER_CERT is set... no
Checking if X509_USER_KEY is set... no
Checking if X509_USER_PROXY is set... no
Checking if GRIDMAP is set... no
Checking Security Directories
Determining trusted cert path... /etc/grid-security/certificates
Checking for cog.properties... not found
Checking for default gridmap location... /etc/grid-security/grid-mapfile
Checking if default gridmap exists... yes
Checking trusted certificates...
Getting trusted certificate list...
Checking CA file /etc/grid-security/certificates/45fb3f91.0... ok
Checking that certificate hash matches filename... ok
Checking CA certificate name for 45fb3f91.0... ok (/O=Grid/OU=GlobusTest/OU=simpleCA-harry.sookmyung.ac.kr/CN=Globus Simple CA)
Checking if signing policy exists for 45fb3f91.0... ok
Verifying certificate chain for 45fb3f91.0... ok

## Harry: user (the user name is aero): ##
[a...@harry grid-security]$ $GLOBUS_LOCATION/bin/grid-cert-diagnostics
Checking Environment Variables
Checking if HOME is set... /home/aero
Checking if GLOBUS_LOCATION is set... /usr/local/globus-4.2.1.1
Checking if X509_CERT_DIR is set... no
Checking if X509_USER_CERT is set... no
Checking if X509_USER_KEY is set... no
Checking if X509_USER_PROXY is set... no
Checking
Re: [gt-user] error :gridftp, globus-url-copy
Sunah Park, Hm, ok. How did you install the GT on these two machines: did you build it from sources or did you use binary installers? If you used binary installers, I wonder if the OpenSSL version on hermione might not be compatible. What are the OpenSSL versions on these two machines? I remember one case where the installation from a binary installer worked fine and the gridftp server started ok, but transfers failed with security-related errors due to an incompatible OpenSSL version. For sanity: can you double-check that /etc/grid-security/certificates/45fb3f91.0 is really the same on harry and hermione? Martin

박선아 wrote: Hi Martin, I'm Cinyoung's coworker and I saw the mails you sent her about the problem. Then I did the following steps from your email:
* Put all grid security stuff into /etc/grid-security on both machines
* Unset all Globus security-related environment variables on both machines for all users
* Since the content of harry:/etc/grid-security/certificates seems ok (at least grid-cert-diagnostics does not segfault there), copy the content of harry:/etc/grid-security/certificates into hermione:/etc/grid-security/certificates

But it didn't work. These are the outputs on harry and hermione:

## Harry: root ##
[r...@harry grid-security]# $GLOBUS_LOCATION/bin/grid-cert-diagnostics
Checking Environment Variables
Checking if HOME is set... /root
Checking if GLOBUS_LOCATION is set... /usr/local/globus-4.2.1.1
Checking if X509_CERT_DIR is set... no
Checking if X509_USER_CERT is set... no
Checking if X509_USER_KEY is set... no
Checking if X509_USER_PROXY is set... no
Checking if GRIDMAP is set... no
Checking Security Directories
Determining trusted cert path... /etc/grid-security/certificates
Checking for cog.properties... not found
Checking for default gridmap location... /etc/grid-security/grid-mapfile
Checking if default gridmap exists... yes
Checking trusted certificates...
Getting trusted certificate list...
Checking CA file /etc/grid-security/certificates/45fb3f91.0... ok
Checking that certificate hash matches filename... ok
Checking CA certificate name for 45fb3f91.0... ok (/O=Grid/OU=GlobusTest/OU=simpleCA-harry.sookmyung.ac.kr/CN=Globus Simple CA)
Checking if signing policy exists for 45fb3f91.0... ok
Verifying certificate chain for 45fb3f91.0... ok

## Harry: user (the user name is aero): ##
[a...@harry grid-security]$ $GLOBUS_LOCATION/bin/grid-cert-diagnostics
Checking Environment Variables
Checking if HOME is set... /home/aero
Checking if GLOBUS_LOCATION is set... /usr/local/globus-4.2.1.1
Checking if X509_CERT_DIR is set... no
Checking if X509_USER_CERT is set... no
Checking if X509_USER_KEY is set... no
Checking if X509_USER_PROXY is set... no
Checking if GRIDMAP is set... no
Checking Security Directories
Determining trusted cert path... /etc/grid-security/certificates
Checking for cog.properties... not found
Checking for default gridmap location... /home/aero/.gridmap
Checking if default gridmap exists... failed
globus_sysconfig: File does not exist: /home/aero/.gridmap is not a valid file
Checking trusted certificates...
Getting trusted certificate list...
Checking CA file /etc/grid-security/certificates/45fb3f91.0... ok
Checking that certificate hash matches filename... ok
Checking CA certificate name for 45fb3f91.0... ok (/O=Grid/OU=GlobusTest/OU=simpleCA-harry.sookmyung.ac.kr/CN=Globus Simple CA)
Checking if signing policy exists for 45fb3f91.0... ok
Verifying certificate chain for 45fb3f91.0... ok

## Hermione: root ##
[r...@hermione share]# $GLOBUS_LOCATION/bin/grid-cert-diagnostics
Checking Environment Variables
Checking if HOME is set... /root
Checking if GLOBUS_LOCATION is set... /usr/local/globus-4.2.1.1
Checking if X509_CERT_DIR is set... no
Checking if X509_USER_CERT is set... no
Checking if X509_USER_KEY is set... no
Checking
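The "certificate hash matches filename" check that grid-cert-diagnostics performs can be reproduced with plain openssl. A minimal self-contained sketch (the throwaway demo CA and temp directory are made up for illustration, not from the thread):

```shell
# Globus stores each trusted CA as <subject_hash>.0 in the certificates
# directory; the diagnostics tool verifies that the hash of the file
# contents matches the filename. Demonstrate with a throwaway CA cert.
tmp=$(mktemp -d)
# create a self-signed "CA" certificate just for the demo
openssl req -x509 -newkey rsa:2048 -nodes -subj "/O=Grid/CN=Demo CA" \
  -keyout "$tmp/ca.key" -out "$tmp/ca.pem" -days 1 2>/dev/null
hash=$(openssl x509 -in "$tmp/ca.pem" -noout -hash)
cp "$tmp/ca.pem" "$tmp/$hash.0"
# re-derive the hash from the installed file and compare with its name
check=fail
[ "$(openssl x509 -in "$tmp/$hash.0" -noout -hash).0" = "$hash.0" ] && check=ok
echo "$check"
```

One related caveat: OpenSSL changed its subject-hash algorithm in 1.0.0, so the same CA certificate gets a different hash filename under different OpenSSL versions, which is one more way mismatched OpenSSL installs between two hosts can cause "invalid CA certificate" grief.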
Re: [gt-user] error :gridftp, globus-url-copy
And what's the output of grid-cert-diagnostics on hermione? Martin

cinyoung hur wrote: Martin, I ran the command $GLOBUS_LOCATION/bin/grid-cert-diagnostics. If X509_CERT_DIR is not set, could that cause the problem? Thanks. Regards, Cinyoung Hur.

[r...@harry ~]# $GLOBUS_LOCATION/bin/grid-cert-diagnostics
Checking Environment Variables
Checking if HOME is set... /root
Checking if GLOBUS_LOCATION is set... /usr/local/globus-4.2.1.1
Checking if X509_CERT_DIR is set... no
Checking if X509_USER_CERT is set... no
Checking if X509_USER_KEY is set... no
Checking if X509_USER_PROXY is set... no
Checking if GRIDMAP is set... no
Checking Security Directories
Determining trusted cert path... /usr/local/globus-4.2.1.1/share/certificates
Checking for cog.properties... not found
Checking for default gridmap location... /etc/grid-security/grid-mapfile
Checking if default gridmap exists... yes
Checking trusted certificates...
Getting trusted certificate list...
Checking CA file /usr/local/globus-4.2.1.1/share/certificates/45fb3f91.0... ok
Checking that certificate hash matches filename... ok
Checking CA certificate name for 45fb3f91.0... ok (/O=Grid/OU=GlobusTest/OU=simpleCA-harry..xx.xx/CN=Globus Simple CA)
Checking if signing policy exists for 45fb3f91.0... ok
Verifying certificate chain for 45fb3f91.0... ok

2010/4/9 Martin Feller fel...@mcs.anl.gov Cinyoung, In case that didn't help resolve the issue, you might want to run the command $GLOBUS_LOCATION/bin/grid-cert-diagnostics, which prints pretty helpful information about the grid security setup on a machine. Maybe that helps finding the golden snitch... ;) Martin

Lukasz Lacinski wrote: Do you have in the directory hermione:/etc/grid-security/certificates a certificate of the Certificate Authority you used to obtain your user certificate? Please compare /etc/grid-security/certificates on hermione and harry.
It looks like you can transfer files between harry and your local machine (file:///path_to_a_file), and only hermione makes problems. Regards, Lukasz

On Apr 8, 2010, at 8:22 AM, cinyoung hur wrote: Hello, list. I'm trying to make gridftp work on two nodes, called Hermione and Harry. I read about other problems on the mailing list; someone pointed out clock skew, so I fixed the clock skew problems. However, I don't know what my problem is. Could anyone help me with this problem, please? Thank you. Cheers, Cinyoung Hur.

[a...@hermione ~]$ globus-url-copy -dbg gsiftp://hermione..xx.xx/etc/group gsiftp://harry..xx.xx/tmp/from-a
debug: starting to size gsiftp://hermione..xx.xx/etc/group
debug: connecting to gsiftp://hermione..xx.xx/etc/group
debug: response from gsiftp://hermione..xx.xx/etc/group:
220 hermione..xx.xx GridFTP Server 3.15 (gcc32, 1222656151-78) [Globus Toolkit 4.2.1] ready.
debug: authenticating with gsiftp://hermione..xx.xx/etc/group
debug: response from gsiftp://hermione..xx.xx/etc/group:
530-globus_xio: Authentication Error
530-OpenSSL Error: s3_srvr.c:2490: in library: SSL routines, function SSL3_GET_CLIENT_CERTIFICATE: no certificate returned
530-globus_gsi_callback_module: Could not verify credential
530-globus_gsi_callback_module: Could not verify credential: invalid CA certificate
530 End.
debug: fault on connection to gsiftp://hermione..xx.xx/etc/group
debug: operation complete
debug: starting to transfer gsiftp://hermione..xx.xx/etc/group to gsiftp://harry..xx.xx/tmp/from-a
debug: connecting to gsiftp://harry..xx.xx/tmp/from-a
debug: response from gsiftp://harry..xx.xx/tmp/from-a:
220 harry..xx.xx GridFTP Server 3.15 (gcc32dbgpthr, 1222656151-78) [Globus Toolkit 4.2.1] ready.
debug: authenticating with gsiftp://harry..xx.xx/tmp/from-a
debug: response from gsiftp://harry..xx.xx/tmp/from-a:
230 User aero logged in.
debug: sending command to gsiftp://harry..xx.xx/tmp/from-a: SITE HELP
debug: response from gsiftp://harry..xx.xx/tmp/from-a:
214-The following commands are recognized:
ALLO APPE REST CWD CDUP DCAU EPSV FEAT ERET MDTM STAT ESTO HELP LIST MODE NLST MLSD PASV RNFR MLST NOOP OPTS STOR PASS PBSZ PORT PROT SITE EPRT RETR SPOR SCKS TREV PWD QUIT SBUF SIZE
Re: [gt-user] error :gridftp, globus-url-copy
/root/.globus/certificates/45fb3f91.0... ok
Checking that certificate hash matches filename... ok
Checking CA certificate name for 45fb3f91.0... ok (/O=Grid/OU=GlobusTest/OU=simpleCA-harry.xxx.xx.xx/CN=Globus Simple CA)
Checking if signing policy exists for 45fb3f91.0... ok
Verifying certificate chain for 45fb3f91.0... ok
[r...@harry myproxy]# exit
logout

Harry: user:
[a...@harry globus]$ $GLOBUS_LOCATION/bin/grid-cert-diagnostics
Checking Environment Variables
Checking if HOME is set... /home/aero
Checking if GLOBUS_LOCATION is set... /usr/local/globus-4.2.1.1
Checking if X509_CERT_DIR is set... no
Checking if X509_USER_CERT is set... no
Checking if X509_USER_KEY is set... no
Checking if X509_USER_PROXY is set... no
Checking if GRIDMAP is set... no
Checking Security Directories
Determining trusted cert path... /etc/grid-security/certificates
Checking for cog.properties... not found
Checking for default gridmap location... /home/aero/.gridmap
Checking if default gridmap exists... failed
globus_sysconfig: File does not exist: /home/aero/.gridmap is not a valid file
Checking trusted certificates...
Getting trusted certificate list...
Checking CA file /etc/grid-security/certificates/45fb3f91.0... ok
Checking that certificate hash matches filename... ok
Checking CA certificate name for 45fb3f91.0... ok (/O=Grid/OU=GlobusTest/OU=simpleCA-harry.sookmyung.ac.kr/CN=Globus Simple CA)
Checking if signing policy exists for 45fb3f91.0... ok
Verifying certificate chain for 45fb3f91.0... ok
[a...@harry globus]$

2010/4/9 Martin Feller fel...@mcs.anl.gov And what's the output of grid-cert-diagnostics on hermione? Martin

cinyoung hur wrote: Martin, I ran the command $GLOBUS_LOCATION/bin/grid-cert-diagnostics. If X509_CERT_DIR is not set, could that cause the problem? Thanks. Regards, Cinyoung Hur.
[r...@harry ~]# $GLOBUS_LOCATION/bin/grid-cert-diagnostics
Checking Environment Variables
Checking if HOME is set... /root
Checking if GLOBUS_LOCATION is set... /usr/local/globus-4.2.1.1
Checking if X509_CERT_DIR is set... no
Checking if X509_USER_CERT is set... no
Checking if X509_USER_KEY is set... no
Checking if X509_USER_PROXY is set... no
Checking if GRIDMAP is set... no
Checking Security Directories
Determining trusted cert path... /usr/local/globus-4.2.1.1/share/certificates
Checking for cog.properties... not found
Checking for default gridmap location... /etc/grid-security/grid-mapfile
Checking if default gridmap exists... yes
Checking trusted certificates...
Getting trusted certificate list...
Checking CA file /usr/local/globus-4.2.1.1/share/certificates/45fb3f91.0... ok
Checking that certificate hash matches filename... ok
Checking CA certificate name for 45fb3f91.0... ok (/O=Grid/OU=GlobusTest/OU=simpleCA-harry..xx.xx/CN=Globus Simple CA)
Checking if signing policy exists for 45fb3f91.0... ok
Verifying certificate chain for 45fb3f91.0... ok

2010/4/9 Martin Feller fel...@mcs.anl.gov Cinyoung, In case that didn't help resolve the issue, you might want to run the command $GLOBUS_LOCATION/bin/grid-cert-diagnostics, which prints pretty helpful information about the grid security setup on a machine. Maybe that helps finding the golden snitch... ;) Martin

Lukasz Lacinski wrote: Do you have in the directory hermione:/etc/grid-security/certificates a certificate of the Certificate Authority you used to obtain your user certificate? Please compare /etc/grid-security/certificates on hermione and harry. It looks like you can transfer files between harry and your local machine (file:///path_to_a_file), and only hermione makes problems. Regards, Lukasz

On Apr 8, 2010, at 8:22 AM, cinyoung hur wrote: Hello, list.
I'm trying to make gridftp work on two nodes, called Hermione and Harry. I read about other problems on the mailing list; someone pointed out clock skew, so I fixed the clock skew problems. However, I don't know what my problem is. Could anyone help me
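The different trusted-cert paths appearing in this thread (/usr/local/globus-4.2.1.1/share/certificates in one run, /etc/grid-security/certificates in another, /root/.globus/certificates in a third) follow GSI's search order for the trusted certificates directory. A sketch of that order as I understand it; the exact precedence is not stated in the thread, so treat it as an assumption:

```shell
# First match wins: X509_CERT_DIR, then ~/.globus/certificates,
# then /etc/grid-security/certificates, then $GLOBUS_LOCATION/share/certificates.
trusted_cert_dir() {
  for d in "$X509_CERT_DIR" \
           "$HOME/.globus/certificates" \
           /etc/grid-security/certificates \
           "$GLOBUS_LOCATION/share/certificates"; do
    if [ -n "$d" ] && [ -d "$d" ]; then
      echo "$d"
      return 0
    fi
  done
  return 1
}
```

This is why the advice earlier in the thread (unset all Globus security environment variables and put everything into /etc/grid-security) usually gives the most predictable setup: it removes the higher-precedence candidates.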
Re: [gt-user] Gridmap PDP for service
Just a guess: The interceptor element looks a bit different in the docs at http://www.globus.org/toolkit/docs/4.2/4.2.1/security/wsaajava/descriptor/ (there it's a containerSecurityDescriptor and not a serviceSecurityDescriptor, but still...). Does <interceptor name="gridmapAuthz:org.globus.wsrf.impl.security.GridMapPDP"> instead of <interceptor name="gridmap"> make it work? -Martin

Johannes Duschl wrote: Hello, I'm running gt-4.2.1.1 on Debian Lenny and want to use a separate gridmap file for a service. The security descriptor looks like this:

<serviceSecurityConfig xmlns="http://www.globus.org/security/descriptor/service"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.globus.org/security/descriptor name_value_type.xsd"
    xmlns:param="http://www.globus.org/security/descriptor">
  <auth-method>
    <GSISecureConversation/>
  </auth-method>
  <authzChain>
    <pdps>
      <interceptor name="gridmap">
        <parameter>
          <param:nameValueParam>
            <param:parameter name="gridmap-file" value="/home/globus/grid-mapfile"/>
          </param:nameValueParam>
        </parameter>
      </interceptor>
    </pdps>
  </authzChain>
</serviceSecurityConfig>

However, when I connect to the service I get the following error:

org.globus.wsrf.ResourceContextException: ; nested exception is:
javax.naming.NamingException: [JWSCORE-203] Bean security initialization failed
[Root exception is org.globus.wsrf.config.ConfigException: [JWSSEC-245] Error parsing file: etc/at_jku_tk_service_core/service-instance-security.xml
[Caused by: cvc-complex-type.2.4.c: The matching wildcard is strict, but no declaration can be found for element 'param:nameValueParam'.]]
Exception in thread main java.lang.NullPointerException

I assume there is something wrong with this schema location (xsi:schemaLocation="http://www.globus.org/security/descriptor name_value_type.xsd") but I have no idea what's causing the error. Anybody got a clue? Greetings, Johannes
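If Martin's guess is right, the pdps block would use the fully qualified interceptor name instead of the short alias. A sketch only; whether that class name resolves in this particular deployment is an assumption based solely on the text above:

```
<pdps>
  <interceptor name="gridmapAuthz:org.globus.wsrf.impl.security.GridMapPDP">
    <parameter>
      <param:nameValueParam>
        <param:parameter name="gridmap-file" value="/home/globus/grid-mapfile"/>
      </param:nameValueParam>
    </parameter>
  </interceptor>
</pdps>
```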
Re: [gt-user] error :gridftp, globus-url-copy
Cinyoung, In case that didn't help resolve the issue, you might want to run the command $GLOBUS_LOCATION/bin/grid-cert-diagnostics, which prints pretty helpful information about the grid security setup on a machine. Maybe that helps finding the golden snitch... ;) Martin

Lukasz Lacinski wrote: Do you have in the directory hermione:/etc/grid-security/certificates a certificate of the Certificate Authority you used to obtain your user certificate? Please compare /etc/grid-security/certificates on hermione and harry. It looks like you can transfer files between harry and your local machine (file:///path_to_a_file), and only hermione makes problems. Regards, Lukasz

On Apr 8, 2010, at 8:22 AM, cinyoung hur wrote: Hello, list. I'm trying to make gridftp work on two nodes, called Hermione and Harry. I read about other problems on the mailing list; someone pointed out clock skew, so I fixed the clock skew problems. However, I don't know what my problem is. Could anyone help me with this problem, please? Thank you. Cheers, Cinyoung Hur.

[a...@hermione ~]$ globus-url-copy -dbg gsiftp://hermione..xx.xx/etc/group gsiftp://harry..xx.xx/tmp/from-a
debug: starting to size gsiftp://hermione..xx.xx/etc/group
debug: connecting to gsiftp://hermione..xx.xx/etc/group
debug: response from gsiftp://hermione..xx.xx/etc/group:
220 hermione..xx.xx GridFTP Server 3.15 (gcc32, 1222656151-78) [Globus Toolkit 4.2.1] ready.
debug: authenticating with gsiftp://hermione..xx.xx/etc/group
debug: response from gsiftp://hermione..xx.xx/etc/group:
530-globus_xio: Authentication Error
530-OpenSSL Error: s3_srvr.c:2490: in library: SSL routines, function SSL3_GET_CLIENT_CERTIFICATE: no certificate returned
530-globus_gsi_callback_module: Could not verify credential
530-globus_gsi_callback_module: Could not verify credential: invalid CA certificate
530 End.
debug: fault on connection to gsiftp://hermione..xx.xx/etc/group
debug: operation complete
debug: starting to transfer gsiftp://hermione..xx.xx/etc/group to gsiftp://harry..xx.xx/tmp/from-a
debug: connecting to gsiftp://harry..xx.xx/tmp/from-a
debug: response from gsiftp://harry..xx.xx/tmp/from-a:
220 harry..xx.xx GridFTP Server 3.15 (gcc32dbgpthr, 1222656151-78) [Globus Toolkit 4.2.1] ready.
debug: authenticating with gsiftp://harry..xx.xx/tmp/from-a
debug: response from gsiftp://harry..xx.xx/tmp/from-a: 230 User aero logged in.
debug: sending command to gsiftp://harry..xx.xx/tmp/from-a: SITE HELP
debug: response from gsiftp://harry..xx.xx/tmp/from-a:
214-The following commands are recognized:
ALLO APPE REST CWD CDUP DCAU EPSV FEAT ERET MDTM STAT ESTO HELP LIST MODE NLST MLSD PASV RNFR MLST NOOP OPTS STOR PASS PBSZ PORT PROT SITE EPRT RETR SPOR SCKS TREV PWD QUIT SBUF SIZE SPAS STRU SYST RNTO TYPE USER LANG MKD RMD DELE CKSM
214 End
debug: sending command to gsiftp://harry..xx.xx/tmp/from-a: FEAT
debug: response from gsiftp://harry..xx.xx/tmp/from-a:
211-Extensions supported
AUTHZ_ASSERT
UTF8
LANG EN
DCAU
PARALLEL
SIZE
MLST Type*;Size*;Modify*;Perm*;Charset;UNIX.mode*;UNIX.owner*;UNIX.group*;Unique*;UNIX.slink*;
ERET
ESTO
SPAS
SPOR
REST STREAM
MDTM
PASV AllowDelayed;
211 End.
debug: sending command to gsiftp://harry..xx.xx/tmp/from-a: TYPE I
debug: response from gsiftp://harry..xx.xx/tmp/from-a: 200 Type set to I.
debug: sending command to gsiftp://harry..xx.xx/tmp/from-a: PBSZ 1048576
debug: response from gsiftp://harry..xx.xx/tmp/from-a: 200 PBSZ=1048576
debug: sending command to gsiftp://harry..xx.xx/tmp/from-a: PASV
debug: response from gsiftp://harry..xx.xx/tmp/from-a: 227 Entering Passive Mode (203,153,146,56,137,160)
debug: sending command to gsiftp://harry..xx.xx/tmp/from-a: STOR /tmp/from-a
debug: sending command to gsiftp://hermione..xx.xx/etc/group: TYPE I
debug: response from gsiftp://hermione..xx.xx/etc/group: 530 Must perform GSSAPI authentication.
debug: fault on connection to gsiftp://hermione..xx.xx/etc/group debug: operation complete error: globus_ftp_client: the server responded with an error 530 Must perform GSSAPI authentication. [a...@hermione ~]$ -
Re: [gt-user] Help regarding process ID..
Hi, You can only get the process id of your job with ws-gram from GT 4.2.x, but with the GT 4.0 series. We added this parameter as a resource property of a job resource in GT 4.2.0. Example of how to get the local job id: http://www-unix.globus.org/toolkit/docs/latest-stable/execution/gram4/user/#gram4-user-query-single I don't know about Gram 5.0 at the moment. Martin siddharth jain wrote: Hello, I'm able to submit a job to any resource of my choice using the globusrun-ws command. I need to know the process ID of the process executing on the resource for this Job. How can I get this information? Can I get this information at the time of Job submission? Thank you. Yours sincerely, Siddharth Jain
Re: [gt-user] Help regarding process ID..
I meant to say: You can only get the process id of your job with ws-gram from GT 4.2.x, but NOT with the GT 4.0 series. ... Martin Feller wrote: Hi, You can only get the process id of your job with ws-gram from GT 4.2.x, but with the GT 4.0 series. We added this parameter as a resource property of a job resource in GT 4.2.0. Example of how to get the local job id: http://www-unix.globus.org/toolkit/docs/latest-stable/execution/gram4/user/#gram4-user-query-single I don't know about Gram 5.0 at the moment. Martin siddharth jain wrote: Hello, I'm able to submit a job to any resource of my choice using the globusrun-ws command. I need to know the process ID of the process executing on the resource for this Job. How can I get this information? Can I get this information at the time of Job submission? Thank you. Yours sincerely, Siddharth Jain
Re: [gt-user] Doubt about job submition
Lucio, What if you submit the job in batch mode, like

globusrun-ws -submit -b -o myJob.epr -f job_medidas_sonares.xml

and poll for status via

globusrun-ws -status -j myJob.epr

instead of relying on notification messages? Do you get information about the status of your job then? If so: could there be a firewall on the client side that prevents the notifications sent by the server from reaching the client? Martin

Lucio Agostinho Rocha wrote: Hi, I'm using GT4.0.8. I'm trying to submit a job on LOCAL_IP and receive the response on REMOTE_IP. To do this, I create the following job:

<job>
  <factoryEndpoint xmlns:gram="http://www.globus.org/namespaces/2004/10/gram/job"
      xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/03/addressing">
    <wsa:Address>https://LOCAL_IP:8443/wsrf/services/ManagedJobFactoryService</wsa:Address>
    <wsa:ReferenceProperties>
      <gram:ResourceID>Fork</gram:ResourceID>
    </wsa:ReferenceProperties>
  </factoryEndpoint>
  <jobCredentialEndpoint xsi:type="ns1:EndpointReferenceType"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns:ns1="http://schemas.xmlsoap.org/ws/2004/03/addressing">
    <ns1:Address xsi:type="ns1:AttributedURI">https://LOCAL_IP:8443/wsrf/services/DelegationService</ns1:Address>
    <ns1:ReferenceProperties xsi:type="ns1:ReferencePropertiesType">
      <ns1:DelegationKey xmlns:ns1="http://www.globus.org/08/2004/delegationService">1de27040-0649-11db-a7d0-8fccdff3a60c</ns1:DelegationKey>
    </ns1:ReferenceProperties>
    <ns1:ReferenceParameters xsi:type="ns1:ReferenceParametersType"/>
  </jobCredentialEndpoint>
  <stagingCredentialEndpoint xsi:type="ns1:EndpointReferenceType"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns:ns1="http://schemas.xmlsoap.org/ws/2004/03/addressing">
    <ns1:Address xsi:type="ns1:AttributedURI">https://LOCAL_IP:8443/wsrf/services/DelegationService</ns1:Address>
    <ns1:ReferenceProperties xsi:type="ns1:ReferencePropertiesType">
      <ns1:DelegationKey xmlns:ns1="http://www.globus.org/08/2004/delegationService">1de27040-0649-11db-a7d0-8fccdff3a60c</ns1:DelegationKey>
    </ns1:ReferenceProperties>
    <ns1:ReferenceParameters xsi:type="ns1:ReferenceParametersType"/>
  </stagingCredentialEndpoint>
  <executable>globus_MedidasSonares</executable>
  <directory>/usr/local/HttpIpthru/API_HttpIpthru_01_03_2010/C++-API/src/</directory>
  <argument>SID_</argument>
  <argument>http://127.0.0.1:4951</argument>
  <stdout>/tmp/stdout</stdout>
  <fileStageIn>
    <transferCredentialEndpoint xsi:type="ns1:EndpointReferenceType"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns:ns1="http://schemas.xmlsoap.org/ws/2004/03/addressing">
      <ns1:Address xsi:type="ns1:AttributedURI">https://LOCAL_IP:8443/wsrf/services/DelegationService</ns1:Address>
      <ns1:ReferenceProperties xsi:type="ns1:ReferencePropertiesType">
        <ns1:DelegationKey xmlns:ns1="http://www.globus.org/08/2004/delegationService">1de27040-0649-11db-a7d0-8fccdff3a60c</ns1:DelegationKey>
      </ns1:ReferenceProperties>
      <ns1:ReferenceParameters xsi:type="ns1:ReferenceParametersType"/>
    </transferCredentialEndpoint>
    <transfer>
      <sourceUrl>gsiftp://LOCAL_IP/tmp/stdout</sourceUrl>
      <destinationUrl>gsiftp://REMOTE_IP/home/lucio/processed_job.txt</destinationUrl>
    </transfer>
  </fileStageIn>
</job>

Then I execute:

$ globusrun-ws -submit -f job_medidas_sonares.xml
Submitting job...Done.
Job ID: uuid:57d6ebf2-3b45-11df-8e50-001c23c0ceff
Termination time: 03/30/2010 15:11 GMT

After some time the message "Current job state: Unsubmitted" is shown, but no further processing happens after this. What am I doing wrong? I copied the DelegationKey from a forum; could that be the problem? Any suggestions? Thanks in advance, Lucio
[gt-user] test
please ignore
Re: [gt-user] job unsubmitted problem
Marco, No, this message is unrelated to the problem below. What this message indicates is that the corresponding resource of a job is reloaded in the GT4 container at startup time. The information about a job is not only held in memory but also persisted to disk. In case of a container shutdown the in-memory resource goes away, but when you restart the container it will be reloaded from the persisted data. This ensures that a job is still manageable for a user after a GT4 container restart, e.g. to query job status or delete the job. A fix for the problem described below is not in 4.0.8. If you think you are running into it, apply the fix as described. Martin

Marco Lackovic wrote: Hello, when starting the container I sometimes get the following message:

2010-03-24 11:44:11,609 INFO exec.ManagedExecutableJobHome [Thread-2,recover:207] Recovered resource with ID 3cb225a0-36b6-11df-a43d-bc4c724692c3.

I am wondering whether it is related to the following problem and whether it has been fixed in version 4.0.8.

On Wed, Aug 27, 2008 at 2:55 PM, Martin Feller fel...@mcs.anl.gov wrote: Ok, I think I see it now. You are hitting a combination of generous locking and a potential for an infinite loop in which your container happily cycles. This situation can happen if your job wants to fetch a non-existing credential (probably destroyed earlier) from the delegation service and then, because the credential does not exist anymore, tries to delete the user proxy file created from that credential earlier, which does not exist either, because it was probably deleted when the credential was destroyed. Not a completely uncommon situation, I guess, and we handle it badly. I'll have to check how this should best be fixed. The fix should then also find its way into the VDT. I'll open a bug for that. A quick fix for you to go on is: replace

    -delete)
        # proxyfile should exist
        exec rm $PROXYFILE
        exit $?
        ;;

by

    -delete)
        if [ -e $PROXYFILE ]; then
            exec rm $PROXYFILE
            exit $?
        else
            exit 0
        fi
        ;;

in $GLOBUS_LOCATION/libexec/globus-gram-local-proxy-tool. (A patch would have been nicer, but I don't know if our versions of that file are the same.) I'm quite sure that this solves your problem. Please let me know.
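The effect of the guard can be seen in isolation (PROXYFILE here is a throwaway temp path created for the demo, not the real proxy file):

```shell
# Simulate the failure mode: the proxy file was already removed earlier,
# e.g. when the delegated credential was destroyed.
PROXYFILE=$(mktemp)
rm "$PROXYFILE"
# An unguarded 'rm $PROXYFILE' would now fail (nonzero exit status),
# which is what sent the container into its retry loop; the guarded
# version treats "already gone" as success.
if [ -e "$PROXYFILE" ]; then
  rm "$PROXYFILE"
  status=$?
else
  status=0
fi
echo "$status"
```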
Re: [gt-user] Intermittent errors with GridFTP (GT4.2.1)
Arn wrote: Arn wrote: We've set up GridFTP (4.2.1) on several nodes across our WAN (2 sites) using the quickstart documentation. We are not seeing any issues while transferring large files, but when we do a batch transfer (globus-url-copy) with lots of small files (LOSF) we have problems. The debug/verbose output is the following:

error transferring: globus_ftp_client: the server responded with an error
500 500-Command failed. : globus_l_gfs_file_open failed.
500-globus_xio: Unable to open file /path/to/data/losf/small0aAEq8QsYSCJ
500-globus_xio: System error in open: Permission denied
500-globus_xio: A system call failed: Permission denied
500 End.
error: There was an error with one or more transfers.

Note that this error is intermittent, as the same transfer sometimes works. We would appreciate some advice on what could be the problem and how to investigate further.

On Tue, Feb 23, 2010 at 7:04 AM, Martin Feller fel...@mcs.anl.gov wrote: There is a way to transfer a directory as a single tar stream, like this: 1. tar up the source directory prior to transfer, 2. transfer the tar stream, 3. untar the archive on the destination, without manual tarring/untarring on the client and the server. We implemented this for a community that uses GridFTP heavily for transfers of 42GB directories containing 130,000 rather small files in a nested directory structure. It works very reliably this way. The only downside I know of is that you cannot use any of the advanced features of GridFTP then, like parallelism: the tar-stream transfers became unreliable. To do this you must enable the popen driver in GridFTP. I recommend the latest server from 5.0.0 plus a GridFTP patch. For the tarring on the client side you can use globus-url-copy with certain flags. We built on top of the jglobus Java API to get it running for Java clients. I could provide more details and instructions if you are interested in this approach. Martin

thanks. It looks like a reasonable solution.
I will check with my project lead if we can use your suggestion, but we do tend to be wary of using non-standard patches in our production environments. It is not a non-standard patch. It's just that the gridftp developers fixed a popen-driver related issue after 5.0.0 was out. It'll be in GT 5.0.1. Also, we do need to use parallelism but I suppose we can think of a way to turn it on/off depending on the situation. Or maybe we can specify -pp 1 (1 stream) if a LOSF situation is encountered. In any case, do send me the instructions on the method you suggested. Ok, I'll prepare some notes in a few days. Martin Thanks Arn
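Martin's three-step tar-stream idea can be sketched locally, with a plain pipe standing in for the GridFTP data channel (paths and file names below are illustrative; the popen-driver and globus-url-copy specifics are omitted):

```shell
# A local sketch of the tar-stream idea for lots-of-small-files (LOSF)
# transfers: one archive stream instead of one open/transfer/close per file.
# A plain pipe stands in for the GridFTP data channel; paths are illustrative.
set -e
src=$(mktemp -d); dst=$(mktemp -d)
mkdir -p "$src/nested"
printf 'payload-1' > "$src/a.txt"
printf 'payload-2' > "$src/nested/b.txt"
# 1. tar up the source, 2. ship it as a single stream, 3. untar at the sink
tar -cf - -C "$src" . | tar -xf - -C "$dst"
diff -r "$src" "$dst" && echo "round-trip OK"
```

The per-file protocol overhead disappears because the server sees a single data stream, which is exactly why parallelism (which splits one file across streams) no longer applies.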
Re: [gt-user] Default Jobmanager
Hi, There is no such thing as a default job manager in WS-GRAM in the 4.0 series, even if e.g. globusrun-ws seems to pretend there is one: if you don't specify one when you use globusrun-ws, globusrun-ws will use Fork as the job manager, and the factory endpoint in the call to the ManagedJobFactoryService will actually contain Fork as the resource key. So from the ManagedJobFactoryService's view all job managers are the same; the client has to specify one in all calls to the ManagedJobFactoryService. If the client does not specify a ResourceID element in the factory endpoint, the request cannot be handled. In the 4.2 series, a default job manager can be configured on the server side, and if the client does not specify a job manager in the factory endpoint used in the call to the ManagedJobFactoryService, this default job manager will be used. So if you wanted to backport this to the 4.0 series, you'd have to make changes to both clients and the server. Here's a bugzilla entry that describes the changes on the server side. It doesn't cover the multi-job-related code details very well though: http://bugzilla.globus.org/globus/show_bug.cgi?id=5744 Martin Löhnhardt, Benjamin wrote: Hi, is it possible to set another job manager than Fork as the default in Globus 4.0.8? I have read here http://www.mail-archive.com/gt-user@lists.globus.org/msg00981.html that it is only possible with code changes. Are these changes documented? Best regards, Benjamin -- Benjamin Löhnhardt UNIVERSITÄTSMEDIZIN GÖTTINGEN GEORG-AUGUST-UNIVERSITÄT Abteilung Medizinische Informatik Robert-Koch-Straße 40 37075 Göttingen Briefpost 37099 Göttingen Telefon +49-551 / 39-22842 benjamin.loehnha...@med.uni-goettingen.de www.mi.med.uni-goettingen.de
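For illustration, a 4.0-style factory endpoint carrying the ResourceID described above might look roughly like this (element layout and namespace prefixes are abbreviated and from memory; verify against the WSDL of your GT version):

```
<!-- Illustrative factory endpoint; namespaces abbreviated, not verbatim. -->
<wsa:EndpointReference>
  <wsa:Address>https://host:8443/wsrf/services/ManagedJobFactoryService</wsa:Address>
  <wsa:ReferenceProperties>
    <!-- the ResourceID selects the job manager (Fork, PBS, ...) -->
    <ResourceID>PBS</ResourceID>
  </wsa:ReferenceProperties>
</wsa:EndpointReference>
```

In 4.0 the client must always fill in that ResourceID; in 4.2 it may omit it and let the server-side default apply.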
Re: [gt-user] yet another Host key verification failed question
Thanks for the feedback, Brian! I'll add a hint about this to my notes. Martin Brian Pratt wrote: OK, I finally cracked the nut. It was indeed an ssh issue, and the missing piece was that the user had to be able to ssh to himself WITHIN THE SAME NODE (!?!). In my case the submitting user is labkey - it's understood that lab...@[clientnode] needs to be able to ssh to lab...@[headnode], but it turns out he also needs to be able to ssh to lab...@[clientnode]. This seems odd to me, but that's how it is. I suppose there might be a config tweak for that somewhere. Anyway, I just repeated the steps for establishing ssh trust between lab...@clientnode and lab...@headnode for lab...@clientnode and lab...@clientnode, and it's all good. One might have guessed that this trust relationship was implicit, but it isn't - you have to add labkey's rsa public key to ~labkey/.ssh/authorized_keys, and update ~labkey/.ssh/known_hosts to include our own hostname. strace -f on the client node was instrumental in figuring this out, as well as messing around in the perl scripts on the server node. ssldump was handy, too. Thanks to Martin and Jim for the pointers. If you're reading this in an effort to solve a similar problem you might be interested to see my scripts for configuring a simple globus+torque cluster on EC2 at https://hedgehog.fhcrc.org/tor/stedi/trunk/AWS_EC2 . Brian On Fri, Dec 4, 2009 at 8:42 AM, Brian Pratt brian.pr...@insilicos.com wrote: Martin, Thanks for that tip and the link to some very useful notes. I'd started poking around in that perl module last night and it looks like maybe the problem is actually to do with ssh between agents within the same globus node, so my ssh trust relationships are not yet quite as comprehensive as they need to be.
I will certainly post the solution here when I crack the nut. I've found lots of posts out there from folks with similar-sounding problems but no resolution; we'll try to fix that here. Of course there are as many ways to go afoul as there are clusters, but we must leave bread crumbs where we can... Brian On Thu, Dec 3, 2009 at 7:05 PM, Martin Feller fel...@mcs.anl.gov wrote: Brian, The PBS job manager module is $GLOBUS_LOCATION/lib/perl/Globus/GRAM/JobManager/pbs.pm I remember that I had this or a similar problem once too, but can't seem to find notes about it (sad, I know). Here's some information about the Perl code which is called by the Java pieces of ws-gram to submit the job to the local resource manager: http://www.mcs.anl.gov/~feller/Globus/technicalStuff/Gram/perl/ While this does not help directly, it may help in debugging. If I find my notes or have a good idea I'll let you know. Martin Brian Pratt wrote: Good plan, thanks. Now to figure out where that is.. I'm certainly learning a lot! On Thu, Dec 3, 2009 at 2:01 PM, Jim Basney jbas...@ncsa.uiuc.edu wrote: It's been a long time since I've debugged a problem like this, but the way I did it in the old days was to modify the Globus PBS glue script to dump what it's passing to qsub, so I could reproduce it manually. Brian Pratt wrote: Let me amend that - I do think that this is sniffing around the right tree, which is why I said this is in some ways more of a logging question. It does look very much like an ssh issue, so what I really need is to figure out exactly what connection parameters were in use for the failure. They seem to be different in some respect from those used in the qsub transactions. What I could really use is a hint at how to lay eyes on that.
Thanks, Brian On Thu, Dec 3, 2009 at 1:38 PM, Brian Pratt brian.pr...@insilicos.com wrote: Hi Jim, Thanks for the reply. Unfortunately the answer doesn't seem
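The same-node trust setup Brian describes boils down to two file edits. A minimal sketch, exercised against a scratch directory instead of the real ~/.ssh (the key material and host names are fake placeholders):

```shell
# Sketch of same-node ssh trust: the submitting user must be able to ssh to
# itself on the same host. Done against a scratch dir; keys/names are fake.
set -e
home=$(mktemp -d)
mkdir -p "$home/.ssh" && chmod 700 "$home/.ssh"
pub="ssh-rsa AAAAfakekey labkey@clientnode"   # stands in for ~/.ssh/id_rsa.pub
# 1. the user's own public key goes into its own authorized_keys:
echo "$pub" >> "$home/.ssh/authorized_keys"
chmod 600 "$home/.ssh/authorized_keys"
# 2. known_hosts must also list the node's own hostname (host key fake too):
echo "clientnode ssh-rsa AAAAfakehostkey" >> "$home/.ssh/known_hosts"
grep -q "labkey@clientnode" "$home/.ssh/authorized_keys" && echo "trust entry added"
```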
Re: [gt-user] WS-GRAM problem
Hi, Did you run make install during the installation? Please send the output of ls -l $GLOBUS_LOCATION/lib/perl/Globus/GRAM/ Martin jr-sim...@criticalsoftware.com wrote: Hi all, I'm having a problem sending jobs in a 4.0.8 Globus installation. I was following the quickstart guide available in the documentation and when trying the WS-GRAM installation using the following command globusrun-ws -submit -c /bin/true, I get the following error: Submitting job...Done. Job ID: uuid:0264967a-da23-11de-860f-ca16a2742cac Termination time: 11/27/2009 00:31 GMT Current job state: Failed Destroying job...Done. globusrun-ws: Job failed: Error code: 201 Script stderr: Can't locate Globus/GRAM/JobDescription.pm in @INC (@INC contains: /usr/lib/perl5/site_perl/5.8.8/i386-linux-thread-multi /usr/lib/perl5/site_perl/5.8.7/i386-linux-thread-multi /usr/lib/perl5/site_perl/5.8.6/i386-linux-thread-multi /usr/lib/perl5/site_perl/5.8.5/i386-linux-thread-multi /usr/lib/perl5/site_perl/5.8.8 /usr/lib/perl5/site_perl/5.8.7 /usr/lib/perl5/site_perl/5.8.6 /usr/lib/perl5/site_perl/5.8.5 /usr/lib/perl5/site_perl /usr/lib/perl5/vendor_perl/5.8.8/i386-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.7/i386-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.6/i386-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.5/i386-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.8 /usr/lib/perl5/vendor_perl/5.8.7 /usr/lib/perl5/vendor_perl/5.8.6 /usr/lib/perl5/vendor_perl/5.8.5 /usr/lib/perl5/vendor_perl /usr/lib/perl5/5.8.8/i386-linux-thread-multi /usr/lib/perl5/5.8.8 . /usr/local/globus/lib/perl) at /usr/local/globus/libexec/globus-job-manager-script.pl line 32. BEGIN failed--compilation aborted at /usr/local/globus/libexec/globus-job-manager-script.pl line 32. Everything in the sudoers file is apparently correct. Has anyone ever come across this error? I've done some Globus installations and never seen this. Can someone help me? I've already lost a day on this and am not seeing any solution.
As a note, I am using Scientific Linux 5.3 and must use GT 4.0.8. If I forgot to mention something you might find useful, please ask. ;) Cheers Zé Rui
Re: [gt-user] GridFTP: Cannot find gridftp.conf after make gridftp
It doesn't exist by default, I think; the server just uses default values if it's absent. You can create it in $GLOBUS_LOCATION/etc/ though, and populate it with your parameters, and they should be considered when the server is started. Martin Raffaele Forgione wrote: Hi everyone. I'm installing GridFTP from the Globus Toolkit 4.0.8. After running make gridftp everything goes well, but when I search the directory $GLOBUS_LOCATION/etc I don't find gridftp.conf. It doesn't exist in /etc/grid-security either. Why???
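If you do create $GLOBUS_LOCATION/etc/gridftp.conf, it is a plain text file of option/value pairs. A small illustrative sketch (option names and values here are examples and should be verified against globus-gridftp-server -help for your version):

```
port 2811
log_single /var/log/gridftp.log
log_level ERROR,WARN,INFO
```

Any option not listed keeps its built-in default, which is why a missing file is harmless.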
Re: [gt-user] JGlobus GridFTP Problems
Steffen, I tried it with a simple 2-party transfer using the jglobus API and I can't reproduce what you describe. Try the attached code. Does it show the same results for you? Adjust the values of the variables in TwoPartyTransferMain.java. If you don't have a MyProxy server available, replace the MyProxy code. Compile and run: source ${GLOBUS_LOCATION}/etc/globus-devel-env.sh javac TwoPartyTransferMain.java java TwoPartyTransferMain Martin Steffen Limmer wrote: Hello Martin, thanks for your answer. Does setting the type to binary (Session.TYPE_IMAGE) help? Unfortunately this doesn't help. Regards, Steffen Martin Steffen Limmer wrote: Hello, I want to transfer files with the put method of the org.globus.ftp.GridFTPClient class. Some of the files are self-extracting tar archives that contain newline characters in the form of ^M. When I transfer such a file, for some reason the ^M will be erased or replaced by \n, and so the archive becomes useless. I tried to copy the files locally with Java and everything works fine, so it should not be a problem with the Java file I/O. Also with globus-url-copy everything works as expected. The problem appears only with the GridFTPClient. Does anybody have an idea what I can do to fix this? Thanks in advance and regards, Steffen import java.io.File; import java.util.List; import java.util.LinkedList; import java.util.Map; import java.util.HashMap; import java.util.Vector; import org.ietf.jgss.GSSCredential; import org.apache.commons.logging.Log; import org.apache.commons.logging.LogFactory; import org.globus.ftp.FileInfo; import org.globus.ftp.GridFTPClient; import org.globus.ftp.Session; public class TwoPartyTransfer { private Log log = LogFactory.getLog(TwoPartyTransfer.class); /** * Download a file from a GridFTP server.
*/ public void downloadFile( String host, int port, GSSCredential credential, String serverFile, String localFile) throws Exception { GridFTPClient client = null; try { log.debug("Creating GridFTP client"); client = this.createClient(host, port, credential); log.debug("Downloading file " + serverFile + " from " + host); this.downloadFile(client, serverFile, localFile); log.debug("Downloaded file " + localFile); } finally { this.closeClient(client); } } /** * Upload a file to a GridFTP server. */ public void uploadFile( String host, int port, GSSCredential credential, String serverFile, String localFile) throws Exception { GridFTPClient client = null; try { log.debug("Creating GridFTP client"); client = this.createClient(host, port, credential); log.debug("Uploading file " + localFile); this.uploadFile(client, localFile, serverFile); log.debug("Uploaded file to " + serverFile + " on " + host); } catch (Exception e) { log.error("Error uploading file.", e); throw e; } finally { this.closeClient(client); } } /** * Recursively download the content of a remote directory into a local directory. * The local directory will be created if it does not exist.
*/ public void downloadDir( String host, int port, GSSCredential credential, String serverDir, String localDir) throws Exception { GridFTPClient listClient = null; GridFTPClient transferClient = null; LinkedList<String> dirs = new LinkedList<String>(); HashMap<String,String> files = new HashMap<String,String>(); try { log.debug("Create GridFTP client for listing"); listClient = this.createClient(host, port, credential); log.debug("Get information of dirs and files from server"); this.createInfoFromRemote(listClient, serverDir, localDir, dirs, files); log.debug("Create local directories"); this.createLocalDirs(dirs); } finally { this.closeClient(listClient); } try { log.debug("Create GridFTP client for file transfers"); transferClient = this.createClient(host, port, credential); log.debug("Download files from server"); this.downloadFiles(transferClient, files); } finally { this.closeClient(transferClient); } } /** * Recursively upload the content of a local directory into a remote directory. * The remote directory will be created if it does not exist. */ public void uploadDir( String host, int port, GSSCredential credential, String serverDir, String localDir) throws Exception {
Re: [gt-user] JGlobus GridFTP Problems
Does setting the type to binary (Session.TYPE_IMAGE) help? Like: ... import org.ietf.jgss.GSSCredential; import org.globus.ftp.GridFTPClient; import org.globus.ftp.Session; ... GridFTPClient client = ... client.authenticate(credential); client.setType(Session.TYPE_IMAGE); // do something with the client Martin Steffen Limmer wrote: Hello, I want to transfer files with the put method of the org.globus.ftp.GridFTPClient class. Some of the files are self-extracting tar archives that contain newline characters in the form of ^M. When I transfer such a file, for some reason the ^M will be erased or replaced by \n, and so the archive becomes useless. I tried to copy the files locally with Java and everything works fine, so it should not be a problem with the Java file I/O. Also with globus-url-copy everything works as expected. The problem appears only with the GridFTPClient. Does anybody have an idea what I can do to fix this? Thanks in advance and regards, Steffen
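Whatever the transfer path, a quick way to tell whether the ^M (\r) bytes survived is to compare checksums on both ends. A local sketch, with cp standing in for the GridFTP put/get:

```shell
# Integrity check for line-ending corruption: an ASCII-mode transfer that
# rewrites \r\n would change the checksum. cp stands in for the transfer.
set -e
src=$(mktemp); dst=$(mktemp)
printf 'line1\r\nline2\r\n' > "$src"
cp "$src" "$dst"
[ "$(cksum < "$src")" = "$(cksum < "$dst")" ] && echo "bytes intact"
```

Running the same comparison against a file fetched through GridFTPClient would show immediately whether binary (image) mode is in effect.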
Re: [gt-user] problem in the fileStageIn
Please send your job description, or at least the fileStageIn element of your job description. -Martin globus world wrote: I tried telnet 192.168.12.1 57468 The following error came: Trying 192.168.12.1... telnet: connect to address 192.168.12.1: Connection refused telnet: Unable to connect to remote host: Connection refused When I tried telnet hostname 8443 it connected. On Wed, Oct 14, 2009 at 7:51 PM, Martin Feller fel...@mcs.anl.gov wrote: Hi, There could be several reasons for the connection problem, I think: Is there a gridftp server up and running at 192.168.12.1:57468? You can test that e.g. by telnet 192.168.12.1 57468 192.168 is a private network. Is the GT server you submit the job to located in the same private network, and can it access the gridftp server on 192.168.12.1:57468? Martin globus world wrote: Hi all, I am submitting a job to cluster txc.edu through the command globusrun-ws -s -S -submit -f RSLsubmit.xml It's giving the following error: Delegating user credentials...Done. Submitting job...Done. Job ID: uuid:f30c0a14-b8a8-11de-a6dc-001109bc47f2 Termination time: 10/15/2009 10:04 GMT Current job state: StageIn Current job state: Failed Destroying job...Done. Cleaning up any delegated credentials...Done. globusrun-ws: Job failed: Staging error for RSL element fileStageIn. Non-Extended 3rd-party transfer .txc.org.in:2811/home/sagar/CG_new_lin-- txc.edu:2811/home/griduser/CG_new_lin failed [Caused by: Server refused performing the request. Custom message: (error code 1) [Nested exception message: Custom message: Unexpected reply: 500-Command failed. : callback failed. 500-globus_xio: Unable to connect to 192.168.12.1:57468 500-globus_xio: System error in connect: Connection refused 500-globus_xio: A system call failed: Connection refused 500 End.]] Non-Extended 3rd-party transfer txc.org.in:2811/home/sagar/CG_new_lin-- txc.edu:2811/home/griduser/CG_new_lin failed [Caused by: Server refused performing the request.
Custom message: (error code 1) [Nested exception message: Custom message: Unexpected reply: 500-Command failed. : callback failed. 500-globus_xio: Unable to connect to 192.168.12.1:57468 500-globus_xio: System error in connect: Connection refused 500-globus_xio: A system call failed: Connection refused 500 End.]] please help me Thanks and Regards sagar
Re: [gt-user] problem in the fileStageIn
Hi, There could be several reasons for the connection problem, I think: Is there a gridftp server up and running at 192.168.12.1:57468? You can test that e.g. by telnet 192.168.12.1 57468 192.168 is a private network. Is the GT server you submit the job to located in the same private network, and can it access the gridftp server on 192.168.12.1:57468? Martin globus world wrote: Hi all, I am submitting a job to cluster txc.edu through the command globusrun-ws -s -S -submit -f RSLsubmit.xml It's giving the following error: Delegating user credentials...Done. Submitting job...Done. Job ID: uuid:f30c0a14-b8a8-11de-a6dc-001109bc47f2 Termination time: 10/15/2009 10:04 GMT Current job state: StageIn Current job state: Failed Destroying job...Done. Cleaning up any delegated credentials...Done. globusrun-ws: Job failed: Staging error for RSL element fileStageIn. Non-Extended 3rd-party transfer .txc.org.in:2811/home/sagar/CG_new_lin -- txc.edu:2811/home/griduser/CG_new_lin failed [Caused by: Server refused performing the request. Custom message: (error code 1) [Nested exception message: Custom message: Unexpected reply: 500-Command failed. : callback failed. 500-globus_xio: Unable to connect to 192.168.12.1:57468 500-globus_xio: System error in connect: Connection refused 500-globus_xio: A system call failed: Connection refused 500 End.]] Non-Extended 3rd-party transfer txc.org.in:2811/home/sagar/CG_new_lin -- txc.edu:2811/home/griduser/CG_new_lin failed [Caused by: Server refused performing the request. Custom message: (error code 1) [Nested exception message: Custom message: Unexpected reply: 500-Command failed. : callback failed. 500-globus_xio: Unable to connect to 192.168.12.1:57468 500-globus_xio: System error in connect: Connection refused 500-globus_xio: A system call failed: Connection refused 500 End.]] please help me. Thanks and Regards, sagar
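Where telnet is not installed, the same reachability probe can be done from bash alone (the host and port below are illustrative; a refused connection simply makes the open fail):

```shell
# A telnet-free port probe using bash's /dev/tcp pseudo-device.
check_port() {
  # opening /dev/tcp/HOST/PORT fails when nothing is listening there
  (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null && echo "$1:$2 open" || echo "$1:$2 closed"
}
check_port 127.0.0.1 1   # no listener expected on port 1: reports closed
```

Run it from both the client and the GT server host: a port that is "open" from one network but "closed" from the other points at exactly the private-network problem described above.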
Re: [gt-user] Regarding security configuration in globus toolkit
Simar, Please reply to the list, and not just to me. Below you pasted this: r...@simar-laptop:~# ls ~/.globus/ simpleCA usercert.pem usercert_request.pem userkey.pem And an ls -l on the same directory gives just simpleCA? That would be quite an atypical output for ls -l. So do you actually have a signed user certificate with a corresponding private key in the .globus directory in the home of the user who ran grid-proxy-init? Martin simar gill wrote: Hi, simpleCA root home is /home/simar Thanks On Mon, Sep 14, 2009 at 3:55 AM, Martin Feller fel...@mcs.anl.gov wrote: Hi, What's the output of ls -l ~/.globus? Is the home of root /home/simar? -Martin simar gill wrote: Hi All, I am setting up security using certificates and a proxy. The following errors are shown: r...@simar-laptop:~# $GLOBUS_LOCATION/bin/grid-proxy-init -debug Error: Couldn't find valid credentials to generate a proxy. grid_proxy_init.c:549: globus_sysconfig: Error with certificate filename: The user cert could not be found in: 1) env. var.
X509_USER_CERT 2) $HOME/.globus/usercert.pem 3) $HOME/.globus/usercred.p12 r...@simar-laptop:~# ls ~/.globus/ simpleCA usercert.pem usercert_request.pem userkey.pem r...@simar-laptop:~# ls ~/.globus/usercert.pem /home/simar/.globus/usercert.pem r...@simar-laptop:/etc/grid-security# ls -l total 48 drwxr-xr-x 2 root root 4096 2009-09-13 13:34 certificates -rw-r--r-- 1 globus globus 2670 2009-09-03 15:10 containercert.pem -r-------- 1 globus globus 887 2009-09-03 15:10 containerkey.pem lrwxrwxrwx 1 root root 62 2009-09-13 21:34 globus-host-ssl.conf -> /etc/grid-security/certificates//globus-host-ssl.conf.b2bc8b3f lrwxrwxrwx 1 root root 62 2009-09-13 21:34 globus-user-ssl.conf -> /etc/grid-security/certificates//globus-user-ssl.conf.b2bc8b3f -rw-r--r-- 1 root root 70 2009-09-07 21:31 grid-mapfile -rw-r--r-- 1 root root 16 2009-09-07 21:31 grid-mapfile.old lrwxrwxrwx 1 root root 60 2009-09-13 21:34 grid-security.conf -> /etc/grid-security/certificates//grid-security.conf.b2bc8b3f -rw-r--r-- 1 root root 2670 2009-09-03 15:08 hostcert.pem -rw-r--r-- 1 root root 1363 2009-09-03 15:08 hostcert_request.pem -rw------- 1 root root 887 2009-09-03 15:08 hostkey.pem -rw-r--r-- 1 root root 2683 2009-09-13 20:05 hostsigned.pem Please tell me the reason for these. Thanks, regards, Simar Virk
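As the error text above shows, grid-proxy-init looks at the X509_USER_CERT/X509_USER_KEY environment variables first and then falls back to the default file names under $HOME/.globus. A sketch of the expected layout, built in a scratch directory for illustration (file contents omitted):

```shell
# The default credential layout grid-proxy-init falls back to, sketched in a
# scratch directory. In real use these would be the signed cert and its key.
set -e
home=$(mktemp -d)
mkdir -p "$home/.globus"
touch "$home/.globus/usercert.pem" "$home/.globus/userkey.pem"
chmod 644 "$home/.globus/usercert.pem"
chmod 600 "$home/.globus/userkey.pem"   # private key must be user-readable only
for f in usercert.pem userkey.pem; do
  [ -f "$home/.globus/$f" ] && echo "$f present"
done
```

Note that a usercert_request.pem alone is not enough: it is the unsigned request, and grid-proxy-init needs the signed usercert.pem plus the matching userkey.pem.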
Re: [gt-user] GramJob Premature End Of File
Hm, can you please try the attached (simple) client and tell me if it fails for you with the same error message, too? It works for me with GT 4.2.1. Replace HOST and PORT with appropriate values before you compile it. Build and run (bash): source $GLOBUS_LOCATION/etc/globus-devel-env.sh javac GramClient42.java grid-proxy-init java -DGLOBUS_LOCATION=$GLOBUS_LOCATION GramClient42 -Martin Mosoi Stefan wrote: Hello, I have a problem when trying to launch a GRAM job in Globus Toolkit 4.2.1 using the code: JobDescriptionType type = new JobDescriptionType(); type.setExecutable("/bin/echo"); type.setArgument(new String[]{"test"}); type.setDirectory("/tmp"); type.setStdout("/home/stefan/std.out"); type.setStderr("/home/stefan/std.err"); type.setJobType(JobTypeEnumeration.single); GramJob crtJob = new GramJob(type); this.crtJob.setCredentials(proxy); this.crtJob.addListener(this); this.crtJob.setAuthorization(NoAuthorization.getInstance()); this.crtJob.submit(factoryEPR, false, true, jobID); I get the following errors: AxisFault faultCode: {http://schemas.xmlsoap.org/soap/envelope/}Server.userException faultSubcode: faultString: java.io.IOException: java.io.IOException: java.io.IOException: Non nillable element 'consumerReference' is null. faultActor: faultNode: faultDetail: {http://xml.apache.org/axis/}stackTrace:java.io.IOException: java.io.IOException: java.io.IOException: Non nillable element 'consumerReference' is null.
at org.apache.axis.encoding.ser.BeanSerializer.serialize(BeanSerializer.java:288) at org.apache.axis.encoding.SerializationContext.serializeActual(SerializationContext.java:1518) at org.apache.axis.encoding.SerializationContext.serialize(SerializationContext.java:994) at org.apache.axis.encoding.SerializationContext.serialize(SerializationContext.java:815) at org.apache.axis.message.RPCParam.serialize(RPCParam.java:208) at org.apache.axis.message.RPCElement.outputImpl(RPCElement.java:433) at org.apache.axis.message.MessageElement.output(MessageElement.java:1208) at org.apache.axis.message.SOAPBody.outputImpl(SOAPBody.java:139) at org.apache.axis.message.SOAPEnvelope.outputImpl(SOAPEnvelope.java:478) at org.apache.axis.message.MessageElement.output(MessageElement.java:1208) at org.apache.axis.SOAPPart.writeTo(SOAPPart.java:314) at org.apache.axis.SOAPPart.writeTo(SOAPPart.java:268) at org.apache.axis.Message.writeTo(Message.java:539) at org.apache.axis.transport.http.CommonsHTTPSender$MessageRequestEntity.writeRequest(CommonsHTTPSender.java:878) at org.apache.commons.httpclient.methods.EntityEnclosingMethod.writeRequestBody(EntityEnclosingMethod.java:495) at org.apache.commons.httpclient.HttpMethodBase.writeRequest(HttpMethodBase.java:1973) at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:993) at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:397) at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:170) at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:396) at org.apache.axis.transport.http.CommonsHTTPSender.invoke(CommonsHTTPSender.java:224) at org.apache.axis.strategies.InvocationStrategy.visit(InvocationStrategy.java:32) at org.apache.axis.SimpleChain.doVisiting(SimpleChain.java:118) at org.apache.axis.SimpleChain.invoke(SimpleChain.java:83) at org.apache.axis.client.AxisClient.invokeTransport(AxisClient.java:150) at 
org.apache.axis.client.AxisClient.invoke(AxisClient.java:289) at org.apache.axis.client.Call.invokeEngine(Call.java:2838) at org.apache.axis.client.Call.invoke(Call.java:2824) at org.apache.axis.client.Call.invoke(Call.java:2501) at org.apache.axis.client.Call.invoke(Call.java:2424) at org.apache.axis.client.Call.invoke(Call.java:1835) at org.globus.exec.generated.bindings.ManagedJobFactoryPortTypeSOAPBindingStub.createManagedJob(ManagedJobFactoryPortTypeSOAPBindingStub.java:1644) at org.globus.exec.client.GramJob.createJobEndpoint(GramJob.java:1565) at org.globus.exec.client.GramJob.submit(GramJob.java:495) at jobManagement.impl.JobManager.processCrtJob(JobManager.java:161) at jobManagement.impl.JobManager.run(JobManager.java:103) AxisFault faultCode: {http://schemas.xmlsoap.org/soap/envelope/}Server.userException faultSubcode: faultString: org.xml.sax.SAXParseException: Premature end of file. faultActor: faultNode: faultDetail: {http://xml.apache.org/axis/}stackTrace:org.xml.sax.SAXParseException: Premature end of file. at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source) at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown
Re: [gt-user] gridftp issues (connection refused on control channel)
I don't know what it might be, but I remember that I had it too in large-scale ws-gram tests. Having retries is in general a good idea. If you want to get down to the root of the problem though, I'd recommend sending gridftp server logs in debug mode and a detailed description to gridftp-u...@globus.org (https://lists.globus.org/mailman/listinfo/gridftp-user) Maybe worth testing: does the same happen if you push the GridFTP servers using globus-url-copy commands with a comparable level of concurrency? Martin Andre Charbonneau wrote: Hi, I was thinking more about this and I was wondering what could be the cause of the failed control channel connections we are seeing when there are 10 concurrent jobs? Maybe if I can track down the source of the connection failures and fix this, then my job throughput will be better, since the file transfers would not need to be retried. Any thoughts about this? Thanks, Andre Martin Feller wrote: Hi, RFT has a retry mechanism for failing transfers. If you didn't specify a maxAttempts element in the staging elements of your job description, you can try to add it and see if it helps. maxAttempts specifies how often RFT will retry a transfer in case of (transient) transfer errors. It defaults to no retries. You can add this element to fileStageIn, fileStageOut and fileCleanUp:
<fileStageIn>
  <maxAttempts>10</maxAttempts>
  <transfer>
    <sourceUrl>gsiftp://.../</sourceUrl>
    <destinationUrl>gsiftp://.../</destinationUrl>
  </transfer>
</fileStageIn>
-Martin Andre Charbonneau wrote: Hello, Lately I've been running some benchmarks against a globus resource (gt 4.0.8) here and we are noticing some rft issues when multiple jobs are submitted concurrently. The jobs are simple /bin/hostname jobs, with a small stagein and stageout file in order to involve rft. The jobs are submitted concurrently (to the Fork factory) by a small python script that forks a thread per globusrun-ws command and then waits for all the threads to return.
Everything looks ok when I submit the jobs one after the other, but when I submit a number of jobs concurrently (10), then I start seeing some of the globusrun-ws commands return with an exit code of 255 and the following error message at the client side: globusrun-ws: Job failed: Staging error for RSL element fileStageOut. Connection creation error [Caused by: java.io.EOFException] Connection creation error [Caused by: java.io.EOFException] I could not find anything in the server-side container.log. So I enabled debugging at the gridftp level on the server side and I found the following: 2009-08-06 15:08:01,118 DEBUG vanilla.FTPControlChannel [Thread-47,createSocketDNSRR:153] opening control channel to /xxx : 2811 (...) 2009-08-06 15:08:01,180 DEBUG vanilla.Reply [Thread-47,init:65] read 1st line 2009-08-06 15:08:01,807 DEBUG vanilla.Reply [Thread-47,init:68] 1st line: null 2009-08-06 15:08:01,809 DEBUG vanilla.FTPControlChannel [Thread-47,write:363] Control channel sending: QUIT 2009-08-06 15:08:01,810 DEBUG vanilla.FTPControlChannel [Thread-47,close:260] ftp socket closed 2009-08-06 15:08:01,812 DEBUG vanilla.FTPServerFacade [Thread-47,close:340] close data channels 2009-08-06 15:08:01,813 DEBUG vanilla.FTPServerFacade [Thread-47,close:343] close server socket 2009-08-06 15:08:01,813 DEBUG vanilla.FTPServerFacade [Thread-47,stopTaskThread:369] stop master thread 2009-08-06 15:08:01,814 ERROR cache.ConnectionManager [Thread-47,createNewConnection:345] Can't create connection: java.io.EOFException 2009-08-06 15:08:01,820 ERROR service.TransferWork [Thread-47,run:408] Transient transfer error Connection creation error [Caused by: java.io.EOFException] Connection creation error. Caused by java.io.EOFException I'm not 100% sure that these errors are related, but the "Connection creation error. Caused by java.io.EOFException" error string makes me think they are.
From the gridftp log above, it looks like the control channel connection (port 2811) back to the submit machine (probably for the stageout step) fails. In order to debug this, we have tried making the gridftp connection limit much higher in the /etc/inetd.d/gridftp script, but that didn't seem to help. We have a port range of 200, which I think should be enough to handle 10 or so concurrent jobs with one stagein and 2 stageout elements per job. We also experimented with that port range, but with no success. Is this something that anyone has experienced before? Maybe there's some other configuration that I can change that might fix this issue? Any help or feedback about this is much appreciated. Best regards, Andre
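RFT's maxAttempts retries happen server-side, but the same idea can be wrapped around any client-side transfer command. A self-contained sketch in which flaky() simulates a transfer that fails twice before succeeding (a real globus-url-copy invocation would take its place):

```shell
# A client-side retry loop mirroring RFT's maxAttempts behavior.
state=$(mktemp)
flaky() {
  # simulated transfer: fails on the first two calls, succeeds on the third
  n=$(cat "$state"); n=$((n+1)); echo "$n" > "$state"
  [ "$n" -ge 3 ]
}
retry() {  # $1 = maximum number of attempts, like RFT's maxAttempts
  i=0
  while [ "$i" -lt "$1" ]; do
    if flaky; then echo "succeeded on attempt $((i+1))"; return 0; fi
    i=$((i+1))   # a real loop would pause briefly here before retrying
  done
  echo "gave up after $1 attempts"; return 1
}
retry 5
```

This masks transient EOFException-style failures the same way maxAttempts does, at the cost of hiding their root cause, which is why Martin also suggests debug-level server logs.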
Re: [gt-user] gridway error(donot want gridway with toolkit)
Did you run make install? But when I started setting security certificates after export command I run command to configure ca You need to describe more precisely what you did. From the above one can unfortunately only guess what you did. Martin simar gill wrote: Hi, I have installed Globus Toolkit 4.2.1 without GridWay. But when I started setting up security certificates, after the export command I ran the command to configure the CA. Then the following error was shown: glo...@simar-laptop:~/gt4.2.1-all-source-installer$ cat gt-server-ca.log ERROR: Your globus install has not been setup correctly /home/globus/libexec/globus-script-initializer not found You most likely need to run gpt-postinstall for this globus install Due to this I can't work further. Please help. Regards Simar Virk M.Tech final yr. (doing thesis work) Computer Sci Tech. GNE, Ludhiana
Re: [gt-user] gridway error and javac is not in JAVA_HOME
Try setting the environment variable JAVA_HOME to /usr and not to /path/to/java. Then retry the build. If this does not help: do you need GridWay? Martin simar gill wrote: Hi, I'm sending the javac details:
which java
/usr/bin/java
which javac
/usr/bin/javac
echo $JAVA_HOME
/path/to/java
whereis java
java: /usr/bin/java /etc/java /usr/lib/java /usr/share/java /usr/share/man/manl/java.l.gz
whereis javac
javac: /usr/bin/java /usr/share/man/manl/javac.l.gz
Regards Simar Virk
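The rule of thumb behind Martin's suggestion: JAVA_HOME must be the install prefix whose bin/ directory contains javac (so /usr when javac is /usr/bin/javac), not a literal placeholder like /path/to/java. Demonstrated against a scratch layout standing in for a real JDK install:

```shell
# Validate the JAVA_HOME convention: $JAVA_HOME/bin/javac must exist and be
# executable. A scratch directory stands in for an actual JDK prefix here.
set -e
prefix=$(mktemp -d)
mkdir -p "$prefix/bin"
touch "$prefix/bin/javac" && chmod +x "$prefix/bin/javac"
JAVA_HOME="$prefix"
[ -x "$JAVA_HOME/bin/javac" ] && echo "JAVA_HOME looks sane"
```

Running the same one-line check against a real machine's JAVA_HOME quickly reveals a placeholder or wrong prefix before a long build fails on it.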
Re: [gt-user] FailureFileCleanUp problem
Hi, The problem you describe, which is summarized in the bug you mention, is an architectural problem in WS-GRAM in 4.0. We fixed it in the 4.2 branch. We had to change the interface for this fix, which is why we can't port it back to the 4.0 branch. If you can upgrade to the 4.2 series, I'd recommend that. With 4.0.x there is currently no other way than: 1. Stop the container 2. Delete the problematic job from the persistence directory (by default ~/.globus of the user who runs the container). In your case: remove the file ~containeruser/.globus/hostname-port/ManagedExecutableJobResourceStateType/1748b3d0-8c4b-11de-8543-b8f655c16264.xml 3. Restart the container. -Martin Hazlewood, Victor Gene wrote: Hey GTers, Running WSRF v4.0.8-r2 on a Cray XT5. We have a user job that looks like it has gone into an unresolvable state, and the log file is filling up with messages about not being able to resolve the FailureFileCleanUp state. Anyone have any suggestions on how to get rid of this? I have looked at the documentation (nothing I found covers this) and at bugzilla (http://bugzilla.mcs.anl.gov/globus/show_bug.cgi?id=5247 is close but says it will be fixed in a future release, and gives no instructions on how to resolve it currently). I'm running out of ideas. The recurring messages are 2009-08-29 12:40:02,267 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,getInternalState:1666] getting resource datum internalState 2009-08-29 12:40:02,267 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,remove:285] Waiting to be Done or Failed. Current state: FailureFileCleanUp Any help on how to resolve this would be appreciated (besides the "it is fixed in the next release" type of resolution). Below are the complete job entries for the job.
-Victor Victor Hazlewood, CISSP Senior HPC Systems Analyst National Institute for Computational Science University of Tennessee http://www.nics.tennessee.edu/ http://www.nics.utk.edu/ Complete log file entry: 2009-08-28 20:13:32,174 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,initialize:142] Entering initialize() 2009-08-28 20:13:32,175 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,initialize:147] at super.initialize() 2009-08-28 20:13:32,180 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,initialize:153] at initSecurity() 2009-08-28 20:13:32,180 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,initSecurity:316] Entering initSecurity() 2009-08-28 20:13:32,182 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,initSecurity:338] resource credential subject: 2009-08-28 20:13:32,183 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,initSecurity:346] setting resource securty grid map... 
2009-08-28 20:13:32,183 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,initSecurity:356] Leaving initSecurity() 2009-08-28 20:13:32,186 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,initVariableMap:704] GLOBUS_SCRATCH_DIR:${GLOBUS_USER_HOME}/.globus/scratch 2009-08-28 20:13:32,370 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1290] resolving variables in attribute environment 2009-08-28 20:13:32,370 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1295] looking at string ${GLOBUS_USER_HOME} 2009-08-28 20:13:32,370 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1296] found $ at index 0 2009-08-28 20:13:32,371 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1302] found '{'---looks like a reference 2009-08-28 20:13:32,371 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1348] looking up GLOBUS_USER_HOME in {GLOBUS_SCRATCH_DIR=${GLOBUS_USER_HOME}/.globus/scratch, GLOBUS_LOCATION=/usr/local/globus-wsrf-4.0.8-r2, GLOBUS_JOB_ID=1748b3d0-8c4b-11de-8543-b8f655c16264, GLOBUS_USER_HOME=/nics/c/home/turuncu, GLOBUS_USER_NAME=turuncu} 2009-08-28 20:13:32,371 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1353] mapped GLOBUS_USER_HOME to value /nics/c/home/turuncu 2009-08-28 20:13:32,371 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1392] Final string is /nics/c/home/turuncu 2009-08-28 20:13:32,372 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
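The three-step 4.0.x workaround above can be sketched as a script. This is a dry-run against a scratch directory; on a real system the persistence directory lives under ~containeruser/.globus/<hostname>-<port>/ManagedExecutableJobResourceStateType, and the job id below is the one from this thread:

```shell
# Dry-run of the 4.0.x workaround described above, using a scratch
# directory in place of the container user's real ~/.globus tree.
JOB_ID="1748b3d0-8c4b-11de-8543-b8f655c16264"
PERSIST_DIR="$(mktemp -d)/ManagedExecutableJobResourceStateType"
mkdir -p "$PERSIST_DIR"
touch "$PERSIST_DIR/$JOB_ID.xml"   # stand-in for the stuck job's state file

# 1. stop the container (e.g. globus-stop-container) -- not run here
# 2. delete the problematic job's persisted state
rm -f "$PERSIST_DIR/$JOB_ID.xml"
# 3. restart the container -- not run here

remaining=$(ls -1 "$PERSIST_DIR" | wc -l)
echo "state files remaining: $remaining"
```

Deleting the state file while the container is still running risks the job manager re-writing it, hence step 1 before step 2.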
Re: [gt-user] gt-user Digest, Vol 11, Issue 26
: expected declaration specifiers or '...' before 'size_t' /usr/include/stdlib.h:762: error: expected declaration specifiers or '...' before 'size_t' /usr/include/stdlib.h:766: error: expected declaration specifiers or '...' before 'size_t' /usr/include/stdlib.h:770: error: expected declaration specifiers or '...' before 'size_t' /usr/include/stdlib.h:779: error: expected declaration specifiers or '...' before 'size_t' /usr/include/stdlib.h:782: error: expected ')' before '*' token /usr/include/stdlib.h:786: error: expected declaration specifiers or '...' before 'wchar_t' /usr/include/stdlib.h:790: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'mbstowcs' /usr/include/stdlib.h:793: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'wcstombs' ./common/gw_file_parser.c: In function 'gw_parse_line': ./common/gw_file_parser.c:32: warning: implicit declaration of function 'strdup' ./common/gw_file_parser.c:32: warning: incompatible implicit declaration of built-in function 'strdup' ./common/gw_file_parser.c:34: warning: implicit declaration of function 'strtok_r' ./common/gw_file_parser.c:34: warning: assignment makes pointer from integer without a cast ./common/gw_file_parser.c:38: warning: implicit declaration of function 'strcasecmp' ./common/gw_file_parser.c:40: warning: implicit declaration of function 'strchr' ./common/gw_file_parser.c:40: warning: incompatible implicit declaration of built-in function 'strchr' ./common/gw_file_parser.c: In function 'gw_parse_file': ./common/gw_file_parser.c:74: warning: incompatible implicit declaration of built-in function 'strchr' make[2]: *** [common/__srcdir__drmaa_libdrmaa___GLOBUS_FLAVOR_NAME__la-gw_file_parser.lo] Error 1 make[2]: Leaving directory `/home/nimbus/work/nimbus/gt4.2.1-all-source-installer/source-trees/gridway/src' make[1]: *** [all-recursive] Error 1 make[1]: Leaving directory `/home/nimbus/work/nimbus/gt4.2.1-all-source-installer/source-trees/gridway' ERROR: Build has failed 
make: *** [gridway] Error 2 Regards Simar Virk -- Martin Feller The Globus Alliance Computation Institute at University of Chicago Mathematics Computer Science Division at Argonne National Laboratory Phone: 630 252-4826
Re: [gt-user] problem in the filestageOut
Hi, What version of the GT do you use? If it's 4.0.7: get http://www.mcs.anl.gov/~feller/heller/globus_wsrf_rft.jar, back up $GLOBUS_LOCATION/lib/globus_wsrf_rft.jar, drop the downloaded file into $GLOBUS_LOCATION/lib, restart the GT server and retry your job. Does that fix it? Martin

globus world wrote: Hi, I am submitting a job to host fgwu.xtc.in and have a problem in the fileStageOut step. In my job description I mention my stdout and stderr files, but in the fileStageOut step my stderr file is staged out while my stdout file is not. What may be the reason? My job description file is as follows:

<job>
  <executable>Myexe</executable>
  <stdout>${GLOBUS_USER_HOME}/stdout</stdout>
  <stderr>${GLOBUS_USER_HOME}/stderr</stderr>
  <fileStageIn>
    <transfer>
      <sourceUrl>gsiftp://mypc.org.in:2811/home/sagar/Myexe</sourceUrl>
      <destinationUrl>file:///${GLOBUS_USER_HOME}/Myexe</destinationUrl>
    </transfer>
  </fileStageIn>
  <fileStageOut>
    <!-- Here in fileStageOut only stderr is staged out; stdout is not -->
    <transfer>
      <sourceUrl>gsiftp://fgwu.xtc.in:2811/${GLOBUS_USER_HOME}/stdout</sourceUrl>
      <destinationUrl>gsiftp://mypc.org.in:2811/home/sagar/stdout</destinationUrl>
    </transfer>
    <transfer>
      <sourceUrl>gsiftp://fgwu.xtc.in:2811/${GLOBUS_USER_HOME}/stderr</sourceUrl>
      <destinationUrl>gsiftp://mypc.org.in:2811/home/sagar/stderr</destinationUrl>
    </transfer>
  </fileStageOut>
  <fileCleanUp>
    <deletion>
      <file>file:///${GLOBUS_USER_HOME}/stdout</file>
    </deletion>
    <deletion>
      <file>file:///${GLOBUS_USER_HOME}/stderr</file>
    </deletion>
    <deletion>
      <file>file:///${GLOBUS_USER_HOME}/Myexe</file>
    </deletion>
  </fileCleanUp>
</job>

The error is:

Current job state: StageOut
Current job state: Failed
Destroying job...Done.
Cleaning up any delegated credentials...Done.
globusrun-ws: Job failed: Staging error for RSL element fileStageOut. Can't do MLST on non-existing file/dir /home/sagar/stdout on server mypc.org.in [Caused by: Server refused performing the request. Custom message: Server refused MLST command (error code 1) [Nested exception message: Custom message: Unexpected reply: 500-Command failed : System error in stat: No such file or directory 500-A system call failed: No such file or directory 500 End.]] Can't do MLST on non-existing file/dir /home/sagar/stdout on server mypc.org.in [Caused by: Server refused performing the request. Custom message: Server refused MLST command (error code 1) [Nested exception message: Custom message: Unexpected reply: 500-Command failed : System error in stat: No such file or directory 500-A system call failed: No such file or directory 500 End.]]

Thanks and Regards, sagar
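Martin's jar-swap fix can be scripted roughly like this. It is a dry-run against a scratch directory standing in for the real $GLOBUS_LOCATION; the download URL is the one from this thread, and the wget step is left commented out:

```shell
# Dry-run of the RFT jar swap suggested above, using a scratch directory
# in place of the real $GLOBUS_LOCATION.
GLOBUS_LOCATION="$(mktemp -d)"
mkdir -p "$GLOBUS_LOCATION/lib"
touch "$GLOBUS_LOCATION/lib/globus_wsrf_rft.jar"   # stand-in for the installed jar

# Back up the original before replacing it.
cp "$GLOBUS_LOCATION/lib/globus_wsrf_rft.jar" \
   "$GLOBUS_LOCATION/lib/globus_wsrf_rft.jar.bak"
# wget http://www.mcs.anl.gov/~feller/heller/globus_wsrf_rft.jar \
#   -O "$GLOBUS_LOCATION/lib/globus_wsrf_rft.jar"   # real download, not run here
# ...then restart the GT container and retry the job.

ls -1 "$GLOBUS_LOCATION/lib"
```

Keeping the .bak copy makes it trivial to roll back if the patched jar misbehaves.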
Re: [gt-user] gt-user Digest, Vol 11, Issue 20
Hi Simar,

simar gill wrote: Hi All, I have Ubuntu 32-bit installed in dual boot with Windows Vista. I have installed all the prerequisite software. Now I want to set the Globus location environment variable, but it does not work. If I type the following in a terminal:

$ export GLOBUS_LOCATION=path/to/install
$ cd $GLOBUS_LOCATION
Error: directory does not exist.

In Vol 7, Issue 13 it says: If the directory does not exist, create it.

$ ./configure --prefix=/home/globus/globus-4.2.1.1
bash: command not found

If I run it as the globus user it gives "Error: permission denied".

And further down the page: Make sure the directory pointed to by the environment variable $GLOBUS_LOCATION belongs to user globus and has the right permissions, e.g. drwxr-xr-x. Then try configure again as user globus. Does it work better then? If not: more detailed descriptions of the directories and error messages would be helpful. -Martin
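Martin's advice can be sketched as a short setup script. A temp directory stands in here for the real prefix /home/globus/globus-4.2.1.1 from the thread, and the chown is shown commented out because it needs root:

```shell
# Sketch of preparing $GLOBUS_LOCATION with the ownership and permissions
# Martin describes, before running configure as user globus.
GLOBUS_LOCATION="$(mktemp -d)/globus-4.2.1.1"
export GLOBUS_LOCATION
mkdir -p "$GLOBUS_LOCATION"
# chown globus:globus "$GLOBUS_LOCATION"   # as root, on a real system
chmod 755 "$GLOBUS_LOCATION"               # i.e. drwxr-xr-x

cd "$GLOBUS_LOCATION" && echo "GLOBUS_LOCATION is usable"
# ...then, as user globus, from the unpacked installer directory:
# ./configure --prefix="$GLOBUS_LOCATION"
```

Note the "bash: command not found" in the thread usually means ./configure was run outside the unpacked installer directory, which is a separate issue from permissions.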
Re: [gt-user] Globus 4.0.7 with Loadleveler Integration
We don't officially support LL, but a group on TeraGrid used LoadLeveler too. I assume it's a problem in $GLOBUS_LOCATION/etc/gram-service-Loadleveler/jndi-config.xml. Please send the content of that file. -Martin

Asish M Madhu wrote: Dear All, I was trying to integrate Globus 4.0.7 with LoadLeveler (version 3) on an AIX 5.2 64-bit machine. I installed Globus 4.0.7 successfully using the 64-bit flavor. For integrating with LoadLeveler I used the llgrid.tar file from the already installed LoadLeveler, untarred it and ran the deploy.sh executable, which automatically integrates Globus with the existing LoadLeveler. The installation of the integration package didn't throw any error. But after the integration package is installed I can't start the Globus container. I get the below error in the container.log file:

vi $GLOBUS_LOCATION/var/container.log
/usr/local/GARUDA/GLOBUS-4.0.7/var/container.log: 2 lines, 264 characters
Failed to start container: Failed to initialize 'ManagedJobFactoryService' service [Caused by: ; nested exception is: javax.naming.NamingException: Bean initialization failed [Root exception is java.lang.RuntimeException: java.lang.NumberFormatException: null]]

What could be the problem? The integration package creates the below files in the Globus path:

$GLOBUS_LOCATION/lib/perl/Globus/GRAM/JobManager/loadleveler.pm
$GLOBUS_LOCATION/etc/gram-service/globus_gram_fs_map_config.xml
$GLOBUS_LOCATION/etc/gram-service-Loadleveler/jndi-config.xml
$GLOBUS_LOCATION/etc/grid-services/jobmanager-loadleveler
$GLOBUS_LOCATION/etc/globus-loadleveler.conf
$GLOBUS_LOCATION/libexec/globus-scheduler-provider-loadleveler

But when I remove these files the container starts without any problem. How can I integrate Globus with LoadLeveler? Kindly help me. Thanks in advance. Regards, Asish M Madhu asis...@gmail.com
Re: [gt-user] Globus 4.0.7 with Loadleveler Integration
Hi, Replace it with this, but substitute ${GLOBUS_LOCATION} with the value of the environment variable ${GLOBUS_LOCATION}:

<?xml version="1.0" encoding="UTF-8"?>
<jndiConfig xmlns="http://wsrf.globus.org/jndi/config">
  <!-- Configuration delta (addition) for a Local Resource Manager -->
  <!-- Configuration for Managed Job *Factory* Service -->
  <service name="ManagedJobFactoryService">
    <!-- LRM configuration: Fork -->
    <resource name="ForkResourceConfiguration"
              type="org.globus.exec.service.factory.FactoryResourceConfiguration">
      <resourceParams>
        <parameter>
          <name>factory</name>
          <value>org.globus.wsrf.jndi.BeanFactory</value>
        </parameter>
        <parameter>
          <name>localResourceManagerName</name>
          <value>Loadleveler</value>
        </parameter>
        <!-- Site-specific scratchDir. Default: ${GLOBUS_USER_HOME}/.globus/scratch
        <parameter>
          <name>scratchDirectory</name>
          <value>${GLOBUS_USER_HOME}/.globus/scratch</value>
        </parameter>
        -->
        <parameter>
          <name>substitutionDefinitionsFile</name>
          <value>${GLOBUS_LOCATION}/etc/gram-service-Fork/substitution-definition.properties</value>
        </parameter>
        <parameter>
          <name>substitutionDefinitionsRefreshPeriod</name>
          <!-- MINUTES -->
          <value>480</value>
        </parameter>
        <parameter>
          <name>enableDefaultSoftwareEnvironment</name>
          <value>false</value>
        </parameter>
      </resourceParams>
    </resource>
  </service>
</jndiConfig>

-Martin

Asish M Madhu wrote: Hello Martin, Please find the content of jndi-config.xml ($GLOBUS_LOCATION/etc/gram-service-Loadleveler/jndi-config.xml):

<?xml version="1.0" encoding="UTF-8"?>
<jndiConfig xmlns="http://wsrf.globus.org/jndi/config">
  <!-- Configuration delta (addition) for a Local Resource Manager -->
  <!-- Configuration for Managed Job *Factory* Service -->
  <service name="ManagedJobFactoryService">
    <!-- LRM configuration: Loadleveler -->
    <resource name="LoadlevelerResourceConfiguration"
              type="org.globus.exec.service.factory.FactoryResourceConfiguration">
      <resourceParams>
        <parameter>
          <name>factory</name>
          <value>org.globus.wsrf.jndi.BeanFactory</value>
        </parameter>
        <parameter>
          <name>localResourceManagerName</name>
          <value>Loadleveler</value>
        </parameter>
        <parameter>
          <name>scratchDirectory</name>
          <value>/.globus/scratch</value>
        </parameter>
      </resourceParams>
    </resource>
  </service>
</jndiConfig>

Thanking you, Regards, Asish

On Tue, Aug 25, 2009 at 5:03 PM, Martin Feller fel...@mcs.anl.gov wrote: We don't officially support LL, but a group on TeraGrid used LoadLeveler too. I assume it's a problem in $GLOBUS_LOCATION/etc/gram-service-Loadleveler/jndi-config.xml. Please send the content of that file. -Martin Asish M Madhu wrote: Dear All, I was trying to integrate Globus 4.0.7 with LoadLeveler (version 3) on an AIX 5.2 64-bit machine. I installed Globus 4.0.7 successfully using the 64-bit flavor. For integrating with LoadLeveler I used the llgrid.tar file from the already installed LoadLeveler, untarred it and ran the deploy.sh executable, which automatically integrates Globus with the existing LoadLeveler. The installation of the integration package didn't throw any error. But after the integration package is installed I can't start the Globus container. I get the below error
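Since a malformed jndi-config.xml is enough to stop the container from starting, it's worth checking the edited file is well-formed before restarting. A sketch using python3's stdlib parser (standing in for xmllint, which may not be installed); the here-document below is just a placeholder config for demonstration:

```shell
# Well-formedness check for an edited jndi-config.xml before restarting
# the container. python3's stdlib XML parser stands in for xmllint here.
CONF="$(mktemp)"
cat > "$CONF" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<jndiConfig xmlns="http://wsrf.globus.org/jndi/config">
  <service name="ManagedJobFactoryService"/>
</jndiConfig>
EOF
if python3 -c "import sys, xml.dom.minidom; xml.dom.minidom.parse(sys.argv[1])" "$CONF" 2>/dev/null; then
  verdict="well-formed"
else
  verdict="broken"
fi
echo "jndi-config: $verdict"
```

On a real install, point CONF at $GLOBUS_LOCATION/etc/gram-service-Loadleveler/jndi-config.xml instead of the here-document.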
Re: [gt-user] Globus 4.0.7 with Loadleveler Integration
you, Asish

On Tue, Aug 25, 2009 at 5:18 PM, Martin Feller fel...@mcs.anl.gov wrote: Hi, Replace it with this, but substitute ${GLOBUS_LOCATION} with the value of the environment variable ${GLOBUS_LOCATION}:

<?xml version="1.0" encoding="UTF-8"?>
<jndiConfig xmlns="http://wsrf.globus.org/jndi/config">
  <!-- Configuration delta (addition) for a Local Resource Manager -->
  <!-- Configuration for Managed Job *Factory* Service -->
  <service name="ManagedJobFactoryService">
    <!-- LRM configuration: Fork -->
    <resource name="ForkResourceConfiguration"
              type="org.globus.exec.service.factory.FactoryResourceConfiguration">
      <resourceParams>
        <parameter>
          <name>factory</name>
          <value>org.globus.wsrf.jndi.BeanFactory</value>
        </parameter>
        <parameter>
          <name>localResourceManagerName</name>
          <value>Loadleveler</value>
        </parameter>
        <!-- Site-specific scratchDir. Default: ${GLOBUS_USER_HOME}/.globus/scratch
        <parameter>
          <name>scratchDirectory</name>
          <value>${GLOBUS_USER_HOME}/.globus/scratch</value>
        </parameter>
        -->
        <parameter>
          <name>substitutionDefinitionsFile</name>
          <value>${GLOBUS_LOCATION}/etc/gram-service-Fork/substitution-definition.properties</value>
        </parameter>
        <parameter>
          <name>substitutionDefinitionsRefreshPeriod</name>
          <!-- MINUTES -->
          <value>480</value>
        </parameter>
        <parameter>
          <name>enableDefaultSoftwareEnvironment</name>
          <value>false</value>
        </parameter>
      </resourceParams>
    </resource>
  </service>
</jndiConfig>

-Martin

Asish M Madhu wrote: Hello Martin, Please find the content of jndi-config.xml ($GLOBUS_LOCATION/etc/gram-service-Loadleveler/jndi-config.xml):

<?xml version="1.0" encoding="UTF-8"?>
<jndiConfig xmlns="http://wsrf.globus.org/jndi/config">
  <!-- Configuration delta (addition) for a Local Resource Manager -->
  <!-- Configuration for Managed Job *Factory* Service -->
  <service name="ManagedJobFactoryService">
    <!-- LRM configuration: Loadleveler -->
    <resource name="LoadlevelerResourceConfiguration"
              type="org.globus.exec.service.factory.FactoryResourceConfiguration">
      <resourceParams>
        <parameter>
          <name>factory</name>
          <value>org.globus.wsrf.jndi.BeanFactory</value>
        </parameter>
        <parameter>
          <name>localResourceManagerName</name>
          <value>Loadleveler</value>
        </parameter>
        <parameter>
          <name>scratchDirectory</name>
          <value>/.globus/scratch</value>
        </parameter>
      </resourceParams>
    </resource>
  </service>
</jndiConfig>

Thanking you, Regards, Asish

On Tue, Aug 25, 2009 at 5:03 PM, Martin Feller fel...@mcs.anl.gov wrote: We don't officially support LL, but a group on TeraGrid used LoadLeveler too. I assume it's a problem in $GLOBUS_LOCATION/etc/gram-service-Loadleveler/jndi-config.xml. Please send the content of that file. -Martin Asish M Madhu wrote: Dear All, I was trying to integrate Globus 4.0.7 with LoadLeveler (version 3) on an AIX 5.2 64-bit machine. I installed Globus 4.0.7 successfully using the 64-bit flavor. For integrating with LoadLeveler I used the llgrid.tar file from the already installed LoadLeveler, untarred it and ran the deploy.sh executable, which automatically integrates Globus with the existing LoadLeveler. The installation of the integration package didn't throw any error. But after the integration package is installed I can't start
Re: [gt-user] gridftp issues (connection refused on control channel)
Hi, RFT has a retry mechanism for failing transfers. If you didn't specify a maxAttempts element in the staging elements of your job description, you can try adding it and see if it helps. maxAttempts specifies how often RFT will retry a transfer in case of (transient) transfer errors. It defaults to no retries. You can add this element to fileStageIn, fileStageOut and fileCleanUp:

...
<fileStageIn>
  <maxAttempts>10</maxAttempts>
  <transfer>
    <sourceUrl>gsiftp://...</sourceUrl>
    <destinationUrl>gsiftp://...</destinationUrl>
  </transfer>
</fileStageIn>
...

-Martin

Andre Charbonneau wrote: Hello, Lately I've been running some benchmarks against a Globus resource (GT 4.0.8) here and we are noticing some RFT issues when multiple jobs are submitted concurrently. The jobs are simple /bin/hostname jobs, with a small stage-in and stage-out file in order to involve RFT. The jobs are submitted concurrently (to the Fork factory) by a small Python script that forks a thread per globusrun-ws command and then waits for all the threads to return. Everything looks OK when I submit the jobs one after the other, but when I submit a number of jobs concurrently (10), I start seeing some of the globusrun-ws commands return with an exit code of 255 and the following error message on the client side:

globusrun-ws: Job failed: Staging error for RSL element fileStageOut. Connection creation error [Caused by: java.io.EOFException] Connection creation error [Caused by: java.io.EOFException]

I could not find anything in the server-side container.log. So I enabled debugging at the gridftp level on the server side and I found the following:

2009-08-06 15:08:01,118 DEBUG vanilla.FTPControlChannel [Thread-47,createSocketDNSRR:153] opening control channel to /xxx : 2811 (...)
2009-08-06 15:08:01,180 DEBUG vanilla.Reply [Thread-47,init:65] read 1st line
2009-08-06 15:08:01,807 DEBUG vanilla.Reply [Thread-47,init:68] 1st line: null
2009-08-06 15:08:01,809 DEBUG vanilla.FTPControlChannel [Thread-47,write:363] Control channel sending: QUIT
2009-08-06 15:08:01,810 DEBUG vanilla.FTPControlChannel [Thread-47,close:260] ftp socket closed
2009-08-06 15:08:01,812 DEBUG vanilla.FTPServerFacade [Thread-47,close:340] close data channels
2009-08-06 15:08:01,813 DEBUG vanilla.FTPServerFacade [Thread-47,close:343] close server socket
2009-08-06 15:08:01,813 DEBUG vanilla.FTPServerFacade [Thread-47,stopTaskThread:369] stop master thread
2009-08-06 15:08:01,814 ERROR cache.ConnectionManager [Thread-47,createNewConnection:345] Can't create connection: java.io.EOFException
2009-08-06 15:08:01,820 ERROR service.TransferWork [Thread-47,run:408] Transient transfer error Connection creation error [Caused by: java.io.EOFException] Connection creation error. Caused by java.io.EOFException

I'm not 100% sure that these errors are related, but the "Connection creation error. Caused by java.io.EOFException" error string makes me think they are. From the gridftp log above, it looks like the control channel connection (port 2811) back to the submit machine (probably for the stage-out step) fails. In order to debug this, we tried raising the gridftp connection limit in the /etc/inetd.d/gridftp script, but that didn't seem to help. We have a port range of 200 ports, which I think should be enough to handle 10 or so concurrent jobs with one stage-in and 2 stage-out elements per job. We also experimented with that port range, but with no success. Is this something that anyone has experienced before? Maybe there is some other configuration I can change that might fix this issue? Any help or feedback is much appreciated. Best regards, Andre
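When control-channel connections fail like this, a quick reachability probe against the GridFTP port can separate plain network/inetd refusals from RFT-level problems. A sketch using bash's built-in /dev/tcp so no Globus tools are needed; HOST and PORT are placeholders, not hosts from this thread:

```shell
# Probe a GridFTP control channel with bash's /dev/tcp.
# HOST/PORT are placeholders -- point them at the failing server.
HOST=localhost
PORT=2811
if timeout 2 bash -c "exec 3<>/dev/tcp/$HOST/$PORT" 2>/dev/null; then
  result="reachable"
else
  result="refused or timed out"
fi
echo "control channel $HOST:$PORT: $result"
```

Running this in a loop during a concurrent submission burst would show whether inetd starts refusing connections under load.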
Re: [gt-user] GRAM jobs dying after 24 hours
Yuriy wrote: Cannot reproduce it anymore... I submitted jobs with/without delegation, with/without streaming, with globus-delegate for the credential and without, and none of them were killed... In fact I cannot see any user jobs dying for about a week now. Maybe it is related to the state of the container? Is there anything in the logs that could indicate the moment that some credential was removed and the reason for it?

By default, no. You can set the log level for the delegation service to debug (log4j.category.org.globus.delegation.service=DEBUG in $GLOBUS_LOCATION/container-log4j.properties), and the log then tells you that a delegation resource is being destroyed, but unfortunately it does not tell you the id/name of the resource. As far as I know, the reasons for removal can be: an explicit call to destroy by a client, or a client/service tries to access the credential and it is expired. I think there's no general periodic "sweep and destroy if expired" for persisted delegation resources.

The persisted/../DelegationResource/ folder (this is where credentials are stored, right?)

Right.

contains 1200 files, most of the related jobs are probably dead. Is there any way to decipher those files and see what is inside?

Delegated credentials are serialized Java objects (DelegationResource objects). I attached a small program that reads all serialized delegated credentials from the persistence directory and prints information about each. Point the variable persistenceDirName to the persistence directory of the delegated credentials before you compile it.

Compile it:
- source ${GLOBUS_LOCATION}/etc/globus-devel-env.sh (assuming bash/bourne shell)
- javac CheckDelegationResources.java (assuming Java 1.4+)
Run it:
- java CheckDelegationResources

This program won't win a beauty contest; extend it as you need. Hope this helps.
-Martin Cheers, Yuriy On Mon, Aug 10, 2009 at 08:24:35AM -0500, Martin Feller wrote: What very probably happens is that a credential being delegated to the server expired. It's being removed on the server-side in that case and jobs that still refer to such a (no longer existing) credential fail with the error message you pasted. How do you delegate the credential that is being used by jobs: * Do you let globusrun-ws delegate for you? * Do you delegate a credential, e.g. using globus-credential-delegate and refer to the credential in your job description or let globusrun-ws pick up the epr of the manually delegated credential? You can debug this e.g. like this: * Submit jobs that do not require a delegated credential and see if the same problem still occurs. From your description I'd say that those jobs will not fail. * Delegate a credential that is valid for, say, 60h, using globus-credential-delegate and refer to that credential in your jobs. (globusrun-ws options: -Jf, -Sf) and check if the jobs still fail after 24h. Maybe worth noting: sometimes people delegate although they don't really need to delegate, i.e. the job does not need a job credential and no staging is performed. -Martin Yuriy wrote: Hi, Some of the jobs submitted to torque via GRAM are killed after about 24 hours in the queue, all with the similar message in globus logs: 2009-07-10 11:32:16,052 INFO exec.StateMachine [RunQueueThread_5,logJobFailed:3250] Job 74bd3c60-6c17-11de-9a06-9ba1d1ebd14a failed. Description: Couldn't obtain a delegated credential. Cause: org.globus.exec.generated.FaultType: Couldn't obtain a delegated credential. caused by [0: org.oasis.wsrf.faults.BaseFaultType: Error getting delegation resource [Caused by: org.globus.wsrf.NoSuchResourceException]] torque reports exit status = 271 (exceeds resource limit or killed by user), none of the problematic jobs seem to exceed any limits. 
Moreover we had a lot of jobs that ran for longer than 24 hours and completed successfully (sometimes users just re-submitted jobs with the same description, using exactly the same tools, and they completed without any problems). All problematic jobs were submitted with the globusrun-ws tool. Could anyone explain what is going on here? Currently we use the Globus version from VDT 1.10; we started with VDT 1.6. From looking in the logs, we have had the same problem for over a year, but not many people are affected and most just re-submit without reporting. Cheers, Yuriy

import java.io.File;
import java.io.FileInputStream;
import java.io.ObjectInputStream;
import java.util.Calendar;
import java.util.Date;

public class CheckDelegationResources {
    public static void main(String[] args) throws Exception {
        // Fill in path to persistence directory of delegated credentials
        String persistenceDirName = "";
        File persistenceDir = new File(persistenceDirName);
        if (persistenceDir.exists()) {
            String[] resources
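Before deserializing anything, a filesystem-level triage of the 1200 files mentioned above can flag resources untouched for a long time (likely dead jobs). A sketch against a throwaway directory with fake entries; on a real system, point PERSIST_DIR at the DelegationResource persistence directory:

```shell
# Rough triage of a DelegationResource persistence directory: count all
# files, and count those not modified in over 30 days. A temp dir with
# fake entries stands in for the real directory here.
PERSIST_DIR="$(mktemp -d)"          # stand-in for .../persisted/.../DelegationResource
touch "$PERSIST_DIR/res-1" "$PERSIST_DIR/res-2"
touch -d "60 days ago" "$PERSIST_DIR/res-old"   # GNU touch; simulates a stale entry

total=$(ls -1 "$PERSIST_DIR" | wc -l)
stale=$(find "$PERSIST_DIR" -type f -mtime +30 | wc -l)
echo "total=$total stale=$stale"
```

The stale list gives candidates to inspect with the attached Java program before deciding what is safe to remove.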
Re: [gt-user] Installation problem of GT 4.2.1
or '...' before 'size_t' /usr/include/stdlib.h:766: error: expected declaration specifiers or '...' before 'size_t' /usr/include/stdlib.h:770: error: expected declaration specifiers or '...' before 'size_t' /usr/include/stdlib.h:779: error: expected declaration specifiers or '...' before 'size_t' /usr/include/stdlib.h:782: error: expected ')' before '*' token /usr/include/stdlib.h:786: error: expected declaration specifiers or '...' before 'wchar_t' /usr/include/stdlib.h:790: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'mbstowcs' /usr/include/stdlib.h:793: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'wcstombs' ./common/gw_file_parser.c: In function 'gw_parse_line': ./common/gw_file_parser.c:32: warning: implicit declaration of function 'strdup' ./common/gw_file_parser.c:32: warning: incompatible implicit declaration of built-in function 'strdup' ./common/gw_file_parser.c:34: warning: implicit declaration of function 'strtok_r' ./common/gw_file_parser.c:34: warning: assignment makes pointer from integer without a cast ./common/gw_file_parser.c:38: warning: implicit declaration of function 'strcasecmp' ./common/gw_file_parser.c:40: warning: implicit declaration of function 'strchr' ./common/gw_file_parser.c:40: warning: incompatible implicit declaration of built-in function 'strchr' ./common/gw_file_parser.c: In function 'gw_parse_file': ./common/gw_file_parser.c:74: warning: incompatible implicit declaration of built-in function 'strchr' make[2]: *** [common/__srcdir__drmaa_libdrmaa___GLOBUS_FLAVOR_NAME__la-gw_file_parser.lo] Error 1 make[2]: Leaving directory `/home/nimbus/work/nimbus/gt4.2.1-all-source-installer/source-trees/gridway/ src' make[1]: *** [all-recursive] Error 1 make[1]: Leaving directory `/home/nimbus/work/nimbus/gt4.2.1-all-source-installer/source-trees/gridway' ERROR: Build has failed make: *** [gridway] Error 2 nim...@ubuntu:~/work/nimbus/gt4.2.1-all-source-installer$ kindest regards Wei -- Martin Feller The 
Globus Alliance Computation Institute at University of Chicago Mathematics Computer Science Division at Argonne National Laboratory Phone: 630 252-4826
Re: [gt-user] GRAM jobs dying after 24 hours
What very probably happens is that a credential being delegated to the server expired. It's being removed on the server-side in that case and jobs that still refer to such a (no longer existing) credential fail with the error message you pasted. How do you delegate the credential that is being used by jobs: * Do you let globusrun-ws delegate for you? * Do you delegate a credential, e.g. using globus-credential-delegate and refer to the credential in your job description or let globusrun-ws pick up the epr of the manually delegated credential? You can debug this e.g. like this: * Submit jobs that do not require a delegated credential and see if the same problem still occurs. From your description I'd say that those jobs will not fail. * Delegate a credential that is valid for, say, 60h, using globus-credential-delegate and refer to that credential in your jobs. (globusrun-ws options: -Jf, -Sf) and check if the jobs still fail after 24h. Maybe worth noting: sometimes people delegate although they don't really need to delegate, i.e. the job does not need a job credential and no staging is performed. -Martin Yuriy wrote: Hi, Some of the jobs submitted to torque via GRAM are killed after about 24 hours in the queue, all with the similar message in globus logs: 2009-07-10 11:32:16,052 INFO exec.StateMachine [RunQueueThread_5,logJobFailed:3250] Job 74bd3c60-6c17-11de-9a06-9ba1d1ebd14a failed. Description: Couldn't obtain a delegated credential. Cause: org.globus.exec.generated.FaultType: Couldn't obtain a delegated credential. caused by [0: org.oasis.wsrf.faults.BaseFaultType: Error getting delegation resource [Caused by: org.globus.wsrf.NoSuchResourceException]] torque reports exit status = 271 (exceeds resource limit or killed by user), none of the problematic jobs seem to exceed any limits. 
Moreover we had a lot of jobs that ran for longer than 24 hours and completed successfully (sometimes users just re-submitted jobs with the same description, using exactly the same tools, and they completed without any problems). All problematic jobs were submitted with the globusrun-ws tool. Could anyone explain what is going on here? Currently we use the Globus version from VDT 1.10; we started with VDT 1.6. From looking in the logs, we have had the same problem for over a year, but not many people are affected and most just re-submit without reporting. Cheers, Yuriy
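The failure mode Martin describes is simple arithmetic: if queue wait plus run time exceeds the delegated credential's lifetime, the job dies with the NoSuchResourceException above. A sketch of that check — the hour values are made-up examples, not measurements from this thread:

```shell
# Back-of-envelope credential-lifetime check, per the advice above:
# delegate a credential that outlives queue wait + run time.
PROXY_HOURS=24    # lifetime of the delegated credential (example)
QUEUE_HOURS=20    # expected time in the torque queue (example)
RUN_HOURS=8       # expected run time (example)
NEEDED=$((QUEUE_HOURS + RUN_HOURS))
if [ "$NEEDED" -gt "$PROXY_HOURS" ]; then
  echo "credential too short: need >= ${NEEDED}h, have ${PROXY_HOURS}h"
else
  echo "credential lifetime is sufficient"
fi
```

With these example numbers the credential falls short, which matches Martin's suggestion to delegate something like a 60h credential via globus-credential-delegate for long-queued jobs.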
Re: [gt-user] WS_GRAM Stage-out problem
Hi Helmut, Uh, it's been a while, but I think I remember this issue. I *thought* it was fixed in 4.0.8, but I created a jar from globus_4_0_branch. It's built using Java 1.4 and you can get it from here: http://www.mcs.anl.gov/~feller/heller/globus_wsrf_rft.jar Can you give it a try by dropping it into ${GLOBUS_LOCATION}/lib, and tell us if it works for you with that jar? -Martin

Helmut Heller wrote: Hello Martin, We run GT 4.0.8 but we are encountering the same error that Sergey describes. Unfortunately, the link you give below no longer works. Can you please point me to a globus_wsrf_rft.jar for GT 4.0.8? Thanks a lot in advance, Helmut

On 14.05.2008, at 15:12, Martin Feller wrote: Sergey, You are probably using GT 4.0.6 or GT 4.0.7, right? Unfortunately we don't have an update package for that problem yet, but you can download an updated RFT jar from here: http://www-unix.mcs.anl.gov/~feller/calebe/globus_wsrf_rft.jar To install it, copy it to $GLOBUS_LOCATION/lib. After a GT server restart the problem should go away. Martin

----- Original Message ----- From: S.Kulanov s.kula...@mail.ru To: Globus gt-u...@globus.org Sent: Wednesday, May 14, 2008 1:44:43 AM GMT -06:00 US/Canada Central Subject: [gt-user] WS_GRAM Stage-out problem

Good day, I have some problems with StageOut while using the example from http://www.globus.org/toolkit/docs/4.0/execution/wsgram/user-index.html#s-wsgram-user-usagescenarios I have hosts: CA and hosta. Checking a GridFTP copy CA -> hosta:

[kula...@ca ~]$ globus-url-copy -dbg gsiftp://ca.kulanov.org.ua:2811/bin/echo gsiftp://hosta.kulanov.org.ua:2811/tmp/my_echo
.
debug: response from gsiftp://ca.kulanov.org.ua:2811/bin/echo: 150 Begining transfer.
debug: response from gsiftp://hosta.kulanov.org.ua:2811/tmp/my_echo: 150 Begining transfer.
debug: response from gsiftp://ca.kulanov.org.ua:2811/bin/echo: 226 Transfer Complete.
debug: response from gsiftp://hosta.kulanov.org.ua:2811/tmp/my_echo: 226 Transfer Complete.
debug: operation complete
[kula...@ca ~]$

Everything works fine. Now I'd like to test WS-GRAM. Here is the job description file:

==BEGIN===
<job>
  <executable>my_echo</executable>
  <directory>${GLOBUS_USER_HOME}</directory>
  <argument>Hello</argument>
  <argument>World!</argument>
  <stdout>${GLOBUS_USER_HOME}/stdout</stdout>
  <stderr>${GLOBUS_USER_HOME}/stderr</stderr>
  <fileStageIn>
    <transfer>
      <sourceUrl>gsiftp://hosta.kulanov.org.ua:2811/bin/echo</sourceUrl>
      <destinationUrl>gsiftp://ca.kulanov.org.ua:2811/${GLOBUS_USER_HOME}/my_echo</destinationUrl>
    </transfer>
  </fileStageIn>
  <fileStageOut>
    <transfer>
      <sourceUrl>gsiftp://ca.kulanov.org.ua:2811/${GLOBUS_USER_HOME}/stdout</sourceUrl>
      <destinationUrl>gsiftp://ca.kulanov.org.ua:2811/tmp/stdout</destinationUrl>
    </transfer>
  </fileStageOut>
  <fileCleanUp>
    <deletion>
      <file>file:///${GLOBUS_USER_HOME}/my_echo</file>
    </deletion>
  </fileCleanUp>
</job>
=END=

As you can see, I just stage out on the same host, CA:

[kula...@ca ~]$ globusrun-ws -submit -S -f test.xml
Delegating user credentials...Done.
Submitting job...Done.
Job ID: uuid:ca1f3cc2-2199-11dd-b418-000c29da9a67
Termination time: 05/15/2008 09:40 GMT
Current job state: StageIn
Current job state: Active
Current job state: StageOut
Current job state: CleanUp
Current job state: Done
Destroying job...Done.
Cleaning up any delegated credentials...Done.
[kula...@ca ~]$

Everything works fine. Now we just change the fileStageOut section so that it points to hosta:

<fileStageOut>
  <transfer>
    <sourceUrl>gsiftp://ca.kulanov.org.ua:2811/${GLOBUS_USER_HOME}/stdout</sourceUrl>
    <destinationUrl>gsiftp://hosta.kulanov.org.ua:2811/tmp/stdout</destinationUrl>
  </transfer>
</fileStageOut>

[kula...@ca ~]$ globusrun-ws -submit -S -f test.xml
Delegating user credentials...Done.
Submitting job...Done.
Job ID: uuid:94307f62-219a-11dd-aeb6-000c29da9a67
Termination time: 05/15/2008 09:46 GMT
Current job state: StageIn
Current job state: Active
Current job state: StageOut
Current job state: Failed
Destroying job...Done.
Cleaning up any delegated credentials...Done.
globusrun-ws: Job failed: Staging error for RSL element fileStageOut. Can't do MLST on non-existing file/dir /home/kulanov/stdout on server hosta.kulanov.org.ua [Caused by: Server refused performing the request. Custom message: Server refused MLST command (error code 1) [Nested exception message: Custom message: Unexpected reply: 500-Command failed : System error in stat: No such file or directory 500-A system call failed: No such file or directory 500 End.]] Can't do MLST on non-existing file/dir /home/kulanov/stdout on server hosta.kulanov.org.ua [Caused by: Server refused performing the request. Custom message: Server refused MLST
Re: [gt-user] Error while starting the container
By default GT listens on port 8443. It seems that this port is already taken by another application (maybe Tomcat, or another instance of GT already running?). You can specify another port using the -p option, like

globus-start-container -p 8445

-Martin Manisha Lakra wrote: Hello, For installing WS-GRAM, I am starting the globus container as described in the System Administrator's Guide for WS-GRAM, using the following command: $GLOBUS_LOCATION/bin/globus-start-container I get the following error for the above command: [JWSCORE-114] Failed to start container: [JWSCORE-200] Container failed to initialize [Caused by: Address already in use] I tried to change the IP address of my system, but the same error persists. Can anyone tell me what the problem is? Thanks Regards, Manisha Lakra
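Not part of the original thread, but before restarting the container it can help to check whether something is already bound to the port. A minimal sketch using bash's built-in /dev/tcp redirection (the port number 8443 is just the GT default; no Globus tooling is assumed):

```shell
#!/usr/bin/env bash
# Probe a TCP port on localhost: if a connection is accepted,
# something is already listening there. Port is an example value.
port=8443
if (exec 3<>"/dev/tcp/127.0.0.1/${port}") 2>/dev/null; then
  exec 3>&-
  echo "port ${port} in use"
else
  echo "port ${port} free"
fi
```

If the port is in use, either stop the other service or pick a free port with globus-start-container -p, as above.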
Re: [gt-user] How to set default local resource manager in gt4.0?
Hi, This feature is not supported in the 4.0 series. It's not something that can be configured; it would require code changes. -Martin Prashanth Chengi wrote: Dear all, On our site, we are running gt4.0.8. We want to disable fork and set PBS as the default local resource manager. We were able to find documentation to do so in gt4.2 but not gt4.0. Any suggestions on how we can implement it on gt4.0? Thanks and Regards, Prashanth Chengi National PARAM SuperComputing Facility, System Administration and Networking Group, C-DAC Pune. Ext-183 Mob: 09766044870 Courage is the resistance to fear, mastery of fear, Not the absence of fear -Mark Twain
Re: [gt-user] How to set default local resource manager in gt4.0?
Hi, If it is only about disabling Fork: http://tinyurl.com/czo6kb All jobs that go to Fork will then result in an error: 'The Managed Job Factory Service at https://dadada:8443/wsrf/services/ManagedJobFactoryService does not have a resource with key Fork.' This does not yet set PBS as the default local resource manager though. Is this enough? Otherwise we'd need to hack. -Martin Prashanth Chengi wrote: Thanks for the info! I was hunting high and low for documentation for that! Any crude hacks that could help us achieve that? We want PBS and not fork for two reasons: 1) We don't want jobs running on the headnode. 2) Our accounting mechanism is integrated with PBS. Migrating to 4.2 is not an option at the moment as it's not backward-compatible with 4.0.x, which our sister sites are currently using. Thanks and Regards, Prashanth Chengi National PARAM SuperComputing Facility System Administration and Networking Group C-DAC Pune. Courage is the resistance to fear, mastery of fear, Not the absence of fear -Mark Twain On Tue, 28 Apr 2009, Martin Feller wrote: Hi, This feature is not supported in the 4.0 series. It's not something that can be configured; it would require code changes. -Martin Prashanth Chengi wrote: Dear all, On our site, we are running gt4.0.8. We want to disable fork and set PBS as the default local resource manager. We were able to find documentation to do so in gt4.2 but not gt4.0. Any suggestions on how we can implement it on gt4.0? Thanks and Regards, Prashanth Chengi National PARAM SuperComputing Facility, System Administration and Networking Group, C-DAC Pune. Ext-183 Mob: 09766044870 Courage is the resistance to fear, mastery of fear, Not the absence of fear -Mark Twain -- Martin Feller The Globus Alliance Computation Institute at University of Chicago Mathematics Computer Science Division at Argonne National Laboratory Phone: 630 252-4826
Re: [gt-user] choice of db
The GT as a whole does not work with a common DB system; it's the individual services (like RFT and GRAM) that may make use of a DB system, and the choice can differ between services. As far as I know, in the 4.0 series MySQL and PostgreSQL are supported by the services that use a DB system; in 4.2.x possibly Derby in addition. The online documentation of the individual services should tell you more precisely what is supported. -Martin jebin wrote: I just wanted to know if the globus toolkit works with mysql or do I have to use postgresql rgds jebin cherian
Re: [gt-user] Problem with RFT configuration: No suitable driver found error
The connectionString in the dbConfiguration section of your jndi-config.xml is wrong: it must not be $GLOBUS_LOCATION/var/rftDatabase, but should be of the form jdbc:postgresql://host[:port]/rftDatabase Also check http://www.globus.org/toolkit/docs/latest-stable/data/rft/admin/#rft-postgresql -Martin Sergei Smolov wrote: Hello, List! I've installed Globus Toolkit 4.2.1 and PostgreSQL 7.3.2 for RFT testing. Then I execute the following commands:

./postmaster -D <data directory address> -o -i
$GLOBUS_LOCATION/sbin/globus-gridftp-server -p 2811
$GLOBUS_LOCATION/bin/globus-start-container

When I try to start the container, I get the following error: Unable to connect to database.No suitable driver found for /home/ssedai/GlobusToolkit/var/rftDatabase. Caused by java.sql.SQLException: No suitable driver found for /home/ssedai/GlobusToolkit/var/rftDatabase at java.sql.DriverManager.getConnection(DriverManager.java:602) at java.sql.DriverManager.getConnection(DriverManager.java:185) at org.apache.commons.dbcp.DriverManagerConnectionFactory.createConnection(DriverManagerConnectionFactory.java:48) at org.apache.commons.dbcp.PoolableConnectionFactory.makeObject(PoolableConnectionFactory.java:290) at org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:771) at org.apache.commons.dbcp.PoolingDriver.connect(PoolingDriver.java:175) at java.sql.DriverManager.getConnection(DriverManager.java:582) at java.sql.DriverManager.getConnection(DriverManager.java:207) at org.globus.transfer.reliable.service.database.RFTDatabaseSetup.getDBConnection(RFTDatabaseSetup.java:261) at org.globus.transfer.reliable.service.database.ReliableFileTransferDbAdapter.setSchemaVersion(ReliableFileTransferDbAdapter.java:441) at org.globus.transfer.reliable.service.database.ReliableFileTransferDbAdapter.setup(ReliableFileTransferDbAdapter.java:155) at org.globus.transfer.reliable.service.ReliableFileTransferImpl.init(ReliableFileTransferImpl.java:78) at 
sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at java.lang.Class.newInstance0(Class.java:355) at java.lang.Class.newInstance(Class.java:308) at org.globus.axis.providers.RPCProvider.getNewServiceInstance(RPCProvider.java:120) at org.globus.axis.description.ServiceDescUtil.initializeProviders(ServiceDescUtil.java:214) at org.globus.axis.description.ServiceDescUtil.initializeService(ServiceDescUtil.java:163) at org.globus.wsrf.container.ServiceManager$InitPrivilegedAction.initialize(ServiceManager.java:384) at org.globus.wsrf.container.ServiceManager$InitPrivilegedAction.run(ServiceManager.java:396) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.globus.gsi.jaas.GlobusSubject.runAs(GlobusSubject.java:60) at org.globus.gsi.jaas.JaasSubject.doAs(JaasSubject.java:100) at org.globus.wsrf.container.ServiceManager.initializeService(ServiceManager.java:271) at org.globus.wsrf.container.ServiceManager.start(ServiceManager.java:177) at org.globus.wsrf.container.ServiceDispatcher.startServices(ServiceDispatcher.java:799) at org.globus.wsrf.container.ServiceDispatcher.init(ServiceDispatcher.java:435) at org.globus.wsrf.container.ServiceContainer.start(ServiceContainer.java:252) at org.globus.wsrf.container.ServiceContainer.init(ServiceContainer.java:212) at org.globus.wsrf.container.GSIServiceContainer.init(GSIServiceContainer.java:42) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at 
java.lang.reflect.Constructor.newInstance(Constructor.java:513) at org.globus.wsrf.container.ServiceContainer.createContainer(ServiceContainer.java:168) at org.globus.wsrf.container.ServiceContainer.startSecurityContainer(ServiceContainer.java:606) at org.globus.wsrf.container.ServiceContainer.main(ServiceContainer.java:539) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.globus.bootstrap.BootstrapBase.launch(BootstrapBase.java:114) at org.globus.bootstrap.ContainerBootstrap.main(ContainerBootstrap.java:40) 2009-04-09T16:01:14.200+04:00 ERROR service.ReliableFileTransferImpl [main,oldLog:179] Unable to setup database driver with pooling.Unable to connect to database.No suitable driver found for
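For illustration (not part of the original message): with a PostgreSQL backend, the connectionString parameter in the dbConfiguration section of $GLOBUS_LOCATION/etc/globus_wsrf_rft/jndi-config.xml would look roughly like the sketch below. Host, port, and database name are placeholders and must match your actual PostgreSQL setup.

```xml
<parameter>
  <name>connectionString</name>
  <value>jdbc:postgresql://myhost.example.org:5432/rftDatabase</value>
</parameter>
```

A plain filesystem path in this parameter (as in the error above) gives exactly the "No suitable driver found" failure, because JDBC cannot map it to a driver.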
Re: [gt-user] Output file problem
Ritesh, In Gram4 you can use the fileStageOut element in the job description to get files. Check http://tinyurl.com/c8mpj3 Note: For that to work a GridFTP server must be running on the Gram4 machine (or on some other machine that has access to the data written by your job), and on your client machine (or wherever you want the data to be transferred to). If you want to get files to your client machine and don't want to run a GridFTP server on your client, you have to get the data manually after the job, e.g. using a GridFTP client like globus-url-copy. For this to work you still have to have a GridFTP server running on the Gram4 machine (or on some other machine that has access to the data written by your job). But you can also fetch the data by some other transfer mechanism (e.g. scp, ftp, floppy disk). -Martin Ritesh Badwaik wrote: Hi, If a job submitted to globus produces an output file, how do I retrieve that output file from the GLOBUS_USER_HOME directory where the job is executed? Thanks and regards Ritesh
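To make the note above concrete, a minimal fileStageOut fragment for a Gram4 job description might look like the sketch below. The hostname client.example.org and the file names are made up; a GridFTP server must actually be listening on that host, and the file:/// URL is resolved relative to the Gram4 service side:

```xml
<fileStageOut>
  <transfer>
    <sourceUrl>file:///${GLOBUS_USER_HOME}/output.dat</sourceUrl>
    <destinationUrl>gsiftp://client.example.org:2811/tmp/output.dat</destinationUrl>
  </transfer>
</fileStageOut>
```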
Re: [gt-user] problem with globus+condor-g
Hm, I never saw this. The problem seems to be this: 3/30 20:00:55 [18840] GAHP[18841] (stderr) - faultCode: {http://schemas.xmlsoap.org/soap/envelope/}Server.userException 3/30 20:00:55 [18840] GAHP[18841] (stderr) - faultSubcode: 3/30 20:00:55 [18840] GAHP[18841] (stderr) - faultString: org.globus.common.ChainedIOException: Authentication failed [Caused by: Miscellaneous failure. [Caused by: Bad certificate (java.security.SignatureException: SHA-1/RSA/PKCS#1: Not initialized)]] 3/30 20:00:55 [18840] GAHP[18841] (stderr) - faultActor: 3/30 20:00:55 [18840] GAHP[18841] (stderr) - faultNode: 3/30 20:00:55 [18840] GAHP[18841] (stderr) - faultDetail: 3/30 20:00:55 [18840] GAHP[18841] (stderr) - {http://xml.apache.org/axis/}stackTrace:Authentication failed. Caused by Miscellaneous failure. Caused by COM.claymoresystems.ptls.SSLThrewAlertException: Bad certificate (java.security.SignatureException: SHA-1/RSA/PKCS#1: Not initialized) Did you actually create a user proxy certificate before the condor submission? Does a job submission using globusrun-ws work from this client to the same server? -Martin induru hemanth wrote: Martin Sir, Thanks for your response I am attaching the following files GridmanagerLog.globus , conatiner.log, containeLog Thanking You Hemanth BITMESRA
Re: [gt-user] problem with globus+condor-g
You should get more detailed GridManager logging on the client-side by setting the parameter GRIDMANAGER_DEBUG = D_FULLDEBUG in your Condor configuration. Please do this and send the Gridmanager log again, and also send the server-side GT4 container logfile for more information. -Martin induru hemanth wrote: Hi, I am using globus-4.2.1 condor-7.2.0 I have a problem while submmitting jobs from condor-G to globus [glo...@g1 ~]$ vi xyy_cond Executable =/home/globus/xyy.sh universe = grid grid_resource = gt4 https://g1:8443/wsrf/services/ManagedJobFactoryService Condor output = xyy.out error = xyy.error Log = xyy.log Queue [glo...@g1 ~]$ condor_submit xyy_cond Submitting job(s). Logging submit event(s). 1 job(s) submitted to cluster 105. [glo...@g1 ~]$ vi xyy.log 000 (105.000.000) 03/28 10:01:23 Job submitted from host: 172.16.40.200:51114 ... 012 (105.000.000) 03/28 10:01:40 Job was held. Failed to create proxy delegation Code 0 Subcode 0 [glo...@g1 ~]$ cd /home/condor/log [glo...@g1 log]$ vi GridmanagerLog.globus 3/28 10:01:23 ** 3/28 10:01:23 ** condor_gridmanager (CONDOR_GRIDMANAGER) STARTING UP 3/28 10:01:23 ** /home/condor/condor-7.2.0/sbin/condor_gridmanager 3/28 10:01:23 ** SubsystemInfo: name=GRIDMANAGER type=DAEMON(10) class=DAEMON(1) 3/28 10:01:23 ** Configuration: subsystem:GRIDMANAGER local:NONE class:DAEMON 3/28 10:01:23 ** $CondorVersion: 7.2.0 Dec 19 2008 BuildID: 121001 $ 3/28 10:01:23 ** $CondorPlatform: X86_64-LINUX_RHEL5 $ 3/28 10:01:23 ** PID = 17288 3/28 10:01:23 ** Log last touched 3/28 09:18:31 3/28 10:01:23 ** 3/28 10:01:23 Using config source: /home/condor/condor-7.2.0/etc/condor_config 3/28 10:01:23 Using local config sources: 3/28 10:01:23/home/condor/condor_config.local 3/28 10:01:23 DaemonCore: Command Socket at 172.16.40.200:35368 3/28 10:01:26 [17288] JEF: ConfigureGahp() 3/28 10:01:26 [17288] Found job 105.0 --- inserting 3/28 10:01:26 [17288] gahp server not up yet, delaying ping 3/28 10:01:26 [17288] gahp server not up yet, delaying 
checkDelegation 3/28 10:01:28 [17288] (105.0) doEvaluateState called: gmState GM_INIT, globusState 3/28 10:01:28 [17288] GAHP server pid = 17292 3/28 10:01:38 [17288] (105.0) doEvaluateState called: gmState GM_UNSUBMITTED, globusState 3/28 10:01:40 [17288] resource https://g1:8443/wsrf/services/ManagedJobFactoryService is now up 3/28 10:01:40 [17288] (105.0) doEvaluateState called: gmState GM_DELEGATE_PROXY, globusState 3/28 10:01:40 [17288] delegate_credentials(https://g1:8443/wsrf/services/DelegationFactoryService) failed! 3/28 10:01:40 [17288] (105.0) doEvaluateState called: gmState GM_DELEGATE_PROXY, globusState 3/28 10:01:43 [17288] No jobs left, shutting down 3/28 10:01:43 [17288] Got SIGTERM. Performing graceful shutdown. 3/28 10:01:43 [17288] condor_gridmanager (condor_GRIDMANAGER) pid 17288 EXITING WITH STATUS 0 ___ CAN ANY ONE HELP ME Thanking You Hemanth, BIT Mesra.
Re: [gt-user] Problem with Ganglia IP
I can't tell you what the issue is at the moment. Why do you prefer this version? It's very old, and had problems that have been solved in newer versions. -Martin cmasmas cmasmas wrote: I would prefer to use this version. Any temporary solution to the problem? 2009/3/27 Martin Feller fel...@mcs.anl.gov I'd highly recommend to pick a later version of the GT, if that's doable for you. 4.2.1 for the 4.2 series, or 4.0.8 for the 4.0 series. -Martin cmasmas cmasmas wrote: Hi, I'm trying to use Globus 4.0.1 with Ganglia IP. When I start the globus container I get the following error: 2009-03-27 16:37:54,202 ERROR usefulrp.GLUEResourceProperty [GLUE refresher 0,runScript:372] Could not deserialize output of producer org.globus.mds.usefulrp.glue.GangliaElementProducer to an instance of class org.globus.mds.glue.batchprovider.ClusterCollectionType I have read that the problem can be in the ganglia_to_glue.xslt. Can anyone help me with this? Thanks in advance.
Re: [gt-user] how to enable WS-GRAM-CONDOR
Inderpreet Chopra wrote: I have globus GT4.0 installed on my machine without any configuration for a scheduler. But now I want to use the Condor scheduler. In the quickstart guide, it is mentioned to use *--enable-wsgram-condor* while configuring. *Is there any way to enable this option without affecting my present installation?* No, it will affect your installation. You'd have to set up a second installation if you don't want to modify your current one. *Also, where can I get the instructions for further configuring the Condor scheduler with globus?* There should not be any further configuration required. If submission to Condor does not work, you can check $GLOBUS_LOCATION/lib/perl/Globus/GRAM/JobManager. It's here where the Condor-specific job description is created. -Martin I have used the all-source installer for installing globus. Regards, Inderpreet
Re: [gt-user] No 'pem' file created after installing simpleCA !!
Hm, how did you install simpleCA? Did you follow http://www.globus.org/toolkit/docs/latest-stable/admin/quickstart/#q-security? The CA should show up in ~/.globus/simpleCA/ and not in /globus/simpleCA. Does it work if you follow the quickstart guide? -Martin Manisha Lakra wrote: Hello, I have successfully installed globus toolkit 4.2.0 on the first machine. After that I tried to install simpleCA on that machine, but no file with the extension '.pem' was created in the folder '/globus/simpleCA'. Only the following files and directories are available: certs crl grid-ca-ssl.conf index.txt newcerts private serial Now, how can I proceed with my installation? I tried to overwrite the existing simpleCA by reinstalling it, but the same thing happened: no file with the '.pem' extension. Kindly guide me how to proceed now. Thank you Regards, Manisha Lakra
Re: [gt-user] Error in -start-container
It's $GLOBUS_LOCATION/etc/globus_wsrf_rft/jndi-config.xml Check the userName and password parameters in section dbConfiguration -Martin Danilo Delizia wrote: Hi, I'm trying to install globus toolkit 4.0.8 on kubuntu 8.10 i followed the guide to install it and the simpleCA guide to configure the security system. When i try to start the container i got this error: globus-start-container 2009-03-14 23:30:31,592 ERROR service.ReliableFileTransferImpl [main,init:76] Unable to setup database driver with pooling.A connection error has occurred: FATAL: password authentication failed for user danilo 2009-03-14 23:30:32,321 WARN service.ReliableFileTransferHome [main,initialize:97] All RFT requests will fail and all GRAM jobs that require file staging will fail.A connection error has occurred: FATAL: password authentication failed for user danilo Starting SOAP server at: https://127.0.0.1:8443/wsrf/services/ With the following services: [1]: https://127.0.0.1:8443/wsrf/services/AdminService [2]: https://127.0.0.1:8443/wsrf/services/AuthzCalloutTestService [3]: https://127.0.0.1:8443/wsrf/services/CASService [4]: https://127.0.0.1:8443/wsrf/services/ContainerRegistryEntryService [5]: https://127.0.0.1:8443/wsrf/services/ContainerRegistryService [6]: https://127.0.0.1:8443/wsrf/services/CounterService [7]: https://127.0.0.1:8443/wsrf/services/DefaultIndexService [8]: https://127.0.0.1:8443/wsrf/services/DefaultIndexServiceEntry [9]: https://127.0.0.1:8443/wsrf/services/DefaultTriggerService [10]: https://127.0.0.1:8443/wsrf/services/DefaultTriggerServiceEntry [11]: https://127.0.0.1:8443/wsrf/services/DelegationFactoryService [12]: https://127.0.0.1:8443/wsrf/services/DelegationService [13]: https://127.0.0.1:8443/wsrf/services/DelegationTestService [14]: https://127.0.0.1:8443/wsrf/services/InMemoryServiceGroup [15]: https://127.0.0.1:8443/wsrf/services/InMemoryServiceGroupEntry [16]: https://127.0.0.1:8443/wsrf/services/InMemoryServiceGroupFactory [17]: 
https://127.0.0.1:8443/wsrf/services/IndexFactoryService [18]: https://127.0.0.1:8443/wsrf/services/IndexService [19]: https://127.0.0.1:8443/wsrf/services/IndexServiceEntry [20]: https://127.0.0.1:8443/wsrf/services/JWSCoreVersion [21]: https://127.0.0.1:8443/wsrf/services/ManagedExecutableJobService [22]: https://127.0.0.1:8443/wsrf/services/ManagedJobFactoryService [23]: https://127.0.0.1:8443/wsrf/services/ManagedMultiJobService [24]: https://127.0.0.1:8443/wsrf/services/ManagementService [25]: https://127.0.0.1:8443/wsrf/services/NotificationConsumerFactoryService [26]: https://127.0.0.1:8443/wsrf/services/NotificationConsumerService [27]: https://127.0.0.1:8443/wsrf/services/NotificationTestService [28]: https://127.0.0.1:8443/wsrf/services/PersistenceTestSubscriptionManager [29]: https://127.0.0.1:8443/wsrf/services/ReliableFileTransferFactoryService [30]: https://127.0.0.1:8443/wsrf/services/ReliableFileTransferService [31]: https://127.0.0.1:8443/wsrf/services/RendezvousFactoryService [32]: https://127.0.0.1:8443/wsrf/services/ReplicationService [33]: https://127.0.0.1:8443/wsrf/services/SampleAuthzService [34]: https://127.0.0.1:8443/wsrf/services/SecureCounterService [35]: https://127.0.0.1:8443/wsrf/services/SecurityTestService [36]: https://127.0.0.1:8443/wsrf/services/ShutdownService [37]: https://127.0.0.1:8443/wsrf/services/SubscriptionManagerService [38]: https://127.0.0.1:8443/wsrf/services/TestAuthzService [39]: https://127.0.0.1:8443/wsrf/services/TestRPCService [40]: https://127.0.0.1:8443/wsrf/services/TestService [41]: https://127.0.0.1:8443/wsrf/services/TestServiceRequest [42]: https://127.0.0.1:8443/wsrf/services/TestServiceWrongWSDL [43]: https://127.0.0.1:8443/wsrf/services/TriggerFactoryService [44]: https://127.0.0.1:8443/wsrf/services/TriggerService [45]: https://127.0.0.1:8443/wsrf/services/TriggerServiceEntry [46]: https://127.0.0.1:8443/wsrf/services/Version [47]: https://127.0.0.1:8443/wsrf/services/WidgetNotificationService [48]: 
https://127.0.0.1:8443/wsrf/services/WidgetService [49]: https://127.0.0.1:8443/wsrf/services/gsi/AuthenticationService [50]: https://127.0.0.1:8443/wsrf/services/mds/test/execsource/IndexService [51]: https://127.0.0.1:8443/wsrf/services/mds/test/execsource/IndexServiceEntry [52]:
Re: [gt-user] Help: fileStageIn owner problem
Does the same happen if you use globus-url-copy to transfer a file, instead of using GridFTP via ws-gram? (globus-url-copy \ gsiftp://client.mydomain.com:2811/home/griduser1/grid/myhello \ gsiftp://cm.mydomain.com:2811/tmp/myhello) I assume so, and this would help narrowing it down. -Martin Le Trung Kien wrote: Hi, In my job description, I define a fileStageIn like this fileStageIn transfer sourceUrlgsiftp:// client.mydomain.com:2811/home/griduser1/grid/myhello/sourceUrl destinationUrlgsiftp://cm.mydomain.com:2811/tmp/myhello /destinationUrl /transfer /fileStageIn After submitting my job, I got the file delivered, but it's strange that on cm.mydomain.com gridus...@cm #] ls -l /tmp/myhello -rwxr-xr-x1 root root 147 Mar 9 16:02 /tmp/myhello We see that this file is owned by root. In fact, with this problem I couldn't copy files and execute the files with right permission on my user's directories. Additional information : In my grid-mapfile, I have only one mapping from grid user to local user (this local user in my case is a NIS account). Help me, please.
Re: [gt-user] Failed to initialize GAHP
AFAIK GAHP initialization is pure Condor, so I think this question is for the Condor group. -Martin Samir Khanal wrote: Hi All, I don't know where to ask this question (Condor or Globus). I had set up a Globus/Condor-G grid: A had the gatekeeper and B submitted jobs to A. Everything was going smoothly and I could submit both PBS and Condor jobs. Then I was asked to reverse the situation: B had to be the gatekeeper (as it had larger resources) and A now had to submit jobs to B's resources. I used the GT4 quickstart guide and the setup went well, except that now when I submit grid jobs via Condor-G the jobs get held.

executable = /bin/date
Transfer_Executable = false
globusscheduler = B.xx.xx.xx/jobmanager-fork
universe = grid
output = date.out
error = date.error
log = date.log
queue

The same script worked the other way around. The myproxy login and all other stuff works, besides this problem. When I looked into submit.log it says 012 (086.000.000) 03/05 18:28:53 Job was held. Failed to initialize GAHP Code 0 Subcode 0 ... I then tried [~]$ /opt/condor/sbin/gt4_gahp $GahpVersion: 1.7.1 Apr 23 2008 GT4\ GAHP\ (GT-4.0.4) $ and it does start (Java is set up correctly). What seems to be the problem? I am a bit stuck with this. I am using Rocks 5.1, GT 4.2.1, and the Condor roll that came with Rocks 5.1. Thanks Samir
Re: [gt-user] globusrun-ws: Job failed: Staging error for RSL element fileStageIn
I'm a bit confused about this error. It seems that RFT does not find the delegated credential delegated by globusrun-ws. Does the GT container logfile give more information? Does the same happen if you do job delegation? (globusrun-ws -submit -J -c /bin/date) Does a job with streaming give the same error? globusrun-ws -submit -s -c /bin/date -Martin Ritesh Badwaik wrote: I am using gt4.2.1 Martin Feller wrote: Hi, What GT version is that? Martin Ritesh Badwaik wrote: hi, After giving the command globusrun-ws -submit -S -f a.rsl I am getting following error __ Delegating user credentials...Done. Submitting job...Done. Job ID: uuid:5ffcc398-0878-11de-b882-0004796723fc Termination time: 03/04/3009 04:53 GMT Current job state: StageIn Current job state: Failed Destroying job...Done. Cleaning up any delegated credentials...Done. globusrun-ws: Job failed: Staging error for RSL element fileStageIn. Unable to create RFT resource; nested exception is: org.globus.transfer.reliable.service.exception.RftException: Error processing delegated credentialError getting delegation resource [Caused by: org.globus.wsrf.NoSuchResourceException] [Caused by: Error getting delegation resource [Caused by: org.globus.wsrf.NoSuchResourceException]] globusrun-ws: Job failed: Staging error for RSL element fileCleanUp. 
Unable to create RFT resource; nested exception is: org.globus.transfer.reliable.service.exception.RftException: Error processing delegated credentialError getting delegation resource [Caused by: org.globus.wsrf.NoSuchResourceException] [Caused by: Error getting delegation resource [Caused by: org.globus.wsrf.NoSuchResourceException]]
__
My rsl file is as follows:

<job>
  <executable>my_echo</executable>
  <directory>${GLOBUS_USER_HOME}</directory>
  <stdout>${GLOBUS_USER_HOME}/stdout</stdout>
  <stderr>${GLOBUS_USER_HOME}/stderr</stderr>
  <fileStageIn>
    <transfer>
      <sourceUrl>gsiftp://vsundar-fc8.corp.cdac.in:2811/home/ritesh/s</sourceUrl>
      <destinationUrl>file:///${GLOBUS_USER_HOME}/my_echo</destinationUrl>
    </transfer>
  </fileStageIn>
  <fileCleanUp>
    <deletion>
      <file>file:///${GLOBUS_USER_HOME}/my_echo</file>
    </deletion>
  </fileCleanUp>
</job>
_
I have attached the container.log file. vsundar-fc8.corp.cdac.in is the same machine on which I am submitting the rsl file. Can anyone give me a solution for this error? Thanks and Regards Ritesh
Re: [gt-user] gt-4.2.1
Hm, something must go wrong with $PATH, I think: Can you actually call the Condor command-line tools as user globus? What's the output of "which condor_submit" and "echo $PATH"? What's the output of $GLOBUS_LOCATION/setup/globus/find-condor-tools? Martin induru hemanth wrote: Martin Sir, Thanks for your response. ./configure --enable-wsgram-condor and make are working fine, but while running make install it shows this error:

[glo...@g3 gt4.2.1-x86_64_rhas_4-installer]$ make install
ln -sf /usr/local/globus-4.2.1.1/etc/gpt/packages /usr/local/globus-4.2.1.1/etc/globus_packages
/usr/local/globus-4.2.1.1/sbin/gpt-postinstall
running /usr/local/globus-4.2.1.1/setup/globus/setup-globus-job-manager-condor.pl..[ Changing to /usr/local/globus-4.2.1.1/setup/globus ]
find-condor-tools: error: Cannot locate condor_submit
checking for condor_submit... no
Error locating condor commands, aborting!
ERROR: Command failed
make: *** [postinstall] Error 9
[glo...@g3 gt4.2.1-x86_64_rhas_4-installer]$
__
[glo...@g3 gt4.2.1-x86_64_rhas_4-installer]$ export PATH=/home/condor/condor-7.2.0/sbin:/home/condor/condor-7.2.0/bin:$PATH
After setting PATH it still shows the same message.
__
The condor commands work properly when run as root, but condor_submit works with the condor user only.
_
Please help me, Hemanth, BIT MESRA On 3/2/09, Martin Feller fel...@mcs.anl.gov wrote: If you didn't already build support for Condor in ws-gram you have to do so: Go into the GT installer directory (source or binary installer) and do ./configure --enable-wsgram-condor (and whatever other options you provided), make, make install. After a GT server restart you should be able to submit jobs to Condor like globusrun-ws -submit -Ft Condor -c /bin/date If the job does not get through or keeps staying in state IDLE in Condor, come back to the list. 
-Martin induru hemanth wrote: Hi, I am using GT-4.2.1 (default resource manager: Fork). I just installed and configured condor-7.2.0. How can I access the Condor pool through Globus 4.2.1? Thanking you, Hemanth, B.I.T Mesra.
Re: [gt-user] globusrun-ws: Job failed: Staging error for RSL element fileStageIn
Hi, What GT version is that? Martin Ritesh Badwaik wrote: hi, After giving the command globusrun-ws -submit -S -f a.rsl I am getting the following error:
__
Delegating user credentials...Done.
Submitting job...Done.
Job ID: uuid:5ffcc398-0878-11de-b882-0004796723fc
Termination time: 03/04/3009 04:53 GMT
Current job state: StageIn
Current job state: Failed
Destroying job...Done.
Cleaning up any delegated credentials...Done.
globusrun-ws: Job failed: Staging error for RSL element fileStageIn. Unable to create RFT resource; nested exception is: org.globus.transfer.reliable.service.exception.RftException: Error processing delegated credentialError getting delegation resource [Caused by: org.globus.wsrf.NoSuchResourceException] [Caused by: Error getting delegation resource [Caused by: org.globus.wsrf.NoSuchResourceException]]
globusrun-ws: Job failed: Staging error for RSL element fileCleanUp. Unable to create RFT resource; nested exception is: org.globus.transfer.reliable.service.exception.RftException: Error processing delegated credentialError getting delegation resource [Caused by: org.globus.wsrf.NoSuchResourceException] [Caused by: Error getting delegation resource [Caused by: org.globus.wsrf.NoSuchResourceException]]
__
My rsl file is as follows:

<job>
  <executable>my_echo</executable>
  <directory>${GLOBUS_USER_HOME}</directory>
  <stdout>${GLOBUS_USER_HOME}/stdout</stdout>
  <stderr>${GLOBUS_USER_HOME}/stderr</stderr>
  <fileStageIn>
    <transfer>
      <sourceUrl>gsiftp://vsundar-fc8.corp.cdac.in:2811/home/ritesh/s</sourceUrl>
      <destinationUrl>file:///${GLOBUS_USER_HOME}/my_echo</destinationUrl>
    </transfer>
  </fileStageIn>
  <fileCleanUp>
    <deletion>
      <file>file:///${GLOBUS_USER_HOME}/my_echo</file>
    </deletion>
  </fileCleanUp>
</job>
_
I have attached the container.log file. vsundar-fc8.corp.cdac.in is the same machine on which I am submitting the rsl file. Can anyone give me a solution for this error? Thanks and Regards Ritesh
Re: [gt-user] gt-4.2.1
If you didn't already build support for Condor in ws-gram you have to do so: Go into the GT installer directory (source or binary installer) and do

./configure --enable-wsgram-condor (and whatever other options you provided)
make
make install

After a GT server restart you should be able to submit jobs to Condor like

globusrun-ws -submit -Ft Condor -c /bin/date

If the job does not get through or keeps staying in state IDLE in Condor, come back to the list. -Martin induru hemanth wrote: Hi, I am using GT-4.2.1 (default resource manager: Fork). I just installed and configured condor-7.2.0. How can I access the Condor pool through Globus 4.2.1? Thanking you, Hemanth, B.I.T Mesra.
Re: [gt-user] problem in transferring file
In a job description you can use 'file:///...' only for the GridFTP server associated with (or local to) the ws-gram server; file:/// will always be interpreted as local to the ws-gram server. That means: for a fileStageIn element you can use it only in the destinationUrl element, and for a fileStageOut element you can use it only in the sourceUrl element. In the sourceUrl of a fileStageIn element and in the destinationUrl of a fileStageOut element you must provide 'gridftp urls'. ws-gram will substitute 'file://' by 'gsiftp://gridftp-server:port', according to the gram-gridftp file system mappings defined by the admin. Check http://www.globus.org/toolkit/docs/4.2/4.2.1/execution/gram4/admin/#gram4-nondefaultgridftp and http://www.globus.org/toolkit/docs/4.2/4.2.1/execution/gram4/admin/#gram4-Interface_Config_Frag-filesysmap for more detailed information about the gram-gridftp mappings. If you want to transfer files as part of a job between two GridFTP servers that are completely unrelated to the ws-gram server, you can do so, but you have to specify gridftp urls then. Martin Ufuk Utku Turuncoglu wrote: Hi, I changed the order and got the following error:

at org.globus.exec.service.exec.RunThread.run(RunThread.java:85)
Can't do MLST on non-existing file/dir /Users/xyz/Desktop/dummy01.dat on server fr0103ge.ncar.teragrid.org. Caused by org.globus.ftp.exception.ServerException: Server refused performing the request. Custom message: Server refused MLST command (error code 1) [Nested exception message: Custom message: Unexpected reply: 500-Command failed : System error in stat: No such file or directory 500-A system call failed: No such file or directory 500- 500 End.]. Nested exception is org.globus.ftp.exception.UnexpectedReplyCodeException: Custom message: Unexpected reply: 500-Command failed : System error in stat: No such file or directory 500-A system call failed: No such file or directory 500- 500 End.

I don't understand. The file /Users/xyz/Desktop/dummy01.dat is on my local machine. 
Why does it try to find it on the remote server? I just want to copy a local file to the remote server. I also tried to copy the file using globus-url-copy:

globus-url-copy file:///Users/xyz/Desktop/dummy01.dat gsiftp://gridftp.frost.ncar.teragrid.org/ptmp/xyz/dummy01.dat

and it works. The newer RSL file is:

...
<fileStageOut>
  <transfer>
    <sourceUrl>file:///Users/xyz/Desktop/dummy01.dat</sourceUrl>
    <destinationUrl>gsiftp://gridftp.frost.ncar.teragrid.org/ptmp/xyz/dummy01.dat</destinationUrl>
  </transfer>
</fileStageOut>
...

Thanks, --ufuk

Stuart Martin wrote: It looks to me like you may have the source and dest mixed up. For stage out, the source would typically be the file:/// URL, which would get replaced by ws-gram with the service-side GridFTP server host and port. Then the gsiftp URL would be to the GridFTP server running on the client side.

<fileStageOut>
  <transfer>
    <sourceUrl>gsiftp://gridftp.frost.ncar.teragrid.org/ptmp/xyz/dummy01.dat</sourceUrl>
    <destinationUrl>file:///Users/xyz/Desktop/dummy01.dat</destinationUrl>
  </transfer>
</fileStageOut>

-Stu

On Feb 27, 2009, at 10:58 AM, Ufuk Utku Turuncoglu wrote: Hi, I tried to submit a globus job with a file transfer, but I got the following error. The local file appears as null in the log.

Submission ID: uuid:af0a91f0-0494-11de-bdbe-fda4d1871b6e
delegation level: gsilimited
delegation level: gsifull
WAITING FOR JOB TO FINISH:
== State Notification ==
State : Failed
Holding: false
Exit Code: 0
Failed
Failed Fault:
fault type: org.globus.exec.generated.StagingFaultType:
attribute: fileStageOut
description: Staging error for RSL element fileStageOut, from gsiftp://gridftp.frost.ncar.teragrid.org:2811/ptmp/xyz/dummy01.dat to null.
destination: null
faultReason:
faultString:
gt2ErrorCode: 0
originator:
Address: https://fr0103ge.ncar.teragrid.org:8443/wsrf/services/ManagedJobFactoryService
Reference property[0]: <ns6:ResourceID xmlns:ns6="http://www.globus.org/namespaces/2004/10/gram/job">cd2d9100-0494-11de-97de-d94967efe41a</ns6:ResourceID>
source: gsiftp://gridftp.frost.ncar.teragrid.org:2811/ptmp/xyz/dummy01.dat
stackTrace: org.globus.exec.generated.StagingFaultType: Staging error for RSL element fileStageOut, from gsiftp://gridftp.frost.ncar.teragrid.org:2811/ptmp/xyz/dummy01.dat to null.
Timestamp: Thu Feb 26 23:06:41 MST 2009
Originator:
Address: https://fr0103ge.ncar.teragrid.org:8443/wsrf/services/ManagedJobFactoryService
Reference property[0]: <ns6:ResourceID xmlns:ns6="http://www.globus.org/namespaces/2004/10/gram/job">cd2d9100-0494-11de-97de-d94967efe41a</ns6:ResourceID>

I also checked the delegation level and it seems to be full. I am using the gt4.0.8 Java libraries. The jobs work correctly without the data transfer part. Any suggestion will be helpful, --ufuk

RSL Script ---
<?xml version="1.0" encoding="UTF-8"?>
<job
Re: [gt-user] Possible container bug
I think I see the problem: http://bugzilla.globus.org/globus/show_bug.cgi?id=6350 I opened that bug a while ago but didn't get to it yet. Looks like it's time for that now. If I prepare a fix, can you try it out? Martin

Kay Dörnemann wrote: Hi, you will find attached the full container.log from today. The problem occurred quickly today; I guess it was between 3 pm and 11 pm (CET). Thanks. Cheers, Kay

Martin Feller wrote, on 21.02.2009 20:52: What about all of this: http://lists.globus.org/pipermail/gt-user/2009-February/007772.html ?

Kay Dörnemann wrote: Hi, we tried dumping the RFT database, but within one day the CPU usage of the globus process suddenly jumped to 100% again. As usual. Anyone have an idea? Thank you. Cheers, Kay

Patrick Armstrong schrieb: I realize it has been almost a month since your post in reply to gt-user, but are you still having the problem described? Martin Feller (who fixed this bug) suggested some things in gt-user on the 6th, specifically deleting your persisted directory. I've also found that dumping your rft database helps. --patrick
Re: [gt-user] is running a container for both globus 4.2 and 4.0 possible?
You can run more than one container on one machine; I do it all the time. AFAIK the installations just have to be located in different directories. Say you have two gt installs: /opt/gt408 and /opt/gt421. I personally then have a ~/.bashrc408 and a ~/.bashrc421, setting up paths, GLOBUS_LOCATION (and maybe CLASSPATH) for the different gt installs. Corresponding to each bashrc file I have an alias which sources the appropriate bashrc file:

alias 408='cp ~/.bashrc408 ~/.bashrc; source ~/.bashrc'
alias 421='cp ~/.bashrc421 ~/.bashrc; source ~/.bashrc'

Switching contexts this way, you can easily start different containers; they have to listen on different ports though. Not sure if this is the smartest way, but it works for me. -Martin

Cole Uhlman wrote: Hello, all. I would like to set up machines that can accept jobs from either globus 4.2 or 4.0. Globus doesn't want me running two containers (if I try to run the second: ERROR: A container with pid 2177 is already running). On a machine with both installed, would it even be theoretically possible to run two containers? Could there be another way for one machine to serve both 4.2 and 4.0? Thanks. -Cole
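The same environment-swapping idea can also be written as a small shell function instead of per-version aliases. This is only a sketch of the approach described above: the function name use_gt and the ~/.bashrcNNN naming scheme are illustrative, and each ~/.bashrcNNN file is assumed to export GLOBUS_LOCATION, PATH, etc. for the matching install.

```shell
# Minimal sketch of switching between two Globus environments by
# swapping in a per-install bashrc (hypothetical naming scheme).
use_gt() {
  # usage: use_gt 408   or   use_gt 421
  cp "$HOME/.bashrc$1" "$HOME/.bashrc"   # swap in the chosen profile
  . "$HOME/.bashrc"                      # re-source it in this shell
}
```

After `use_gt 421`, commands like globus-start-container resolve against the 4.2.1 install; remember that the two containers still need distinct ports.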
Re: [gt-user] is running a container for both globus 4.2 and 4.0 possible?
Alexander Beck-Ratzka wrote: On Wednesday, February 25th 2009 07:04:42 Martin Feller wrote: You can run more than one container on one machine; I do it all the time. AFAIK the installations just have to be located in different directories. Say you have two gt installs: /opt/gt408 and /opt/gt421. I personally then have a ~/.bashrc408 and a ~/.bashrc421, setting up paths, GLOBUS_LOCATION (and maybe CLASSPATH) for the different gt installs. Corresponding to each bashrc file I have an alias which sources the appropriate bashrc file: alias 408='cp ~/.bashrc408 ~/.bashrc; source ~/.bashrc' alias 421='cp ~/.bashrc421 ~/.bashrc; source ~/.bashrc' Switching contexts this way, you can easily start different containers; they have to listen on different ports though. Not sure if this is the smartest way, but it works for me.

I am not sure this will work just by putting globus 4.0 and 4.2 in different directories. The wsgram service creates a listening port, namely 8443. If this is really a listening port, the second wsgram service won't come up, because it will try to open the same listening port. This will lead to a Unix / Linux system error. Therefore I think you also need to change those ports in the configuration files for the second container. Cheers, Alexander

I think that's what I wanted to say with "they have to listen on different ports though". Or do you mean something else here? Martin