Re: [gt-user] Losing job states
Quick note on the last paragraph: we don't use a SEG because it proved to be unreliable.

On 24/09/13 8:26 AM, Markus Binsteiner wrote:
Hi Joe, thanks for your thoughts on this. I'll try to get more info from the logs, although the problem has now gone away almost entirely. We had problems with LoadLeveler over the last few days, but once we figured out what the problem was and worked around it, those grid-related issues went away too. It was just a bit strange that Globus seemed to lose jobs at a much higher rate than our 'normal' LoadLeveler users (of whom we have far more). My current working theory is that Globus checked job status more often than a normal user would and was therefore much more likely to find a job in a broken state. Once that happened, it considered the job gone and deleted its job files. Does that sound possible to you? Best, Markus

On Thu, 2013-09-19 at 11:51 -0400, Joseph Bester wrote:
That's normally not deleted until the job is completed and the two-phase commit is done. The other reason GRAM might delete it is job expiry (after it hits an end state and hasn't been touched for 4 hours). Is there a possibility of something else cleaning out that directory? Do those files exist? It's possible to increase the logging level as described here: http://www.globus.org/toolkit/docs/5.2/5.2.4/gram5/admin/#idp7912160 which might give some info about what the job manager thinks is going on. Joe

On Sep 18, 2013, at 3:33 PM, Markus Binsteiner m.binstei...@auckland.ac.nz wrote:
Hi. We are experiencing major problems with losing job states: after a while (an hour or so), every job we submit via Globus ends up in an unknown state.

I'm not quite sure where to start looking. The logs say:

ts=2013-09-18T19:20:31.006776Z id=14670 event=gram.state_file_read.end level=ERROR gramid=/16361930530915519966/6437524403105335712/ path=/var/lib/globus/gram_job_state/mbin029/16966e4/loadleveler/job.16361930530915519966.6437524403105335712 msg=Error checking file status status=-121 errno=2 reason=No such file or directory

every time another status is lost. We are using jglobus (1.8.x) and two-phase commit, and we poll the LRM (LoadLeveler; we are not using the scheduler event generator). Any idea what could cause those files to be deleted? Best, Markus

--
Martin Feller
Centre for eResearch, The University of Auckland
24 Symonds Street, Building 409, Room G21
e: m.fel...@auckland.ac.nz
p: +64 9 3737599 ext 82099
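Since Joe mentions the 4-hour expiry window, a quick way to see which state files are candidates for expiry is to look at their modification times. This is a diagnostic sketch of my own, not an official GRAM tool; the directory path is taken from the log line above and the helper name is made up, so adjust both for your site:

```shell
#!/bin/sh
# find_stale_job_states: list GRAM job state files untouched for more
# than the 4-hour expiry window mentioned in the thread.
find_stale_job_states() {
  # $1: a gram_job_state directory
  find "$1" -type f -name 'job.*' -mmin +240 2>/dev/null
}
# usage (path from the log line above):
# find_stale_job_states /var/lib/globus/gram_job_state/mbin029/16966e4/loadleveler
```

Comparing that list against the jobs GRAM reports as lost would show whether expiry, rather than external cleanup, is deleting the files.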
Re: [gt-user] problems with too many open files
Hi Brian,

I think I'd try to convince the IT group to authorize the upgrade to GT 5.2. According to http://www.globus.org/toolkit/docs/5.2/5.2.0/gram5/rn/#gram5-fixed the issue with accumulating open files (http://jira.globus.org/browse/GRAM-223) was fixed in the 5.2 series. We had the same problem with 5.0.4, and it works fine for us with 5.2. Increasing the limits will definitely help but, depending on how active your users are, may just delay the problem.

Martin

On 23/01/12 6:47 PM, Yuriy Halytskyy wrote:
Hi Brian, have a look at http://technical.bestgrid.org/index.php/Setup_GRAM5_on_CentOS_5#Increase_Open_Files_Limit
Cheers, Yuriy

On 23/01/12 18:45, Brian O'Connor wrote:
Hi, I've been using GRAM for a long time now and I'd like to push it into production, but I'm having issues with it. I submit workflows of hundreds of jobs each day through an automated submitter, so I need to be able to send jobs to a GRAM server and not have it end up in a bad state after x number of days. That's the goal, at least...
Anyway, the latest problem I've had is GRAM rejecting incoming requests because of "Too many open files". Here's the error:

globus-job-run server.domain.name/jobmanager-sge /bin/hostname
GRAM Job submission failed because Error opening proxy file for writing: /u/seqware/.globus/job/sqwprod.hpc.oicr.on.ca/16217884770066032596.5836665131371726474/x509_user_proxy: Too many open files (24) (error code 75)

I checked my proxy and it looks OK:

grid-proxy-info
subject  : /O=Grid/OU=GlobusTest/OU=simpleCA-sqwstage.hpc.oicr.on.ca/OU=hpc.oicr.on.ca/CN=Seq Ware/CN=1800547271
issuer   : /O=Grid/OU=GlobusTest/OU=simpleCA-sqwstage.hpc.oicr.on.ca/OU=hpc.oicr.on.ca/CN=Seq Ware
identity : /O=Grid/OU=GlobusTest/OU=simpleCA-sqwstage.hpc.oicr.on.ca/OU=hpc.oicr.on.ca/CN=Seq Ware
type     : RFC 3820 compliant impersonation proxy
strength : 512 bits
path     : /tmp/x509up_u1373
timeleft : 479:16:03  (20.0 days)

I then looked at the number of open files for this user:

/usr/sbin/lsof | grep seqware | wc -l
2084

Looking at the globus-job-manager, it's using up the majority:

ps aux | grep globus-job-man
seqware  175028 0.0 0.0  61200   768 pts/2 R+ 00:21 0:00 grep globus-job-man
seqware 4103600 0.1 0.4 116984 18628 ?     S  Jan22 1:26 globus-job-manager -conf /usr/local/globus/default/etc/globus-job-manager.conf -type sge
seqware 4103647 0.0 0.1  36548  7440 ?     S  Jan22 1:00 perl /usr/local/globus/5.0.2/libexec/globus-job-manager-script.pl -m sge -c interactive
seqware 4103649 0.0 0.1  36548  7456 ?     S  Jan22 0:59 perl /usr/local/globus/5.0.2/libexec/globus-job-manager-script.pl -m sge -c interactive
seqware 4103650 0.0 0.1  36548  7440 ?     S  Jan22 0:59 perl /usr/local/globus/5.0.2/libexec/globus-job-manager-script.pl -m sge -c interactive
seqware 4103651 0.0 0.1  36548  7444 ?     S  Jan22 0:59 perl /usr/local/globus/5.0.2/libexec/globus-job-manager-script.pl -m sge -c interactive
seqware 4103652 0.0 0.1  36544  7436 ?     S  Jan22 0:59 perl /usr/local/globus/5.0.2/libexec/globus-job-manager-script.pl -m sge -c interactive

/usr/sbin/lsof | grep seqware | grep 4103600 | wc -l
1069

However, if I look at this user's limits, it looks like they can open up to 32768 files, and I can perform other file operations just fine:

ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 69632
max locked memory       (kbytes, -l) 32
max memory size         (kbytes, -m) unlimited
open files                      (-n) 32768
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 69632
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Does anyone know why this is happening? To date I've been killing the globus-job-manager when things like this happen. Is there a guide somewhere that describes the right way to reset the daemons if something goes wrong? Is there a guide for avoiding common pitfalls and setting up GRAM (in particular) to work in a heavily used grid install? I want to be able to push thousands of jobs through the system, but so far it seems to barf on me every few days, which has caused a lot of disruption in our workflows. I'm currently using 5.0.2; I would like to upgrade, but that requires authorization from the IT group. Here's my configuration for the gatekeeper:

service gsigatekeeper
{
    socket_type = stream
    wait        = no
    user        = root
    server      = /usr/local/globus/default/sbin/globus-gatekeeper
    server_args = -conf /usr/local/globus/default/etc/globus-gatekeeper.conf
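The lsof-and-wc counting Brian does above can also be done per process straight from /proc, which avoids counting unrelated lines that merely mention the username. This is a small diagnostic sketch of mine (Linux /proc assumed; the helper name is made up):

```shell
#!/bin/sh
# Compare a process's open-fd count against the soft limit -- the same
# numbers that lsof and "ulimit -n" report in the thread above.
count_open_fds() {
  # $1: pid; each entry in /proc/<pid>/fd is one open descriptor
  ls "/proc/$1/fd" | wc -l
}
soft_limit=$(ulimit -n)
echo "this shell: $(count_open_fds $$) of $soft_limit fds in use"
# usage against a job manager: count_open_fds 4103600
```

Watching that number grow over a day of submissions is a quick way to tell a genuine fd leak (as in GRAM-223) from a limit that is simply set too low.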
Re: [gt-user] gram5 job manager and open files with jglobus
For the record: according to http://www.globus.org/toolkit/docs/5.2/5.2.0/gram5/rn/#gram5-fixed it's a bug, fixed in 5.2...

On 10/12/11 12:12 PM, Martin Feller wrote:
Hi Globus team,

GT version: 5.0.4
jglobus version: 1.8.0

When I submit a job using jglobus' GRAM API, the globus-job-manager processes seem to accumulate open files. It looks like, for each job, the job manager opens files for the stdout and stderr params in the job description (or /dev/null if not specified) and doesn't close them, even if the job is done, until the job manager shuts down. This becomes problematic if you have active users submitting a lot of jobs. Any ideas what would cause the job manager to keep these files open?

NOTE: The job manager does not accumulate open files if I submit a job using the C client globusrun.

I attached lsof output as an example, taken after submitting 5 sequential jobs. The stdout and stderr params of the jobs were /home/testuser/[out|err]_UUID. The jobs were simple /bin/hostname jobs and were done when I took the lsof snapshot.

Thanks! Martin
[gt-user] gram5 job manager and open files with jglobus
Hi Globus team,

GT version: 5.0.4
jglobus version: 1.8.0

When I submit a job using jglobus' GRAM API, the globus-job-manager processes seem to accumulate open files. It looks like, for each job, the job manager opens files for the stdout and stderr params in the job description (or /dev/null if not specified) and doesn't close them, even if the job is done, until the job manager shuts down. This becomes problematic if you have active users submitting a lot of jobs. Any ideas what would cause the job manager to keep these files open?

NOTE: The job manager does not accumulate open files if I submit a job using the C client globusrun.

I attached lsof output as an example, taken after submitting 5 sequential jobs. The stdout and stderr params of the jobs were /home/testuser/[out|err]_UUID. The jobs were simple /bin/hostname jobs and were done when I took the lsof snapshot.

Thanks! Martin

[root@er161-40 tmp]# lsof | grep testuser | grep home
globus-jo 23152 testuser  cwd  DIR  253,0           4096   76117 /home/testuser
globus-jo 23153 testuser  cwd  DIR  253,0           4096   76117 /home/testuser
globus-jo 23153 testuser   6u  unix 0x81000bc97980       5971386 /home/testuser/.globus/job/er161-40.ceres.auckland.ac.nz/fork.6bd321d6.sock
globus-jo 23153 testuser   8w  REG  253,0        1801895  846923 /home/testuser/gram_20111209.log
globus-jo 23153 testuser  11uW REG  253,0              0  585356 /home/testuser/.globus/job/er161-40.ceres.auckland.ac.nz/fork.6bd321d6.lock
globus-jo 23153 testuser  17w  REG  253,0             30  846966 /home/testuser/out_10f827cf-4f1b-4422-b745-40e8b200d524
globus-jo 23153 testuser  18w  REG  253,0              0  846967 /home/testuser/err_10f827cf-4f1b-4422-b745-40e8b200d524
globus-jo 23153 testuser  22w  REG  253,0             30  846968 /home/testuser/out_2cac8de0-2f2e-47df-8cab-eb26c9406334
globus-jo 23153 testuser  23w  REG  253,0              0  846969 /home/testuser/err_2cac8de0-2f2e-47df-8cab-eb26c9406334
globus-jo 23153 testuser  24w  REG  253,0             30  846971 /home/testuser/out_11f03168-5616-45b9-bd53-6436bc00b75e
globus-jo 23153 testuser  25w  REG  253,0              0  846972 /home/testuser/err_11f03168-5616-45b9-bd53-6436bc00b75e
globus-jo 23153 testuser  26w  REG  253,0             30  846973 /home/testuser/out_abb4c0d8-f1ea-4e04-88a9-b2ba1e7f5b05
globus-jo 23153 testuser  27w  REG  253,0              0  846974 /home/testuser/err_abb4c0d8-f1ea-4e04-88a9-b2ba1e7f5b05
globus-jo 23153 testuser  28w  REG  253,0             30  846976 /home/testuser/out_a8d9598b-7f2d-41f5-9fbb-929b62e302f3
globus-jo 23153 testuser  29w  REG  253,0              0  846977 /home/testuser/err_a8d9598b-7f2d-41f5-9fbb-929b62e302f3
perl      23155 testuser  cwd  DIR  253,0           4096   76117 /home/testuser
perl      23155 testuser   8w  REG  253,0        1801895  846923 /home/testuser/gram_20111209.log
perl      23155 testuser  17w  REG  253,0             30  846966 /home/testuser/out_10f827cf-4f1b-4422-b745-40e8b200d524
perl      23155 testuser  18w  REG  253,0              0  846967 /home/testuser/err_10f827cf-4f1b-4422-b745-40e8b200d524
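The symptom in the lsof listing (one write-mode fd per finished job, never released) is easy to reproduce in miniature. This sketch is not GRAM code, just an illustration of the leak pattern under Linux /proc:

```shell
#!/bin/sh
# A long-lived process that opens a per-"job" output file and never
# closes it accumulates one fd per job -- exactly the shape of the
# globus-job-manager entries in the lsof output above.
tmp=$(mktemp -d)
before=$(ls /proc/$$/fd | wc -l)
exec 3>"$tmp/out_job1"   # opened for a "job", never closed
exec 4>"$tmp/out_job2"
exec 5>"$tmp/out_job3"
after=$(ls /proc/$$/fd | wc -l)
leaked=$((after - before))
echo "fds held open: $leaked"
```

Against a leak like this, raising ulimit only postpones the failure; the fix in GT 5.2 (closing the fds when the job reaches a terminal state) is the real cure.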
Re: [gt-user] problem installing globus-4.2.1 -- the second node
Is my_echo a shell script? If so: I remember an issue with shell scripts that lack the interpreting shell in the first line, like

#!/bin/sh

If it's a shell script, make sure that line is there.

-Martin

On 4/11/11 1:21 PM, Christopher Kunz wrote:
On 11.04.2011 20:04, Prashanth Chengi wrote:
The good news is that the RSL file is OK. The bad news is that the problem lies elsewhere. Can you do a simple globus-url-copy successfully? Throw in the -dbg flag, too, to get additional info. Or check via non-grid means (SSH) on the other node whether ${GLOBUS_USER_HOME}/my_echo exists and is executable (mode 0755 or similar).
--ck
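A script without a shebang often works when launched from an interactive shell (which falls back to interpreting it itself) but fails when a service exec()s it directly, which is why the symptom only shows up through the grid stack. A quick check in the spirit of Martin's suggestion (the helper name is mine, not a Globus tool):

```shell
#!/bin/sh
# has_shebang: verify that an executable's first two bytes are "#!"
# before handing it to a job manager.
has_shebang() {
  head -c 2 "$1" | grep -q '^#!'
}
# usage:
# has_shebang "$GLOBUS_USER_HOME/my_echo" || echo "add #!/bin/sh as line 1"
```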
Re: [gt-user] Globus installation error
Is there a particular reason why you use such an old version of the GT (4.0.1)? If you have to use a version from the 4.0 series, I'd rather try 4.0.8.

-Martin

On 2/9/11 12:02 PM, kasim saeed wrote:
Yes, I did check all the prerequisites. I just checked again and g++ is installed. Regards, Kaasim Saeed.

On Wed, Feb 9, 2011 at 10:58 PM, Roy, Kevin (LNG-SEA) kevin@applieddiscovery.com wrote:
Did you run through the quickstart and verify that you had all the needed software? I needed to download g++ for my machines; it looks like that might be your problem.

From: gt-user-boun...@lists.globus.org [mailto:gt-user-boun...@lists.globus.org] On Behalf Of kasim saeed
Sent: Wednesday, February 09, 2011 9:54 AM
To: gt-user@lists.globus.org; Lukasz Lacinski
Cc: Dr. Farrukh Nadeem
Subject: [gt-user] Globus installation error

Hi all, I am new to Globus and need to install it for academic purposes. I am using http://globus.org/toolkit/docs/4.0/admin/docbook/quickstart.html for installation. The OS is Ubuntu 10.04 and the Globus version is 4.0.1. All went well except that when I gave the command

make | tee installer.log

the following error appeared:

/usr/local/globus4a//sbin/gpt-build -srcdir=source-trees-thr/core/source gcc32dbgpthr
sh: NOT: not found
/usr/local/globus4a//etc/gpt/globus_core-src.tar.gz could not be untarred:512
Died at /usr/local/globus4a//lib/perl/Grid/GPT/PkgMngmt/ExpandSource.pm line 42.
make: *** [globus_core-thr] Error 2

Please help. Regards, Kaasim Saeed.
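Kevin's "verify you have all the needed software" step can be scripted. The tool list below is my guess at typical build prerequisites for a source install, not an official list from the 4.0 quickstart, and the helper name is made up:

```shell
#!/bin/sh
# check_tools: print any of the named build tools that are missing
# from PATH before starting a source install.
check_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "missing: $tool"
  done
}
# usage: check_tools gcc g++ make perl tar
```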
Re: [gt-user] Stripe mode over multiple links between two servers
The CA itself should stay on one machine and should not be copied to multiple nodes in a grid. It's probably located only on the first machine in your case. Does it work if you copy the host certificate request from the second machine to the first machine, sign it there, and copy the generated certificate back to the second machine, where the corresponding private key of the host certificate lives?

Martin

Hoot Thompson wrote:
I'm back again. Can you point me to a good resource for setting up a SimpleCA for two test machines? Things go OK on the first machine, but I'm getting stuck trying to sign the host certificate on the second machine. I'm using the GT 5.0.2 SimpleCA Admin Guide as a reference. The error message is as follows:

[h...@i7test4 globus_simple_ca_264a619f_setup]$ $GLOBUS_LOCATION/bin/grid-ca-sign -in /me/hoot/wideband_tools/gridftp/globus/etc/hostcert_request.pem -out $GLOBUS_LOCATION/hostsigned.pem
ERROR: No simple CA directory found at /me/hoot/.globus/simpleCA/
Either specify a directory with -dir, or run setup-simple-ca to create a CA

-----Original Message-----
From: Chandin Wilson chandin.wil...@noaa.gov
To: h...@ptpnow.com
Cc: gt-user@lists.globus.org
Subject: Re: [gt-user] Stripe mode over multiple links between two servers
Date: Tue, 24 Aug 2010 14:48:48 -0500 (CDT)

From: Hoot Thompson h...@ptpnow.com
Subject: RE: [gt-user] Stripe mode over multiple links between two servers
Date: Tue, 24 Aug 2010 14:58:39 -0400

Ok. Just to repeat in my own words: two servers with two interfaces each can be striped if GSI is used.

Yes. I'd expect you'd end up running three GridFTP instances per server: one master and two data movers, each bound to a separate data interface. You might want to make sure your filesystem and backend I/O can keep up with and sustain 20 Gbit/sec.

--Chan

Hoot

-----Original Message-----
From: Chandin Wilson [mailto:chandin.wil...@noaa.gov]
Sent: Tuesday, August 24, 2010 2:48 PM
To: h...@ptpnow.com
Cc: gt-user@lists.globus.org
Subject: Re: [gt-user] Stripe mode over multiple links between two servers

From: Hoot Thompson h...@ptpnow.com
Subject: [gt-user] Stripe mode over multiple links between two servers
Date: Tue, 24 Aug 2010 14:03:39 -0400

I have two servers, each with two 10GigE links, and I would like to stripe a file across the two links. I'm currently authenticating using ssh. Can I do this using the gridftp server stripe mode, and if so, how do I set it up?

No, you cannot. You must use GSI authentication (and hence gsiftp:// style URLs) to do striped (data-mover) GridFTP transfers.

--Chan

Chandin Wilson, General Specialist, Information Technology
chandin.wil...@noaa.gov  +1-608-216-5689
OneNOAA RDHPCS Infrastructure

Thanks!
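Martin's request-goes-to-the-CA workflow can be simulated on a single host with plain openssl, since SimpleCA wraps openssl underneath (an assumption on my part: grid-ca-sign corresponds roughly to the `x509 -req` signing step, and all file names here are illustrative):

```shell
#!/bin/sh
tmp=$(mktemp -d); cd "$tmp"
# On the CA machine (machine 1): CA key and self-signed CA certificate
openssl req -x509 -newkey rsa:2048 -nodes -subj "/O=Grid/CN=SimpleCA" \
        -keyout cakey.pem -out cacert.pem -days 1 2>/dev/null
# On machine 2: host key plus certificate request;
# copy only hostreq.pem over to the CA machine
openssl req -newkey rsa:2048 -nodes -subj "/O=Grid/CN=host2.example.org" \
        -keyout hostkey.pem -out hostreq.pem 2>/dev/null
# Back on machine 1: sign the request, then copy hostcert.pem back to
# machine 2, where hostkey.pem never moved
openssl x509 -req -in hostreq.pem -CA cacert.pem -CAkey cakey.pem \
        -CAcreateserial -out hostcert.pem -days 1 2>/dev/null
openssl verify -CAfile cacert.pem hostcert.pem   # prints: hostcert.pem: OK
```

The key point, matching Martin's advice: the CA key and the host key never travel; only the request and the signed certificate do.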
Re: [gt-user] Stop a nuclear disaster
Tell him to upgrade to GRAM5. Maybe that'll change his mind.

jayakan...@gmail.com wrote:
Hi, our leaders are putting the nation at stake with the nuclear liability bill. The Standing Committee looking at the bill has submitted its recommendations to Parliament. In its current form the bill limits the liability of the operator of a nuclear facility in case of a nuclear accident; if the cost exceeds the limit, we will have to pay for it. The Standing Committee has ignored the demand for unlimited liability, which would have made the bill more competent. Our leaders have not learnt anything from the injustices of Bhopal. Prime Minister Manmohan Singh, eager to get this bill cleared, needs to know that we want unlimited liability. I have already sent him an email asking him to incorporate unlimited liability in the bill. A large number of emails demanding the same will make it difficult for him to ignore us. We have very little time to make this change. Can you also write to PM Manmohan Singh asking him to incorporate unlimited liability? http://www.greenpeace.org/india/unlimited-liability Thanks! jayakan...@gmail.com
Re: [gt-user] globus-ws with lsf does not work
Ok, that's odd. Right now I don't have an idea what might be going wrong. If you have full control over the GT server, and it's not a production system, please do this:

0. Uncomment the following line in $GLOBUS_LOCATION/container-log4j.properties:
   # log4j.category.org.globus=DEBUG
1. Shut down the server.
2. Remove the server logfile $GLOBUS_LOCATION/var/container.log.
3. Remove the persistence directory ~userWhoStartsTheContainer/.globus/persisted.
4. Restart the GT server as a daemon (globus-start-container-detached).
5. Submit a simple batch job. No staging, no fileCleanUp please; just something simple like globusrun-ws -submit -c /bin/date.
6. Save the server logfile $GLOBUS_LOCATION/var/container.log.

Please do steps 1-6 for both a Fork and an LSF job, and send both log files.

Martin

Löhnhardt, Benjamin wrote:
Hi Martin,

Ah, why take the easy route if there is a complicated one... Somehow I was focused on your statement "... new LSF ..." and thought it used to work with old LSF or Fork. So maybe this: it works fine with Fork. The old LSF is not installed anymore, so I cannot test it.

As both variants (Fork and LSF) use the same notification listener (I guess?), network configuration problems may not be the reason...

To verify that: submit a job in batch/non-interactive mode and store the EPR of the job. Then poll for status.

With LSF the job status remains unsubmitted:

-bash-3.1$ globusrun-ws -submit -b -o job.epr -F https://nimrod.med.uni-goettingen.de -Ft LSF -c /bin/date
Submitting job...Done.
Job ID: uuid:7e99a622-a061-11df-9d58-00215af48192
Termination time: 08/06/2010 07:17 GMT
-bash-3.1$ globusrun-ws -status -j job.epr
Current job state: Unsubmitted

...but with Fork it is done:

-bash-3.1$ globusrun-ws -submit -b -o job.epr -F https://nimrod.med.uni-goettingen.de -Ft Fork -c /bin/date
Submitting job...Done.
Job ID: uuid:8b85d1b2-a061-11df-9fb3-00215af48192
Termination time: 08/06/2010 07:17 GMT
-bash-3.1$ globusrun-ws -status -j job.epr
Current job state: Done

Do you have an explanation for that strange behavior?

Regards, Benjamin

--
Benjamin Löhnhardt
UNIVERSITÄTSMEDIZIN GÖTTINGEN, GEORG-AUGUST-UNIVERSITÄT
Abteilung Medizinische Informatik
Robert-Koch-Straße 40, 37075 Göttingen
Briefpost: 37099 Göttingen
Telefon: +49-551 / 39-22842
benjamin.loehnha...@med.uni-goettingen.de
www.mi.med.uni-goettingen.de
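Step 0 above (uncommenting the DEBUG line) can be done non-interactively, which helps when repeating the procedure on several machines. A small sketch of my own, assuming GNU sed and the GT4 file path from the thread; the helper name is made up:

```shell
#!/bin/sh
# enable_container_debug: uncomment "# log4j.category.org.globus=DEBUG"
# in a container-log4j.properties file, in place.
enable_container_debug() {
  sed -i 's/^#[[:space:]]*\(log4j\.category\.org\.globus=DEBUG\)/\1/' "$1"
}
# usage, then restart the container (steps 1-4 above):
# enable_container_debug "$GLOBUS_LOCATION/container-log4j.properties"
```

Remember to re-comment the line afterwards; a DEBUG container log grows quickly under load.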
Re: [gt-user] globus-ws with lsf does not work
Hm, it's very strange. In the logfile for the LSF job I can see that not a single message from the LSF SEG ever enters the Java code, which explains what we see, but I don't know why it happens. It looks like either the LSF SEG died (we should see an error in the log in that situation, though, and there is none), or the thread that runs the Java code communicating with the SEG (which is also named SchedulerEventGenerator) died (we should see an error in the log then too, but there is nothing).

(You can turn ws-gram debugging in the server off again.)

Please start the server and submit an LSF job. Then please paste information about the processes of the GT server and the SEGs (ps -ef | grep -i globus | grep -v grep should give you that). Please send me a thread dump of the GT server process (kill -QUIT server-pid; the output is stored in the server logfile). Please also send the output of

ldd $GLOBUS_LOCATION/libexec/globus-scheduler-event-generator

I hope this will tell me more.

Just to make sure: when you started the SEG manually and saw it printing output, you used the SEG from the same $GLOBUS_LOCATION that is used by the GT4 server we are talking about, right? There is not accidentally another GLOBUS_LOCATION around that might cause some confusion? I vaguely remember a situation with two Globus installations where the SEG didn't report anything, but I don't remember any details... Maybe it's worth trying a clean re-install; that should be relatively quick with a binary installer.

Thanks, Martin

Löhnhardt, Benjamin wrote:
Hi Martin,

I think I should not send 5 MB to the mailing list, so just for you: the two resulting container.log files.

Regards, Benjamin

--
Benjamin Löhnhardt
UNIVERSITÄTSMEDIZIN GÖTTINGEN, GEORG-AUGUST-UNIVERSITÄT
Abteilung Medizinische Informatik
Robert-Koch-Straße 40, 37075 Göttingen
Briefpost: 37099 Göttingen
Telefon: +49-551 / 39-22842
benjamin.loehnha...@med.uni-goettingen.de
www.mi.med.uni-goettingen.de

-----Original Message-----
From: Martin Feller [mailto:fel...@mcs.anl.gov]
Sent: Thursday, 5 August 2010 14:39
To: Löhnhardt, Benjamin
Cc: gt-u...@globus.org
Subject: Re: AW: [gt-user] globus-ws with lsf does not work

Ok, that's odd. Right now I don't have an idea what might be going wrong. If you have full control over the GT server, and it's not a production system, please do this:

0. Uncomment the following line in $GLOBUS_LOCATION/container-log4j.properties:
   # log4j.category.org.globus=DEBUG
1. Shut down the server.
2. Remove the server logfile $GLOBUS_LOCATION/var/container.log.
3. Remove the persistence directory ~userWhoStartsTheContainer/.globus/persisted.
4. Restart the GT server as a daemon (globus-start-container-detached).
5. Submit a simple batch job. No staging, no fileCleanUp please; just something simple like globusrun-ws -submit -c /bin/date.
6. Save the server logfile $GLOBUS_LOCATION/var/container.log.

Please do steps 1-6 for both a Fork and an LSF job, and send both log files.

Martin

Löhnhardt, Benjamin wrote:
Hi Martin,

Ah, why take the easy route if there is a complicated one... Somehow I was focused on your statement "... new LSF ..." and thought it used to work with old LSF or Fork. So maybe this: it works fine with Fork. The old LSF is not installed anymore, so I cannot test it.

As both variants (Fork and LSF) use the same notification listener (I guess?), network configuration problems may not be the reason...

To verify that: submit a job in batch/non-interactive mode and store the EPR of the job. Then poll for status.

With LSF the job status remains unsubmitted:

-bash-3.1$ globusrun-ws -submit -b -o job.epr -F https://nimrod.med.uni-goettingen.de -Ft LSF -c /bin/date
Submitting job...Done.
Job ID: uuid:7e99a622-a061-11df-9d58-00215af48192
Termination time: 08/06/2010 07:17 GMT
-bash-3.1$ globusrun-ws -status -j job.epr
Current job state: Unsubmitted

...but with Fork it is done:

-bash-3.1$ globusrun-ws -submit -b -o job.epr -F https://nimrod.med.uni-goettingen.de -Ft Fork -c /bin/date
Submitting job...Done.
Job ID: uuid:8b85d1b2-a061-11df-9fb3-00215af48192
Termination time: 08/06/2010 07:17 GMT
-bash-3.1$ globusrun-ws -status -j job.epr
Current job state: Done

Do you have an explanation for that strange behavior?

Regards, Benjamin

--
Benjamin Löhnhardt
UNIVERSITÄTSMEDIZIN GÖTTINGEN, GEORG-AUGUST-UNIVERSITÄT
Abteilung Medizinische Informatik
Robert-Koch-Straße 40, 37075 Göttingen
Briefpost: 37099 Göttingen
Telefon: +49-551 / 39-22842
benjamin.loehnha...@med.uni-goettingen.de
www.mi.med.uni-goettingen.de
Re: [gt-user] globus-ws with lsf does not work
Löhnhardt, Benjamin wrote:
Hi Martin, thanks for your prompt response! I suppose you are right with your guess; maybe the SEG does not work correctly... In the meantime I have tested the following: after submitting a job via globus-ws, the job is executed by LSF. Afterwards there is an entry in the logfile of LSF (/opt/hptc/lsf/top/work/hptclsf/logdir/lsb.acct). This entry seems to be equal (or similar) to those in the old LSF logfile; mainly the version number of LSF is different. In /opt/globus/gt4/etc/globus-lsf.conf the right location of the LSF logfile is given: log_path=/opt/hptc/lsf/top/work/hptclsf/logdir.

Good point. If the log_path pointed to a wrong location, that would have been a good explanation...

How can I run the SEG manually? I have tried /opt/globus/gt4/libexec/globus-scheduler-event-generator -s LSF, but without success and with an error message:

globus_scheduler_event_generator: Unable to dlopen module /opt/globus/gt4/lib/libglobus_seg_LSF_gcc64dbg.la: file not found

Two things you can check:

1. Make sure /opt/globus/gt4/lib is in your library search path environment variable. (The name of this variable depends on the system, but in most cases it's $LD_LIBRARY_PATH.)
2. I think you need to type "lsf" lower-case. Try /opt/globus/gt4/libexec/globus-scheduler-event-generator -s lsf (the library is very probably named libglobus_seg_lsf_gcc64dbg.la, not libglobus_seg_LSF_gcc64dbg.la).

You might see a lot of output when you start that command. Wait until the output stops, then submit a job to LSF via WS-GRAM. You should then see something like

001;1280934384;d96c164e-9fd9-11df-a8d3-0013d4c3b957:26714;2;0
001;1280934384;d96c164e-9fd9-11df-a8d3-0013d4c3b957:26714;8;0

on the console (provided the SEG works properly). If the SEG doesn't spit out anything and the job is done in LSF, then something is wrong. In that case, submit a job to the scheduler 'fork' and see if the fork SEG works OK, just to make sure you did the right things with the LSF SEG. (Start the fork SEG with 'globus-scheduler-event-generator -s fork' and submit a 'fork' job via WS-GRAM.)

Martin

Btw: we use Globus 4.0.8 (I didn't mention it in the last post).

Regards, Benjamin

-----Original Message-----
From: Martin Feller [mailto:fel...@mcs.anl.gov]
Sent: Friday, 30 July 2010 14:50
To: Löhnhardt, Benjamin
Cc: gt-u...@globus.org
Subject: Re: [gt-user] globus-ws with lsf does not work

Hi,

Just an educated guess: I assume the problem is the scheduler event generator (SEG). In GRAM4, and I think also in GRAM5, the SEG is responsible for telling GRAM about the status of the jobs in the job manager. If the SEG doesn't tell GRAM about job status, the job doesn't make any progress from GRAM's perspective. I think the SEG works on the log files of the job managers to get the status information about the jobs. If something changed in the logging format of the job manager, the SEG may not be able to get the information anymore. To confirm this I would run the SEG by hand, submit a GRAM job to old/new LSF, and check whether the SEG actually spits out information on job status as it is processed in the job manager.

Martin

Löhnhardt, Benjamin wrote:
Hello, we have a problem with Globus and LSF as job manager. When we execute a Globus job via globus-ws on a client, the job is handled normally by the LSF job manager on the server. The job is even executed by LSF itself, so the test script ran on the server. However, the client does not notice that the script ran successfully, and waits. The output on the client:

-bash-3.1$ globusrun-ws -submit -s -F https://nimrod.med.uni-goettingen.de -Ft LSF -c /tmp/test.sh
Delegating user credentials...Done.
Submitting job...Done.
Job ID: uuid:c06227cc-9bc6-11df-80d4-00215af48192
Termination time: 07/31/2010 10:39 GMT

We have updated the LSF system on the server from 6.2 to 7.0. Does anybody have a hint why the client is waiting for a response from the server? How can we fix this issue?

Regards, Benjamin
Re: [gt-user] globus-ws with lsf does not work
Martin Feller wrote:
Löhnhardt, Benjamin wrote:

1. Make sure /opt/globus/gt4/lib is in your library search path environment variable.

/opt/globus/gt4/lib is set in $LD_LIBRARY_PATH.

2. I think you need to type "lsf" lower-case.

Lower case works :-) After submitting a job the output is:

001;1280936535;25852;1;0
001;1280936538;25852;2;0
001;1280936570;25852;8;0

Ok, this looks good.

Have you got any idea why the messages do not reach the client? Can network configuration problems (a firewall) cause this? Do you know which port(s) are used for the events sent to the client?

Ah, why take the easy route if there is a complicated one... Somehow I was focused on your statement "... new LSF ..." and thought it used to work with old LSF or Fork. So maybe this: if your client is behind a firewall, then it's pretty common for notification messages to be blocked. globusrun-ws, under the hood, starts a notification listener, which WS-GRAM uses to send the notification messages to. If messages to the port of that listener are blocked for some reason, globusrun-ws doesn't get any job status information from the server.

To verify that: submit a job in batch/non-interactive mode and store the EPR of the job. Then poll for status, like:

globusrun-ws -submit -b -o job.epr ...
globusrun-ws -status -f job.epr

globusrun-ws -status -j job.epr is correct (-j, not -f).

If you are able to get the job status this way, then the notification messages get stuck somewhere between the server and the client.

Martin

Regards, Benjamin

--
Benjamin Löhnhardt
UNIVERSITÄTSMEDIZIN GÖTTINGEN, GEORG-AUGUST-UNIVERSITÄT
Abteilung Medizinische Informatik
Robert-Koch-Straße 40, 37075 Göttingen
Briefpost: 37099 Göttingen
Telefon: +49-551 / 39-22842
benjamin.loehnha...@med.uni-goettingen.de
www.mi.med.uni-goettingen.de
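Martin's firewalled-client workaround (submit in batch mode, then poll instead of waiting for notifications) can be wrapped in a small loop. This is a generic helper of my own, not a Globus tool; only the globusrun-ws commands in the usage comment come from the thread:

```shell
#!/bin/sh
# poll_until_done: run a state-reporting command repeatedly until it
# prints Done or Failed, then echo that terminal state.
poll_until_done() {
  # $1: a command (or function) that prints the current job state
  while :; do
    state=$($1)
    case $state in
      Done|Failed) echo "$state"; return 0 ;;
    esac
    sleep "${POLL_INTERVAL:-10}"
  done
}
# usage with the commands from this thread (GT client install assumed;
# globusrun-ws prefixes the state with "Current job state: "):
# job_state() { globusrun-ws -status -j job.epr | sed 's/^Current job state: //'; }
# poll_until_done job_state
```

Polling trades a little latency and server load for working through firewalls that drop the notification traffic.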
Re: [gt-user] Threads and globusrun-ws/globus-credential-delegate.
Can you paste the exact commands you use in the system() calls, and the error you get in the concurrent scenario?

Martin

Belaid MOA wrote:
That's right, Martin. For each thread, I just call system("globus-credential-delegate ...") and use the EPR in system("globusrun-ws ..."). That's where I do not get any error. If, however, I call system("globusrun-ws ...") on each thread using a single EPR (created in the shell script before running the C program), then I start getting the RSL stagein element error. Thanks a lot, Martin, for looking at this. ~Belaid.

Date: Wed, 21 Jul 2010 09:55:16 -0500
From: fel...@mcs.anl.gov
CC: gt-user@lists.globus.org
Subject: Re: [gt-user] Threads and globusrun-ws/globus-credential-delegate.

Hi, I'm not sure I get this question right, and I'm also not a C guy anymore. Does it work if you run globus-credential-delegate and globusrun-ws sequentially as command-line tools? I.e.:

1. Call globus-credential-delegate and store the EPR somewhere.
2. Then use it for several globusrun-ws job submissions.

Martin

Belaid MOA wrote:
Hi everyone, just a quick question: I am using pthreads in C to run globusrun-ws and globus-credential-delegate concurrently on a GT4 PBS cluster. I noticed that using a single system() call to globus-credential-delegate when submitting a set of jobs produces an RSL stagein element error (the jobs use the same EPR produced by the single call to globus-credential-delegate). This does not happen when globus-credential-delegate is called for every job (each job has its own unique EPR). Does that mean that globusrun-ws/globus-credential-delegate are not thread-safe? Thanks a lot in advance. ~Belaid.
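The pattern Belaid reports as working (one delegated credential per job, created and used inside the same concurrent unit of work) can be sketched in plain shell. The two stub functions below are placeholders of mine for globus-credential-delegate and globusrun-ws -Jf, so the concurrency shape is testable without a grid install:

```shell
#!/bin/sh
# One EPR file per job, created by the same background task that submits
# the job -- the arrangement that avoided the RSL stagein error above.
cd "$(mktemp -d)"
delegate_stub() { echo "credential-for-$1" > "$1.epr"; }   # stands in for globus-credential-delegate
submit_stub()  { cat "$1.epr" > "$1.result"; }             # stands in for globusrun-ws ... -Jf "$1.epr"
for i in 1 2 3; do
  ( delegate_stub "job$i" && submit_stub "job$i" ) &
done
wait
```

With a single shared EPR file, concurrent submissions all read (and the delegation tooling may rewrite) the same file, which is one plausible, unconfirmed explanation for the failures only appearing in the shared-EPR scenario.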
Re: [gt-user] Threads and globusrun-ws/globus-credential-delegate.
I'm sorry, I might be a bit dense, but it's still not entirely clear to me: if you run the following in a plain shell script:

globus-credential-delegate -h scheduler eprFileName
globusrun-ws -submit -batch -F scheduler -Ft factory -S -Jf eprFileName -o JobIdFile -f jobDescFile
globusrun-ws -submit -batch -F scheduler -Ft factory -S -Jf eprFileName -o JobIdFile -f jobDescFile
globusrun-ws -submit -batch -F scheduler -Ft factory -S -Jf eprFileName -o JobIdFile -f jobDescFile

do the jobs succeed or fail? Martin
Belaid MOA wrote: Thanks a lot, Martin, for looking at this.
1. In the shell script, I run: globus-credential-delegate -h $scheduler $eprName
2. The command I call in each thread is:

string sysCommand = "globusrun-ws -submit -batch -F " + scheduler + " -Ft " + factory + " -S -Jf ";
sysCommand.append(eprName);
sysCommand.append(" -o JobIdFile");
sysCommand.append(" -f ");
sysCommand.append(jobDescFile);
// submit the request
system(sysCommand.c_str());

3. The error is:

$ globusrun-ws -status -j
JobId: 94110cc2-9376-11df-9044-0019d1a
Current job state: Failed
globusrun-ws: Job failed: Staging error for RSL element fileStageIn. Connection creation error [Caused by: java.io.EOFException] Connection creation error [Caused by: java.io.EOFException]

I do not have access to the GT4 container log on the PBS head node :(. ~Belaid.
Re: [gt-user] Threads and globusrun-ws/globus-credential-delegate.
You must not add & at the end of the globus-credential-delegate command, because the job submission commands require the delegation command to have finished. Otherwise there won't be an EPR of a delegated credential.
Ok, I think what I get out of this is: it works sequentially (that is what I wanted to confirm, so the usage of the commands is ok), but maybe not using pthreads. I don't know what the problem might be. Maybe Joe Bester, who wrote the command-line tools, can provide more input on this. Martin
Belaid MOA wrote: In the plain shell script as is, no error is thrown. But when we add & at the end of each line, we get an error similar to the one we got from using pthreads. ~Belaid.
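The point Martin makes here (the delegation command must have finished before any submission tries to use its EPR) can be demonstrated with stand-in shell functions. `delegate` and `submit` below are hypothetical placeholders for globus-credential-delegate and globusrun-ws, not real GT commands:

```shell
# 'delegate' stands in for globus-credential-delegate: it takes time and
# then writes the EPR file. 'submit' stands in for globusrun-ws, which
# needs that file to exist.
delegate() { sleep 0.3; echo "EPR" > cred.epr; }
submit()   { [ -f cred.epr ] && echo "submitted" || echo "failed: no EPR yet"; }

rm -f cred.epr
delegate &            # WRONG: backgrounded with '&', submission races ahead
first=$(submit)       # runs before the EPR file exists
wait

rm -f cred.epr
delegate              # RIGHT: let delegation finish before submitting
second=$(submit)

echo "$first / $second"
rm -f cred.epr
```

The backgrounded variant fails for the same reason the '&'-suffixed script lines do: the EPR file is not there yet when the submission starts.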
Re: [gt-user] Threads and globusrun-ws/globus-credential-delegate.
I'll have to test the situation when the jobs are submitted in the background. I don't have a GT available at the moment, so it might take a while. What are the GT server and client versions? Also: you said that the error when running the commands in a plain script is 'similar'. Can you paste it? Martin
Belaid MOA wrote: Thanks a lot, Lukasz. I completely agree. I was talking, however, at the service level, not at the client side. Since using fork (with &) and pthreads generates the same error, there is somehow a problem when a single credential EPR is shared between jobs simultaneously. ~Belaid.
Date: Wed, 21 Jul 2010 15:44:52 +0200 From: luk...@ci.uchicago.edu To: belaid_...@hotmail.com CC: gt-user@lists.globus.org Subject: Re: [gt-user] Threads and globusrun-ws/globus-credential-delegate.
Hi Belaid, thread safety has nothing to do with this. You create completely separate processes. Threads from one process do not interact with threads from another process. Every process uses a separate memory space allocated by the kernel. (Processes can use shared variables if a method of inter-process communication by shared memory is implemented, which, I am sure, is not the case here.) Regards, Lukasz
Belaid MOA wrote: Indeed, the delegation is done first (without &) and then the set of globusrun-ws commands are run with &. The following sentence from http://www.globus.org/toolkit/releasenotes/4.0.4/ may explain why: "The service engine and clients are not thread-safe". Does this mean that any client call is not thread-safe? ~Belaid.
Re: [gt-user] Installation problem in globus toolkit 5.0.1
Globus 5.x doesn't have web services support.
naveen wrote: I am installing Globus Toolkit 5.0.1 on Ubuntu 9. It is installed on my system, but I don't know how to install a web service container on Globus and how to start web services, because there is no help or guidance provided in the quick start file of Globus Toolkit 5.0.1 as there is in the quick start of Toolkit 4.0.*. I need WSRF for weka4ws. So please help me to install and use web services on Globus Toolkit 5.0.1.
Re: [gt-user] exec.StateMachine Error code: 201 user is not in the grid mapfile
Hi Marco,
All very strange... If you were not in the grid-mapfile you wouldn't get that far in the job submission. I see this:

2010-06-23 10:32:32,817 DEBUG authorization.GridMapAuthorization [ServiceThread-73,isPermitted:181] Peer /O=KGrid/CN=Marco Lackovic authorized as lackovic based on gridmap file /etc/grid-security/grid-mapfile
2010-06-23 10:32:32,831 DEBUG factory.ManagedJobFactoryService [ServiceThread-73,createManagedJob:96] Entering createManagedJob()

so the authorization check prior to the service call indicates you are mapped in /etc/grid-security/grid-mapfile. But later on, in the submission phase of the job:

2010-06-23 10:32:33,640 DEBUG exec.StateMachine [RunQueueThread_2,runScript:2898] running script submit
2010-06-23 10:32:33,640 DEBUG exec.JobManagerScript [RunQueueThread_2,run:199] Executing command: /usr/bin/sudo -H -u lackovic -S /usr/local/globus-4.0.8/libexec/globus-gridmap-and-execute -g /etc/grid-security/grid-mapfile /usr/local/globus-4.0.8/libexec/globus-job-manager-script.pl -m fork -f /usr/local/globus-4.0.8/tmp/gram_job_mgr2078824211274262568.tmp -c submit
2010-06-23 10:32:33,668 DEBUG exec.JobManagerScript [RunQueueThread_2,run:218] first line: null
2010-06-23 10:32:33,670 DEBUG exec.JobManagerScript [RunQueueThread_2,run:328] failure message: Script stderr: lackovic is not in the grid mapfile

(uses the same grid-mapfile) Can you please send me your entire grid-mapfile (maybe not to the list)? I want to check if I can replicate something like that. Martin
Marco Lackovic wrote: Hi Martin, On Tue, Jun 22, 2010 at 8:11 PM, Martin Feller fel...@mcs.anl.gov wrote: I would really like to see the entire log of a job, ideally in a format that is a bit easier to digest. Yes, you are right, sorry for that. I wasn't sure I could send attachments to the mailing-list. You can find the log attached to this message.
Re: [gt-user] exec.StateMachine Error code: 201 user is not in the grid mapfile
Marco Lackovic wrote: On Wed, Jun 23, 2010 at 2:12 PM, Martin Feller fel...@mcs.anl.gov wrote: You seem to run the GT server as a user other than lackovic, e.g. as user globus. If so, and if the grid-mapfile is readable by the user globus but not by user lackovic, then you would run into this situation: the first check for a mapping is done by the user who runs the server (globus in this example). globus has read privileges and things are ok, because lackovic is mapped in the grid-mapfile. Later, though, the job is submitted as user lackovic (sudo), and if lackovic does not have permission to read the grid-mapfile, then we get this error. That was it! Really well thought! Excellent! Thank you very much. I guess all users must have read permissions on the grid-mapfile. I solved it by adding the user to the globus group. Shouldn't it be mentioned somewhere in the guide? Glad it's working! Yeah, it should be in the docs. I wonder why we never ran into this issue more often. Martin
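The permission situation Martin diagnosed can be sketched on a scratch file. The path and DN below are examples from this thread; on a real host the file is /etc/grid-security/grid-mapfile, and the per-user check would be something like a `sudo -u lackovic test -r` run against it:

```shell
# Demonstrate the two permission modes on a scratch copy of a grid-mapfile.
GRIDMAP=$(mktemp)
echo '"/O=KGrid/CN=Marco Lackovic" lackovic' > "$GRIDMAP"

chmod 600 "$GRIDMAP"                     # readable only by the container user:
mode_broken=$(stat -c '%a' "$GRIDMAP")   # the sudo'ed job user then hits
                                         # "user is not in the grid mapfile"

chmod 640 "$GRIDMAP"                     # fix: group-readable, with mapped
mode_fixed=$(stat -c '%a' "$GRIDMAP")    # users added to the file's group

echo "broken=$mode_broken fixed=$mode_fixed"
rm -f "$GRIDMAP"
```

Mode 640 plus group membership (as Marco did with the globus group) is the narrower alternative to making the file world-readable.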
Re: [gt-user] exec.StateMachine Error code: 201 user is not in the grid mapfile
Marco Lackovic wrote: Hi Martin, On Mon, Jun 21, 2010 at 2:30 AM, Martin Feller fel...@mcs.anl.gov wrote: How do $GLOBUS_LOCATION/etc/globus_wsrf_core/global_security_descriptor.xml, $GLOBUS_LOCATION/etc/globus_wsrf_gram/managed-job-factory-security-config.xml, $GLOBUS_LOCATION/etc/globus_wsrf_gram/managed-job-security-config.xml look like?

- global_security_descriptor.xml:

<?xml version="1.0" encoding="UTF-8"?>
<securityConfig xmlns="http://www.globus.org">
  <credential>
    <key-file value="/etc/grid-security/containerkey.pem"/>
    <cert-file value="/etc/grid-security/containercert.pem"/>
  </credential>
  <gridmap value="/etc/grid-security/grid-mapfile"/>
</securityConfig>

The directory $GLOBUS_LOCATION/etc/globus_wsrf_gram does not exist. I guess that must have been the problem. I don't understand how that could happen: I have built GT from source and the build completed successfully. So the mapping of DNs to local accounts works now, or does it still fail? Do you have the env var GRIDMAP set in the environment of the user who runs the GT4 server? No, it is not set.
Re: [gt-user] exec.StateMachine Error code: 201 user is not in the grid mapfile
Marco Lackovic wrote: On Tue, Jun 22, 2010 at 1:23 PM, Martin Feller fel...@mcs.anl.gov wrote: The directory $GLOBUS_LOCATION/etc/globus_wsrf_gram does not exist. I guess that must have been the problem. I don't understand how could that happen: I have built GT from source and the build completed successfully. So the mapping of DNs to local accounts works now or does it still fail? Still fails because the directory etc/globus_wsrf_gram is still missing. I don't know how to fix it. How do you install it (4.0.8, right)?
Re: [gt-user] exec.StateMachine Error code: 201 user is not in the grid mapfile
Sorry, my bad. In 4.0.x the directory names are different compared to 4.2.x: I meant the following directories: $GLOBUS_LOCATION/etc/globus_wsrf_core/global_security_descriptor.xml, $GLOBUS_LOCATION/etc/gram-service/managed-job-factory-security-config.xml, $GLOBUS_LOCATION/etc/gram-service/managed-job-security-config.xml Um, but I see that you didn't run 'make install' after configure and make, which is a required step. What if you run 'make install' at the end and try it then? Martin Marco Lackovic wrote: On Tue, Jun 22, 2010 at 2:36 PM, Martin Feller fel...@mcs.anl.gov wrote: Still fails because the directory etc/globus_wsrf_gram is still missing. I don't know how to fix it. How do you install it (4.0.8, right)? It's right, 4.0.8. I downloaded the Full Toolkit Source Download from here: http://www.globus.org/toolkit/downloads/4.0.8/ then ran: ./configure --prefix=/usr/local/globus-4.0.8/ --with-iodbc=/usr/lib and then make | tee installer.log
Re: [gt-user] exec.StateMachine Error code: 201 user is not in the grid mapfile
Hmm, looks ok. The only reason I can see is that at some point another grid-mapfile is being used than /etc/grid-security/grid-mapfile. Please enable debug logging on the server-side in ws-gram and send the container logfile containing logs of a problematic job submission. Martin
Marco Lackovic wrote: On Tue, Jun 22, 2010 at 2:50 PM, Martin Feller fel...@mcs.anl.gov wrote:

$GLOBUS_LOCATION/etc/globus_wsrf_core/global_security_descriptor.xml:

<?xml version="1.0" encoding="UTF-8"?>
<securityConfig xmlns="http://www.globus.org">
  <credential>
    <key-file value="/etc/grid-security/containerkey.pem"/>
    <cert-file value="/etc/grid-security/containercert.pem"/>
  </credential>
  <gridmap value="/etc/grid-security/grid-mapfile"/>
</securityConfig>

$GLOBUS_LOCATION/etc/gram-service/managed-job-factory-security-config.xml:

<securityConfig xmlns="http://www.globus.org">
  <method name="createManagedJob">
    <auth-method>
      <GSITransport/>
      <GSISecureMessage/>
      <GSISecureConversation/>
    </auth-method>
  </method>
  <authz value="gridmap"/>
  <reject-limited-proxy value="true"/>
</securityConfig>

$GLOBUS_LOCATION/etc/gram-service/managed-job-security-config.xml:

<securityConfig xmlns="http://www.globus.org">
  <auth-method>
    <GSITransport/>
    <GSISecureMessage/>
    <GSISecureConversation/>
  </auth-method>
  <authz value="gridmap"/>
  <run-as>
    <resource-identity/>
  </run-as>
</securityConfig>

Um, but I see that you didn't run 'make install' after configure and make, which is a required step. What if you run 'make install' at the end and try it then? Sorry, I forgot to mention that, but I actually already did that too.
Re: [gt-user] exec.StateMachine Error code: 201 user is not in the grid mapfile
How do $GLOBUS_LOCATION/etc/globus_wsrf_core/global_security_descriptor.xml, $GLOBUS_LOCATION/etc/globus_wsrf_gram/managed-job-factory-security-config.xml, $GLOBUS_LOCATION/etc/globus_wsrf_gram/managed-job-security-config.xml look like? Do you have the env var GRIDMAP set in the environment of the user who runs the GT4 server? Martin
Marco Lackovic wrote: Hello, I am still having the problem mentioned below. Basically I get a "user is not in the grid mapfile" error message, while the user actually is in the grid-mapfile. Using GT 4.0.8 on Ubuntu 10.04. Any clue would be highly appreciated.
On Mon, May 3, 2010 at 6:36 PM, Marco Lackovic lacko...@si.deis.unical.it wrote: When I run the following command:

globusrun-ws -submit -c /bin/touch touched_it

I get the following error:

Submitting job...Done.
Job ID: uuid:aa88b8ce-56d6-11df-ac93-00248ce78cc1
Termination time: 05/04/2010 17:09 GMT
Current job state: Failed
Destroying job...Done.
globusrun-ws: Job failed: Error code: 201
Script stderr: john is not in the grid mapfile

and on the terminal, on the same machine, where the globus container of GT 4.0.8 is running:

2010-05-03 19:22:00,820 INFO exec.StateMachine [RunQueueThread_0,logJobAccepted:3424] Job 6237c130-56d8-11df-9b26-c844d3674bc1 accepted for local user 'john' for DN '/O=XGrid/OU=YGrid/CN=John Doe'
2010-05-03 19:22:00,911 WARN exec.StateMachine [RunQueueThread_2,createFaultFromErrorCode:3181] Unhandled fault code 201
2010-05-03 19:22:01,281 INFO exec.StateMachine [RunQueueThread_7,logJobFailed:3455] Job 6237c130-56d8-11df-9b26-c844d3674bc1 failed.
Description: Error code: 201
Cause: org.globus.exec.generated.FaultType: Error code: 201 caused by [0: org.oasis.wsrf.faults.BaseFaultType: Script stderr: john is not in the grid mapfile]

while john actually *is* in the grid-mapfile:

"/O=XGrid/OU=YGrid/CN=John Doe" john

Furthermore, when I run grid-mapfile-check-consistency I get the following output:

Checking /etc/grid-security/grid-mapfile grid mapfile
Verifying grid mapfile existence...OK
Checking for duplicate entries...OK
Checking for valid user names...OK

I have installed GT 4.0.x many times and this is the first time something like this happens. Does anybody know what the problem could be?
Re: [gt-user] Error while running container and job submission
Ankuj Gupta wrote: The RSL file has the following contents:

<job>
  <executable>my_echo</executable>
  <directory>${GLOBUS_USER_HOME}</directory>
  <argument>Hello</argument>
  <argument>World!</argument>
  <stdout>${GLOBUS_USER_HOME}/stdout</stdout>
  <stderr>${GLOBUS_USER_HOME}/stderr</stderr>
  <fileStageIn>
    <transfer>
      <sourceUrl>gsiftp://ashish.gridglobus.com:2811/bin/echo</sourceUrl>
      <destinationUrl>file:///${GLOBUS_USER_HOME}/my_echo</destinationUrl>
    </transfer>
  </fileStageIn>
  <fileCleanUp>
    <deletion>
      <file>file:///${GLOBUS_USER_HOME}/my_echo</file>
    </deletion>
  </fileCleanUp>
</job>

Here ashish.gridglobus.com is the user from which I am submitting the job.
User? Don't you mean host/machine? And the machine where GT4 is running and where you submit the job to is ankuj.gridglobus.com with the IP address 192.168.1.40? Ankuj
On Fri, Jun 4, 2010 at 9:17 AM, Martin Feller fel...@mcs.anl.gov wrote: What does your job description look like? J
Ankuj Gupta wrote: Hi!!
I am getting the following error while running the container:

2010-06-03 18:47:38,154 ERROR container.GSIServiceThread [ServiceThread-47,process:147] Error processing request
java.io.EOFException
  at org.globus.gsi.gssapi.net.impl.GSIGssInputStream.readHandshakeToken(GSIGssInputStream.java:56)
  at org.globus.gsi.gssapi.net.impl.GSIGssSocket.readToken(GSIGssSocket.java:60)
  at org.globus.gsi.gssapi.net.GssSocket.authenticateServer(GssSocket.java:122)
  at org.globus.gsi.gssapi.net.GssSocket.startHandshake(GssSocket.java:142)
  at org.globus.gsi.gssapi.net.GssSocket.getOutputStream(GssSocket.java:161)
  at org.globus.wsrf.container.GSIServiceThread.process(GSIServiceThread.java:99)
  at org.globus.wsrf.container.ServiceThread.run(ServiceThread.java:291)
2010-06-03 18:47:38,347 INFO impl.DefaultIndexService [ServiceThread-45,processConfigFile:107] Reading default registration configuration from file: /usr/local/globus-4.0.7/etc/globus_wsrf_mds_index/hierarchy.xml
Starting SOAP server at: https://192.168.1.40:8443/wsrf/services/ With the following services:

If I try to submit a job from a user node using an RSL file, I get the following error on the client:

globusrun-ws: Job failed: Staging error for RSL element fileStageIn. ; nested exception is: javax.xml.rpc.soap.SOAPFaultException: Host authorization failed: expected /CN=host/192.168.1.40, peer returned /O=Grid/OU=GlobusTest/OU=simpleCA-ankuj.gridglobus.com/CN=host/ankuj.gridglobus.com

Ankuj Gupta
Re: [gt-user] Error while running container and job submission
Looks like the GridFTP server on ashish.gridglobus.com expects the submitting machine (GT machine) to authorize with a credential id containing the IP (192.168.1.40) instead of ankuj.gridglobus.com. But I'm still a bit unsure. Please run the job again adding the debug option on the client side:

globusrun-ws -submit -dbg

and send the output to this list. Martin
Ankuj Gupta wrote: I am trying to submit a job from one machine, akhil.gridglobus.com with IP 192.168.1.50, to ankuj.gridglobus.com with IP 192.168.1.40, and that is where the container is running. Ankuj
Re: [gt-user] [Slightly OT] Handling repeated input files
Dougal, to the best of my knowledge Gram4.x/RFT does not have such a detection mechanism. I don't know if tools on top of Gram4 (Swift, Gridway, others?) provide mechanisms for your use-case. A general note: if you are not tied to web services, I'd check with the Gram5 folks whether they still support what Steve described, and maybe consider going with Gram5 (based on an improved Gram2) in the long term, because in the medium term Gram4.x won't be supported anymore, but Gram5 will (back to the future! :) ) Martin
Steven Timm wrote: Dougal, I am not sure what the GT4 equivalent is for file stage-in, but I know that the GT2 stage-in does detect that the same file has previously been staged in, and does not stage it in again. The cache on the far end has many hard links to the same file. Steve
On Mon, 31 May 2010, Dougal Ballantyne wrote: Dear GT, I have been working on a project for several months now, researching and developing a grid solution based on Globus Toolkit 4. Many thanks to the people who have helped me with previous issues. I have a slightly off-topic question related to how others handle a particular scenario. We have a job generation and control application to which we have added support for Globus through some Perl modules that call globusrun-ws. When a job is generated, the program pulls the associated input files from the job database and creates an XML file which lists the input files in StageIn and the requested results file in StageOut. This works great for a single job and for jobs that all use different input data. However, we often have a scenario where we generate several hundred jobs that all use the same input data. In our current setup we would StageIn the same input file several hundred times. I was wondering if there is a method or known best practice within the Globus Toolkit for handling this sort of scenario.
I am aware that we could modify the tool to stage the data first, run the jobs and then remove the input file BUT that would also be a change of workflow for the users. Your thoughts or comments greatly appreciated. Kind regards, Dougal Ballantyne
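For what it's worth, the client-side de-duplication Steve describes for GT2 could be approximated inside the job-generation tool itself: hash each input file and emit a StageIn transfer only the first time a given content hash appears in a batch. A minimal sketch, assuming the tool can see all jobs of a batch at once (the function and data shapes below are hypothetical, not any Globus API):

```python
import hashlib

def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def dedupe_stage_ins(jobs):
    """Given a batch of jobs (each a list of input file paths), return
    the list of files that actually need a StageIn transfer: each
    distinct file content is staged only once for the whole batch."""
    seen = set()
    unique = []
    for job in jobs:
        for path in job:
            d = file_digest(path)
            if d not in seen:
                seen.add(d)
                unique.append(path)
    return unique
```

Jobs whose files were skipped would then reference the already-staged remote copy, which mirrors what the GT2 cache achieves with hard links, without changing the users' workflow.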
Re: [gt-user] Error while staging a job
The error says Unable to connect to localhost:8443: you are submitting the job to localhost, and it seems no GT server is running there (or it isn't listening on port 8443). Either submit the job to a machine where a server is running (use the -F option; check globusrun-ws -help for its usage) or start a GT server on localhost. Martin

Ankuj Gupta wrote: Hi!! I am submitting the following RSL file:

<job>
  <executable>my_echo</executable>
  <directory>${GLOBUS_USER_HOME}</directory>
  <argument>Hello</argument>
  <argument>World!</argument>
  <stdout>${GLOBUS_USER_HOME}/stdout</stdout>
  <stderr>${GLOBUS_USER_HOME}/stderr</stderr>
  <fileStageIn>
    <transfer>
      <sourceUrl>gsiftp://akhil.gridglobus.com:2811/bin/echo</sourceUrl>
      <destinationUrl>file:///${GLOBUS_USER_HOME}/my_echo</destinationUrl>
    </transfer>
  </fileStageIn>
  <fileCleanUp>
    <deletion>
      <file>file:///${GLOBUS_USER_HOME}/my_echo</file>
    </deletion>
  </fileCleanUp>
</job>

But I am getting the following error:

[t...@akhil ~]$ globusrun-ws -submit -S -f a.rsl
Delegating user credentials...Failed.
globusrun-ws: Error trying to delegate
globus_xio: Unable to connect to localhost:8443
globus_xio: System error in connect: Connection refused
globus_xio: A system call failed: Connection refused

Ankuj Gupta
Re: [gt-user] GT4.2.1 deployment into Tomcat 5.5 - errors with persisted store
Dougal Ballantyne wrote: Martin, I have got it working. I ended up having to add -Dorg.globus.wsrf.container.persistence.dir=/where/ever/you/want/it/to/be directly into the RHEL tomcat daemon start/stop script. It did not seem to be picking up GLOBUS_OPTIONS, no matter where I exported them. Might be a RHEL thing or I need to dig a bit further but at least I am able to change it. Might also be a Globus thing. :) Glad that it's working, and thanks for the feedback. Martin Thank you. -Dougal On Mon, May 24, 2010 at 4:28 PM, Dougal Ballantyne dougal.li...@gmail.com wrote: Martin, I am kicking myself for not reading more... Sorry. Starting testing, didn't go in first time so tweaking with the rather adapted RHEL startup scripts for tomcat to get the environment variable exported. Thank you for the steer. -Dougal On Mon, May 24, 2010 at 4:12 PM, Martin Feller fel...@mcs.anl.gov wrote: Martin Feller wrote: Hi, I didn't try it, just an educated guess: Any chance you have the property -Dorg.globus.wsrf.container.persistence.dir set to /usr/share/tomcat5/.globus, e.g. via the environment variable GLOBUS_OPTIONS? (http://www.globus.org/toolkit/docs/4.0/common/javawscore/Java_WS_Core_Public_Interfaces.html#s-javawscore-Public_Interfaces-env) Oh, this was a 4.0 link, but it's the same in 4.2: http://www.globus.org/toolkit/docs/4.2/4.2.1/admin/install/ If not: Does it work if you explicitly set it, like export GLOBUS_OPTIONS=$GLOBUS_OPTIONS -Dorg.globus.wsrf.container.persistence.dir=/where/ever/you/want/it/to/be and restart tomcat? Martin Dougal Ballantyne wrote: Hi, I have been working on a GT4.2.1 deployment and for larger scale testing, I have been preparing for a deployment into the Tomcat 5.5 server. I am working on a RHEL 5.5 system and would like to use the provided tomcat5-* rpms. I have successfully deployed the application into the webapps folder and adjusted the locations of the BDB databases and temporary storage locations and it all works as expected. 
However there is one item I just cannot seem to get relocated: the persisted directory created under the user starting the container, in ~/.globus/persisted. I am getting the following errors in catalina.out:

Using CATALINA_BASE: /usr/share/tomcat5
Using CATALINA_HOME: /usr/share/tomcat5
Using CATALINA_TMPDIR: /usr/share/tomcat5/temp
Using JRE_HOME:
May 23, 2010 3:31:18 PM org.apache.catalina.core.ApplicationContext log
INFO: ContextListener: contextInitialized()
May 23, 2010 3:31:18 PM org.apache.catalina.core.ApplicationContext log
INFO: SessionListener: contextInitialized()
May 23, 2010 3:31:19 PM org.apache.commons.vfs.VfsLog info
INFO: Using /usr/share/tomcat5/temp/vfs_cache as temporary files store.
May 23, 2010 3:31:20 PM org.globus.exec.service.factory.ManagedJobFactoryResource$RecoveryTask run
WARNING: Recovery exception
org.globus.wsrf.ResourceException: Unabled to locate persisted resource properties directory. ; nested exception is:
java.io.IOException: [JWSCORE-205] Failed to create storage directory: '/usr/share/tomcat5/.globus/persisted/127.0.0.1-wsrf/ManagedExecutableJobResourceStateType'
 at org.globus.exec.service.exec.ManagedExecutableJobHome.recover(ManagedExecutableJobHome.java:176)
 at org.globus.exec.service.factory.ManagedJobFactoryResource$RecoveryTask.run(ManagedJobFactoryResource.java:388)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
 at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:207)
 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:619)
Caused by: java.io.IOException: [JWSCORE-205] Failed to create storage directory: '/usr/share/tomcat5/.globus/persisted/127.0.0.1-wsrf/ManagedExecutableJobResourceStateType'
 at org.globus.wsrf.utils.FilePersistenceHelper.createStorageDirectory(FilePersistenceHelper.java:123)
 at org.globus.wsrf.utils.FilePersistenceHelper.setStorageDirectory(FilePersistenceHelper.java:191)
 at org.globus.wsrf.utils.FilePersistenceHelper.init(FilePersistenceHelper.java:181)
 at org.globus.wsrf.utils.FilePersistenceHelper.init(FilePersistenceHelper.java:141)
 at org.globus.wsrf.utils.XmlPersistenceHelper.init(XmlPersistenceHelper.java:74)
 at org.globus.exec.service.exec.ManagedExecutableJobHome.recover(ManagedExecutableJobHome.java:171)
 ... 9 more
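For readers hitting the same issue: one way to make the property stick regardless of the caller's environment, as Dougal eventually did, is to append it to the JVM options in Tomcat's own configuration rather than relying on GLOBUS_OPTIONS being exported. A sketch, assuming RHEL's tomcat5 package reads /etc/tomcat5/tomcat5.conf (both the path and the target directory below are assumptions; adjust to your layout):

```
# /etc/tomcat5/tomcat5.conf (path is an assumption for RHEL's tomcat5 RPM)
# Append the Java WS Core persistence property to the JVM options so the
# container persists state in a writable location instead of ~/.globus/persisted:
JAVA_OPTS="$JAVA_OPTS -Dorg.globus.wsrf.container.persistence.dir=/var/lib/globus/persisted"
```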
Re: [gt-user] GT4.2.1 deployment into Tomcat 5.5 - errors with persisted store
Hi, I didn't try it, just an educated guess: Any chance you have the property -Dorg.globus.wsrf.container.persistence.dir set to /usr/share/tomcat5/.globus, e.g. via the environment variable GLOBUS_OPTIONS? (http://www.globus.org/toolkit/docs/4.0/common/javawscore/Java_WS_Core_Public_Interfaces.html#s-javawscore-Public_Interfaces-env) If not: Does it work if you explicitly set it, like export GLOBUS_OPTIONS=$GLOBUS_OPTIONS -Dorg.globus.wsrf.container.persistence.dir=/where/ever/you/want/it/to/be and restart tomcat? Martin Dougal Ballantyne wrote: Hi, I have been working on a GT4.2.1 deployment and for larger scale testing, I have been preparing for a deployment into the Tomcat 5.5 server. I am working on a RHEL 5.5 system and would like to use the provided tomcat5-* rpms. I have successfully deployed the application into the webapps folder and adjusted the locations of the BDB databases and temporary storage locations and it all works as expected. However there is one item I just cannot seem to get relocated, the persisted directory created under the user starting the container in ~/.globus/persisted. I am getting the following errors in catalina.out: Using CATALINA_BASE: /usr/share/tomcat5 Using CATALINA_HOME: /usr/share/tomcat5 Using CATALINA_TMPDIR: /usr/share/tomcat5/temp Using JRE_HOME: May 23, 2010 3:31:18 PM org.apache.catalina.core.ApplicationContext log INFO: ContextListener: contextInitialized() May 23, 2010 3:31:18 PM org.apache.catalina.core.ApplicationContext log INFO: SessionListener: contextInitialized() May 23, 2010 3:31:19 PM org.apache.commons.vfs.VfsLog info INFO: Using /usr/share/tomcat5/temp/vfs_cache as temporary files store. May 23, 2010 3:31:20 PM org.globus.exec.service.factory.ManagedJobFactoryResource$RecoveryTask run WARNING: Recovery exception org.globus.wsrf.ResourceException: Unabled to locate persisted resource properties directory. 
; nested exception is: java.io.IOException: [JWSCORE-205] Failed to create storage directory: '/usr/share/tomcat5/.globus/persisted/127.0.0.1-wsrf/ManagedExecutableJobResourceStateType' at org.globus.exec.service.exec.ManagedExecutableJobHome.recover(ManagedExecutableJobHome.java:176) at org.globus.exec.service.factory.ManagedJobFactoryResource$RecoveryTask.run(ManagedJobFactoryResource.java:388) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:207) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) Caused by: java.io.IOException: [JWSCORE-205] Failed to create storage directory: '/usr/share/tomcat5/.globus/persisted/127.0.0.1-wsrf/ManagedExecutableJobResourceStateType' at org.globus.wsrf.utils.FilePersistenceHelper.createStorageDirectory(FilePersistenceHelper.java:123) at org.globus.wsrf.utils.FilePersistenceHelper.setStorageDirectory(FilePersistenceHelper.java:191) at org.globus.wsrf.utils.FilePersistenceHelper.init(FilePersistenceHelper.java:181) at org.globus.wsrf.utils.FilePersistenceHelper.init(FilePersistenceHelper.java:141) at org.globus.wsrf.utils.XmlPersistenceHelper.init(XmlPersistenceHelper.java:74) at org.globus.exec.service.exec.ManagedExecutableJobHome.recover(ManagedExecutableJobHome.java:171) ... 9 more I have searched through the source and the deployed application but I can find no reference to where it might be getting this path from. 
[r...@globus-sge globus-4.2.1]# pwd
/opt/globus-4.2.1
[r...@globus-sge globus-4.2.1]# grep -r '/.globus/persisted/' .
grep: warning: ./etc/gpt/packages/packages: recursive directory loop
grep: warning: ./etc/globus_packages/packages: recursive directory loop
./SRC/gt4.2.1-all-source-installer/source-trees-thr/ws-gram/discovery/providers/setup/fork/globus-scheduler-provider-fork.in:my @persistence_files = glob(~/.globus/persisted/$host-$port/ManagedExecutableJobResourceStateType/*.xml);
./SRC/gt4.2.1-all-source-installer/source-trees-thr/ws-gram/discovery/providers/setup/pbs/globus-scheduler-provider-pbs.in:my @persistence_files = glob(~/.globus/persisted/$host-$port/ManagedExecutableJobResourceStateType/*.xml);
./SRC/gt4.2.1-all-source-installer/source-trees-thr/ws-gram/discovery/providers/setup/condor/globus-scheduler-provider-condor.in:my @persistence_files =
Re: [gt-user] GT4.2.1 deployment into Tomcat 5.5 - errors with persisted store
Martin Feller wrote: Hi, I didn't try it, just an educated guess: Any chance you have the property -Dorg.globus.wsrf.container.persistence.dir set to /usr/share/tomcat5/.globus, e.g. via the environment variable GLOBUS_OPTIONS? (http://www.globus.org/toolkit/docs/4.0/common/javawscore/Java_WS_Core_Public_Interfaces.html#s-javawscore-Public_Interfaces-env) Oh, this was a 4.0 link, but it's the same in 4.2: http://www.globus.org/toolkit/docs/4.2/4.2.1/admin/install/ If not: Does it work if you explicitly set it, like export GLOBUS_OPTIONS=$GLOBUS_OPTIONS -Dorg.globus.wsrf.container.persistence.dir=/where/ever/you/want/it/to/be and restart tomcat? Martin Dougal Ballantyne wrote: Hi, I have been working on a GT4.2.1 deployment and for larger scale testing, I have been preparing for a deployment into the Tomcat 5.5 server. I am working on a RHEL 5.5 system and would like to use the provided tomcat5-* rpms. I have successfully deployed the application into the webapps folder and adjusted the locations of the BDB databases and temporary storage locations and it all works as expected. However there is one item I just cannot seem to get relocated, the persisted directory created under the user starting the container in ~/.globus/persisted. I am getting the following errors in catalina.out: Using CATALINA_BASE: /usr/share/tomcat5 Using CATALINA_HOME: /usr/share/tomcat5 Using CATALINA_TMPDIR: /usr/share/tomcat5/temp Using JRE_HOME: May 23, 2010 3:31:18 PM org.apache.catalina.core.ApplicationContext log INFO: ContextListener: contextInitialized() May 23, 2010 3:31:18 PM org.apache.catalina.core.ApplicationContext log INFO: SessionListener: contextInitialized() May 23, 2010 3:31:19 PM org.apache.commons.vfs.VfsLog info INFO: Using /usr/share/tomcat5/temp/vfs_cache as temporary files store. 
May 23, 2010 3:31:20 PM org.globus.exec.service.factory.ManagedJobFactoryResource$RecoveryTask run WARNING: Recovery exception org.globus.wsrf.ResourceException: Unabled to locate persisted resource properties directory. ; nested exception is: java.io.IOException: [JWSCORE-205] Failed to create storage directory: '/usr/share/tomcat5/.globus/persisted/127.0.0.1-wsrf/ManagedExecutableJobResourceStateType' at org.globus.exec.service.exec.ManagedExecutableJobHome.recover(ManagedExecutableJobHome.java:176) at org.globus.exec.service.factory.ManagedJobFactoryResource$RecoveryTask.run(ManagedJobFactoryResource.java:388) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:207) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) Caused by: java.io.IOException: [JWSCORE-205] Failed to create storage directory: '/usr/share/tomcat5/.globus/persisted/127.0.0.1-wsrf/ManagedExecutableJobResourceStateType' at org.globus.wsrf.utils.FilePersistenceHelper.createStorageDirectory(FilePersistenceHelper.java:123) at org.globus.wsrf.utils.FilePersistenceHelper.setStorageDirectory(FilePersistenceHelper.java:191) at org.globus.wsrf.utils.FilePersistenceHelper.init(FilePersistenceHelper.java:181) at org.globus.wsrf.utils.FilePersistenceHelper.init(FilePersistenceHelper.java:141) at org.globus.wsrf.utils.XmlPersistenceHelper.init(XmlPersistenceHelper.java:74) at org.globus.exec.service.exec.ManagedExecutableJobHome.recover(ManagedExecutableJobHome.java:171) ... 
9 more I have searched through the source and the deployed application but I can find no reference to where it might be getting this path from. [r...@globus-sge globus-4.2.1]# pwd /opt/globus-4.2.1 [r...@globus-sge globus-4.2.1]# grep -r '/.globus/persisted/' . grep: warning: ./etc/gpt/packages/packages: recursive directory loop grep: warning: ./etc/globus_packages/packages: recursive directory loop ./SRC/gt4.2.1-all-source-installer/source-trees-thr/ws-gram/discovery/providers/setup/fork/globus-scheduler-provider-fork.in:my @persistence_files = glob(~/.globus/persisted/$host-$port/ManagedExecutableJobResourceStateType/*.xml); ./SRC/gt4.2.1-all-source-installer/source-trees-thr/ws-gram/discovery/providers/setup/pbs/globus-scheduler-provider-pbs.in:my @persistence_files = glob(~/.globus/persisted/$host-$port/ManagedExecutableJobResourceStateType/*.xml); ./SRC/gt4.2.1-all-source-installer/source-trees-thr
Re: [gt-user] globusrun-ws failed with Job-Failed: Invalid stdout element.
Hi, Very probably something is wrong with your file system mapping file. Some information about file system mapping: http://www.globus.org/toolkit/docs/4.2/4.2.1/execution/gram4/admin/#gram4-Interface_Config_Frag-filesysmap Does $GLOBUS_LOCATION/etc/globus_wsrf_gram/globus_gram_fs_map_config.xml exist? Did you modify it, and is it maybe broken? Martin

Jörg Lenhardt wrote: Hello! I built Globus Toolkit 4.2.1 on Solaris 10 (SPARC) and until now everything worked fine. But if I try to submit a WS GRAM job using a job definition file, the execution fails with the error message: Invalid stdout element. File map initialization failed.

Job definition:

<?xml version="1.0" encoding="UTF-8"?>
<job>
  <executable>/bin/echo</executable>
  <argument>Output</argument>
  <stdout>${GLOBUS_USER_HOME}/stdout</stdout>
  <stderr>${GLOBUS_USER_HOME}/stderr</stderr>
</job>

Job execution:

aus...@zone2:~ $ globusrun-ws -submit -f echo_job.xml
Submitting job...Done.
Job ID: uuid:4a9df2dc-6444-11df-ac26-01006cc8
Termination time: 05/20/3010 19:17 GMT
Current job state: Failed
Destroying job...Done.
globusrun-ws: Job failed: Invalid stdout element. File map initialization failed.

Globus container output:

2010-05-20T21:17:12.237+02:00 INFO PersistentManagedExecutableJobResource.4aca2640-6444-11df-80ff-8db553e71ea8 [ServiceThread-58,start:761] Job 4aca2640-6444-11df-80ff-8db553e71ea8 with client submission-id 4a9df2dc-6444-11df-ac26-01006cc8 accepted for local user 'auser1'
2010-05-20T21:17:13.752+02:00 INFO handler.SubmitStateHandler [pool-1-thread-5,process:172] Job 4aca2640-6444-11df-80ff-8db553e71ea8 submitted with local job ID '4bd68876-6444-11df-bc23-01007edd:6912'
2010-05-20T21:17:17.191+02:00 INFO handler.FinalizeTerminationStateHandler [pool-1-thread-3,handleFailedState:100] Job 4aca2640-6444-11df-80ff-8db553e71ea8 failed.
Fault #1: Description: Invalid stdout element. File map initialization failed. Cause: org.globus.exec.generated.ServiceLevelAgreementFaultType: Invalid stdout element. File map initialization failed.
caused by [0: org.oasis.wsrf.faults.BaseFaultType: File map initialization failed. ]

The files stdout and stderr ARE created and stdout contains Output. The following works fine without an error:

aus...@zone2:~ $ globusrun-ws -submit -c /bin/touch /tmp/file

I really do not know what's wrong. Some information about the environment:
- Using a Solaris Zone for Globus
- perl is installed in /usr/local/ with XML::Parser
- PATH is set to search /usr/local/bin before any other path (globally, in /etc/profile)
- sudo is configured:
globus ALL=(auser1) NOPASSWD: /usr/local/globus-4.2.1/libexec/globus-gridmap-and-execute -g /etc/grid-security/grid-mapfile /usr/local/globus-4.2.1/libexec/globus-job-manager-script.pl *
globus ALL=(auser1) NOPASSWD: /usr/local/globus-4.2.1/libexec/globus-gridmap-and-execute -g /etc/grid-security/grid-mapfile /usr/local/globus-4.2.1/libexec/globus-gram-local-proxy-tool *

I hope someone can guide me out of the darkness ... ;) Joerg Lenhardt
Re: [gt-user] Error in writing rsl file for globus 4.0.7
There might be two reasons: 1. You didn't specify a job manager. 2. Try adding namespaces to the factoryEndpoint element. Does the following work?

<job>
  <factoryEndpoint xmlns:gram="http://www.globus.org/namespaces/2004/10/gram/job"
                   xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/03/addressing">
    <wsa:Address>https://192.168.4.88:8443/wsrf/services/ManagedJobFactoryService</wsa:Address>
    <wsa:ReferenceProperties>
      <gram:ResourceID>Fork</gram:ResourceID>
    </wsa:ReferenceProperties>
  </factoryEndpoint>
  <executable>/bin/date</executable>
</job>

Martin

praveenesh kumar wrote: Hello everyone! I am using Globus 4.0.7 on 4 machines. My grid is configured properly and I am able to submit jobs to other grid nodes using the globusrun-ws command. Now I am trying to write jobs in RSL format, and inside that RSL file I want to use the ManagedJobFactory service of the other grid nodes, but I am not able to submit jobs using the ManagedJobFactory service of other nodes. The point is I do not want to use globusrun-ws -F (other grid node's IP address) -submit -c with my RSL file; I want to specify the other grid node's IP address in the RSL file itself. Can anyone suggest a simple example of how to do this? I am writing the following code for my RSL file, and it is giving me a parsing error:

<job>
  <factoryEndpoint>
    <Address>https://192.168.4.88:8443/wsrf/services/ManagedJobFactoryService</Address>
  </factoryEndpoint>
  <executable>/bin/date</executable>
</job>

Can someone correct the above code? I need this urgently. Thanks!
Re: [gt-user] WS_GRAM Stage-out problem
Marco Lackovic wrote: On Mon, May 10, 2010 at 1:24 AM, Martin Feller fel...@mcs.anl.gov wrote: Ok, my first guess is that the mapping in the grid-mapfile is not as you think it is, i.e. the DN of the user who submits the transfer request is mapped to another user and not to the user 'globus'. You mentioned that it is, but it may be worth double-checking. The procedure chosen for the system I am using is to have local accounts, on every grid node, for all grid users and then have them mapped to their own local accounts in the grid-mapfile. Users were then added to the globus group so that they could access common files located in the /home/globus directory. I suspect this might not be the proper way to do things and I am in a position to change it. Do you advise against that procedure? Is it customary to instead map all grid users to the user 'globus'? Can you suggest a good reference on this topic? This approach looks fine to me. We use the same approach in a project I work on: all users have individual accounts, but are members of various local unix groups. Each group reflects a project. Depending on group membership they can access project data owned by that group, or not. There is another approach where all users share a community credential which is augmented with user-specific information (attributes). Authorization decisions are then made by callouts which check the user-specific attributes. I don't know enough about it to give you detailed information, but if you are interested I could find documentation pointers or forward this to folks who know more about it. Or maybe there's even somebody on this list who can provide input on this! If you have control over the machine where the GT server is running and the system is not a production system: create a grid-mapfile with just one mapping of your DN (you can e.g. 
get your DN by running the command 'grid-proxy-info -identity' on the client machine) to the local user 'globus' and see if that works. In the end I have found out that the Can't do MLST on non-existing file/dir error message was actually a permission problem: the file permissions were 660 (rw-rw----) but the local user, to which the grid user was mapped, didn't belong to the globus group. Assigning the local user to the globus group fixed the problem on all the machines but one, on which I still got that error even though the user does belong to the globus group. Glad to hear that it works now on most machines. It's hard for me to tell why this one machine still causes problems. I hope you can figure it out. Martin
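For reference, the one-mapping grid-mapfile Martin suggests for the sanity check is a single line: a quoted DN followed by the local account name. The DN below is a placeholder; substitute the output of grid-proxy-info -identity run on the client machine:

```
"/O=Grid/OU=GlobusTest/OU=simpleCA-example/CN=Some User" globus
```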
Re: [gt-user] WS_GRAM Stage-out problem
Marco Lackovic wrote: On Sun, May 9, 2010 at 4:56 AM, Martin Feller fel...@mcs.anl.gov wrote: I'm sorry, I meant to say: Try to use the file /tmp/somefile.txt in your fileStageOut element and see if that fails too. /tmp/somefile.txt should have 777 as permissions for sanity check. I tried to change to 777 the permission of the original file I was trying to transfer /home/globus/pippo.xml but still got the same error. Hm, one goal of mine was to get out of /home/globus, to make sure it's not a permission problem on the directory. If that fails: Can you please paste your job description? I am sorry, can you be more specific? I am working on some code which was not written by me and is not documented so I have to figure out things. Do RFT file transfers have job descriptions too? I thought only Gram jobs had. The thread name indicates that this transfer is initialized by a ws-gram job. Is it not?
Re: [gt-user] WS_GRAM Stage-out problem
Ok, my first guess is, that the mapping in the grid-mapfile is not as you think it is, i.e. the DN of the user who submits the transfer request is mapped to another user and not to the user 'globus'. You mentioned that it is, but maybe worth double-checking. If you have control over the machine where the GT server is running and the system is not a production system: create a grid-mapfile with just one mapping of your DN (you can e.g. get your DN by running the command 'grid-proxy-info -identity' on the client-machine) to the local user 'globus' and see if that works. Maybe unlikely, but worth a check: is the directory /home/globus readable for the user globus? Martin Marco Lackovic wrote: On Sun, May 9, 2010 at 2:15 PM, Martin Feller fel...@mcs.anl.gov wrote: Hm, one goal of mine was to get out of /home/globus, to make sure it's not a permission problem on the directory. You were right on this. From the /tmp/ directory the file transferred successfully. What can I do to make it work from /home/globus too?
Re: [gt-user] WS_GRAM Stage-out problem
Marco Lackovic wrote: On Sat, May 8, 2010 at 6:14 AM, Martin Feller fel...@mcs.anl.gov wrote: That seems to be a different error message, if I remember correctly. Not sure if Helmut got a Permission denied. I think the system call that failed for him was a No such file or directory, but the Can't do MLST on non-existing file/dir error message was the same as mine. Is the DN of the caller really mapped to the local user globus in the grid-mapfile? Yes, it is. The grid-mapfile also passed the consistency check performed with the command grid-mapfile-check-consistency. What if you try to transfer /tmp/somefile.txt with permissions on /tmp/somefile.txt being 777 (rwxrwxrwx)? I tried to copy it from the machine where I got the error to the caller machine:
- with scp it copied and arrived at the destination as 777 (rwxrwxrwx);
- with globus-url-copy it copied but arrived at the destination as 644 (rw-r--r--).
I'm sorry, I meant to say: try to use the file /tmp/somefile.txt in your fileStageOut element and see if that fails too. /tmp/somefile.txt should have 777 as permissions for a sanity check. If that fails: can you please paste your job description?
Re: [gt-user] WS_GRAM Stage-out problem
That seems to be a different error message, if I remember correctly. Not sure if Helmut got a Permission denied. Is the DN of the caller really mapped to the local user globus in the grid-mapfile? What if you try to transfer /tmp/somefile.txt with permissions on /tmp/somefile.txt being 777 (rwxrwxrwx)? Martin Marco Lackovic wrote: On Thu, Aug 6, 2009 at 4:03 PM, Martin Feller fel...@mcs.anl.gov wrote: Uh, it's a while ago, but i think i remember this issue. I *thought* it was fixed in 4.0.8, but I created a jar from globus_4_0_branch. It's built using Java 1.4 and you can get it from here: http://www.mcs.anl.gov/~feller/heller/globus_wsrf_rft.jar Can you give it a try by dropping it into ${GLOBUS_LOCATION}/lib, and tell us if it works for you with that jar? I am using GT 4.0.8 and also having that Can't do MLST on non-existing file/dir error message on an actually existing file. I tried substituting the globus_wsrf_rft.jar file in ${GLOBUS_LOCATION}/lib as you suggested but that didn't fix it, I am still getting the same error: 2010-05-07 16:47:16,163 ERROR service.TransferWork [Thread-9,run:401] Terminal transfer error: Can't do MLST on non-existing file/dir /home/globus/pippo.xml on server pluto.paperino.com [Caused by: Server refused performing the request. Custom message: Server refused MLST command (error code 1) [Nested exception message: Custom message: Unexpected reply: 500-Command failed : System error in stat: Permission denied 500-A system call failed: Permission denied 500 End.]] Can't do MLST on non-existing file/dir /home/globus/pippo.xml on server pluto.paperino.com. Caused by org.globus.ftp.exception.ServerException: Server refused performing the request. Custom message: Server refused MLST command (error code 1) [Nested exception message: Custom message: Unexpected reply: 500-Command failed : System error in stat: Permission denied 500-A system call failed: Permission denied 500 End.]. 
Nested exception is org.globus.ftp.exception.UnexpectedReplyCodeException: Custom message: Unexpected reply: 500-Command failed : System error in stat: Permission denied 500-A system call failed: Permission denied 500 End.
 at org.globus.ftp.vanilla.FTPControlChannel.execute(FTPControlChannel.java:412)
 at org.globus.ftp.FTPClient.mlst(FTPClient.java:598)
 at org.globus.transfer.reliable.service.cache.SingleConnectionImpl.doMlst(SingleConnectionImpl.java:287)
 at org.globus.transfer.reliable.service.cache.ThirdPartyConnectionImpl.doMlstOnSource(ThirdPartyConnectionImpl.java:276)
 at org.globus.transfer.reliable.service.client.ThirdPartyTransferClient.doMlstOnSource(ThirdPartyTransferClient.java:163)
 at org.globus.transfer.reliable.service.client.ThirdPartyTransferClient.process(ThirdPartyTransferClient.java:101)
 at org.globus.transfer.reliable.service.TransferWork.run(TransferWork.java:379)
 at EDU.oswego.cs.dl.util.concurrent.PooledExecutor$Worker.run(Unknown Source)
 at java.lang.Thread.run(Thread.java:619)

The file /home/globus/pippo.xml actually exists. Its details are the following:

-rw-rw---- 1 globus globus 1316 May 7 10:29 pippo.xml
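As the thread later establishes, a "System error in stat: Permission denied" on a file that exists usually means the mapped local user is missing search (execute) permission on one of the parent directories, not read permission on the file itself. A small standard-library sketch one could run as the mapped user to narrow it down (a hypothetical helper, not part of Globus; expects an absolute path):

```python
import os

def diagnose_path_access(path):
    """Walk each prefix of an absolute path and report missing permissions:
    every intermediate directory needs search (+x) permission, and the
    final component needs read permission. Returns a list of problems
    (empty if the current user can stat and read the file)."""
    problems = []
    parts = path.strip("/").split("/")
    prefix = ""
    for i, part in enumerate(parts):
        prefix = prefix + "/" + part
        if i < len(parts) - 1:
            if not os.access(prefix, os.X_OK):
                problems.append("no search (+x) permission on " + prefix)
        else:
            if not os.access(prefix, os.R_OK):
                problems.append("no read permission on " + prefix)
    return problems
```

Running it on /home/globus/pippo.xml as the mapped user would point at whichever path component blocks the stat.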
Re: [gt-user] Which service can i use to get all the epr files?
Hi Raffaele, No, such a service does not exist. You have to write that functionality yourself for your services. It shouldn't be too hard to do. If you have a resource home that manages all resources of your service, this could be a place to start. Existing services like RFT and WS-GRAM do not offer that functionality. Martin Raffaele Forgione wrote: Hello everyone, is there a native service in the Globus container that can help me obtain all the EPRs related to a service? Thanks in advance
Re: [gt-user] error :gridftp, globus-url-copy
Ok, I'm running out of ideas, but I'd try the following: build the GT again with a debug flavor (gcc32dbg, gcc64dbg) on hermione if you didn't already do so. Then run grid-cert-diagnostics in gdb and send the output. This will hopefully tell us more about the segfault, which might be related to the gridftp error. Martin

Martin Feller wrote: Sunah, Can you send /etc/grid-security/certificates/45fb3f91.0 from both machines to me so that I can try it myself? If I knew another way to solve the problem I'd tell you. Maybe someone from the GridFTP or C security side has more ideas. Martin

Sunah Park wrote: Martin, Thanks for your help. I built it from sources on both machines, and I checked that the openssl versions on the two machines are the same:

[glo...@harry ~]$ openssl version
OpenSSL 0.9.8e-fips-rhel5 01 Jul 2008
[glo...@hermione ~]$ openssl version
OpenSSL 0.9.8e-fips-rhel5 01 Jul 2008

And /etc/grid-security/certificates/45fb3f91.0 is also the same on harry and hermione. It's very difficult to track down this problem. Is there another way to solve it? Sunah Park.

2010/4/14 Martin Feller fel...@mcs.anl.gov: Sunah Park, Hm, ok. How did you install the GT on these 2 machines: did you build it from sources or did you use binary installers? If you used binary installers I wonder if maybe the openssl version on hermione is not compatible. What are the openssl versions on these 2 machines? I remember one case where the installation from a binary installer worked fine and the gridftp server started ok, but transfers failed with security-related errors due to an incompatible openssl version. For sanity: can you double-check that /etc/grid-security/certificates/45fb3f91.0 is really the same on harry and hermione? Martin

박선아 wrote: Hi Martin, I'm Cinyoung's coworker and I saw the mails you sent her about solving the problem. 
Then I did the following steps from your email:
* Put all grid security stuff into /etc/grid-security on both machines
* Unset all Globus security-related environment variables on both machines for all users
* Since the content of harry:/etc/grid-security/certificates seems ok (at least grid-cert-diagnostics does not segfault there), copy the content of harry:/etc/grid-security/certificates into hermione:/etc/grid-security/certificates

But it didn't work. These are the outputs on harry and hermione:

## Harry: root ##
[r...@harry grid-security]# $GLOBUS_LOCATION/bin/grid-cert-diagnostics
Checking Environment Variables
Checking if HOME is set... /root
Checking if GLOBUS_LOCATION is set... /usr/local/globus-4.2.1.1
Checking if X509_CERT_DIR is set... no
Checking if X509_USER_CERT is set... no
Checking if X509_USER_KEY is set... no
Checking if X509_USER_PROXY is set... no
Checking if GRIDMAP is set... no
Checking Security Directories
Determining trusted cert path... /etc/grid-security/certificates
Checking for cog.properties... not found
Checking for default gridmap location... /etc/grid-security/grid-mapfile
Checking if default gridmap exists... yes
Checking trusted certificates...
Getting trusted certificate list...
Checking CA file /etc/grid-security/certificates/45fb3f91.0... ok
Checking that certificate hash matches filename... ok
Checking CA certificate name for 45fb3f91.0... ok (/O=Grid/OU=GlobusTest/OU=simpleCA-harry.sookmyung.ac.kr/CN=Globus Simple CA)
Checking if signing policy exists for 45fb3f91.0... ok
Verifying certificate chain for 45fb3f91.0... ok

## Harry: user (the user name is aero): ##
[a...@harry grid-security]$ $GLOBUS_LOCATION/bin/grid-cert-diagnostics
Checking Environment Variables
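Running the diagnostics tool under gdb, as suggested above, would look roughly like this (a sketch only; the exact prompts and backtrace depend on the local build):

```
gdb $GLOBUS_LOCATION/bin/grid-cert-diagnostics
(gdb) run
... reproduce the segfault ...
(gdb) backtrace
... send this output to the list ...
```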
Re: [gt-user] error :gridftp, globus-url-copy
Sunah, Can you send /etc/grid-security/certificates/45fb3f91.0 from both machines to me so that I can try it myself? If I knew another way to solve the problem I'd tell you. Maybe someone from the GridFTP or C security side has more ideas. Martin

Sunah Park wrote: Martin, Thanks for your help. I built it from sources on both machines, and I checked that the OpenSSL version is the same on both:

[glo...@harry ~]$ openssl version
OpenSSL 0.9.8e-fips-rhel5 01 Jul 2008
[glo...@hermione ~]$ openssl version
OpenSSL 0.9.8e-fips-rhel5 01 Jul 2008

And /etc/grid-security/certificates/45fb3f91.0 is also the same on harry and hermione. It's hard to pin down the problem. Is there another way to solve it? Sunah Park.

2010/4/14 Martin Feller fel...@mcs.anl.gov Sunah Park, Hm, ok. How did you install the GT on these two machines: did you build it from sources or did you use binary installers? If you used binary installers, I wonder if the OpenSSL version on hermione might not be compatible. What are the OpenSSL versions on these two machines? I remember one case where the installation from a binary installer worked fine and the gridftp server started ok, but transfers failed with security-related errors due to an incompatible OpenSSL version. For sanity: can you double-check that /etc/grid-security/certificates/45fb3f91.0 is really the same on harry and hermione? Martin

박선아 wrote: Hi Martin, I'm Cinyoung's coworker and I saw the mails you sent her about the problem. Then I did the following steps from your email:
* Put all grid security stuff into /etc/grid-security on both machines
* Unset all Globus security-related environment variables on both machines for all users
* Since the content of harry:/etc/grid-security/certificates seems ok (at least grid-cert-diagnostics does not segfault there), copy the content of harry:/etc/grid-security/certificates into hermione:/etc/grid-security/certificates

But it didn't work.
These are the outputs on harry and hermione:

## Harry: root ##
[r...@harry grid-security]# $GLOBUS_LOCATION/bin/grid-cert-diagnostics
Checking Environment Variables
Checking if HOME is set... /root
Checking if GLOBUS_LOCATION is set... /usr/local/globus-4.2.1.1
Checking if X509_CERT_DIR is set... no
Checking if X509_USER_CERT is set... no
Checking if X509_USER_KEY is set... no
Checking if X509_USER_PROXY is set... no
Checking if GRIDMAP is set... no
Checking Security Directories
Determining trusted cert path... /etc/grid-security/certificates
Checking for cog.properties... not found
Checking for default gridmap location... /etc/grid-security/grid-mapfile
Checking if default gridmap exists... yes
Checking trusted certificates...
Getting trusted certificate list...
Checking CA file /etc/grid-security/certificates/45fb3f91.0... ok
Checking that certificate hash matches filename... ok
Checking CA certificate name for 45fb3f91.0... ok (/O=Grid/OU=GlobusTest/OU=simpleCA-harry.sookmyung.ac.kr/CN=Globus Simple CA)
Checking if signing policy exists for 45fb3f91.0... ok
Verifying certificate chain for 45fb3f91.0... ok

## Harry: user (the user name is aero): ##
[a...@harry grid-security]$ $GLOBUS_LOCATION/bin/grid-cert-diagnostics
Checking Environment Variables
Checking if HOME is set... /home/aero
Checking if GLOBUS_LOCATION is set... /usr/local/globus-4.2.1.1
Checking if X509_CERT_DIR is set... no
Checking if X509_USER_CERT is set... no
Checking if X509_USER_KEY is set... no
Checking if X509_USER_PROXY is set... no
Checking
Re: [gt-user] error :gridftp, globus-url-copy
Sunah Park, Hm, ok. How did you install the GT on these two machines: did you build it from sources or did you use binary installers? If you used binary installers, I wonder if the OpenSSL version on hermione might not be compatible. What are the OpenSSL versions on these two machines? I remember one case where the installation from a binary installer worked fine and the gridftp server started ok, but transfers failed with security-related errors due to an incompatible OpenSSL version. For sanity: can you double-check that /etc/grid-security/certificates/45fb3f91.0 is really the same on harry and hermione? Martin

박선아 wrote: Hi Martin, I'm Cinyoung's coworker and I saw the mails you sent her about the problem. Then I did the following steps from your email:
* Put all grid security stuff into /etc/grid-security on both machines
* Unset all Globus security-related environment variables on both machines for all users
* Since the content of harry:/etc/grid-security/certificates seems ok (at least grid-cert-diagnostics does not segfault there), copy the content of harry:/etc/grid-security/certificates into hermione:/etc/grid-security/certificates

But it didn't work. These are the outputs on harry and hermione:

## Harry: root ##
[r...@harry grid-security]# $GLOBUS_LOCATION/bin/grid-cert-diagnostics
Checking Environment Variables
Checking if HOME is set... /root
Checking if GLOBUS_LOCATION is set... /usr/local/globus-4.2.1.1
Checking if X509_CERT_DIR is set... no
Checking if X509_USER_CERT is set... no
Checking if X509_USER_KEY is set... no
Checking if X509_USER_PROXY is set... no
Checking if GRIDMAP is set... no
Checking Security Directories
Determining trusted cert path... /etc/grid-security/certificates
Checking for cog.properties... not found
Checking for default gridmap location... /etc/grid-security/grid-mapfile
Checking if default gridmap exists... yes
Checking trusted certificates...
Getting trusted certificate list...
Checking CA file /etc/grid-security/certificates/45fb3f91.0... ok
Checking that certificate hash matches filename... ok
Checking CA certificate name for 45fb3f91.0... ok (/O=Grid/OU=GlobusTest/OU=simpleCA-harry.sookmyung.ac.kr/CN=Globus Simple CA)
Checking if signing policy exists for 45fb3f91.0... ok
Verifying certificate chain for 45fb3f91.0... ok

## Harry: user (the user name is aero): ##
[a...@harry grid-security]$ $GLOBUS_LOCATION/bin/grid-cert-diagnostics
Checking Environment Variables
Checking if HOME is set... /home/aero
Checking if GLOBUS_LOCATION is set... /usr/local/globus-4.2.1.1
Checking if X509_CERT_DIR is set... no
Checking if X509_USER_CERT is set... no
Checking if X509_USER_KEY is set... no
Checking if X509_USER_PROXY is set... no
Checking if GRIDMAP is set... no
Checking Security Directories
Determining trusted cert path... /etc/grid-security/certificates
Checking for cog.properties... not found
Checking for default gridmap location... /home/aero/.gridmap
Checking if default gridmap exists... failed
globus_sysconfig: File does not exist: /home/aero/.gridmap is not a valid file
Checking trusted certificates...
Getting trusted certificate list...
Checking CA file /etc/grid-security/certificates/45fb3f91.0... ok
Checking that certificate hash matches filename... ok
Checking CA certificate name for 45fb3f91.0... ok (/O=Grid/OU=GlobusTest/OU=simpleCA-harry.sookmyung.ac.kr/CN=Globus Simple CA)
Checking if signing policy exists for 45fb3f91.0... ok
Verifying certificate chain for 45fb3f91.0... ok

## Hermione: root ##
[r...@hermione share]# $GLOBUS_LOCATION/bin/grid-cert-diagnostics
Checking Environment Variables
Checking if HOME is set... /root
Checking if GLOBUS_LOCATION is set... /usr/local/globus-4.2.1.1
Checking if X509_CERT_DIR is set... no
Checking if X509_USER_CERT is set... no
Checking if X509_USER_KEY is set... no
Checking
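The "certificate hash matches filename" check that grid-cert-diagnostics performs can be reproduced with plain openssl. A minimal self-contained sketch (the throwaway demo CA and temp directory are made up for illustration, not from the thread):

```shell
# Globus stores each trusted CA as <subject_hash>.0 in the certificates
# directory; the diagnostics tool verifies that the hash of the file
# contents matches the filename. Demonstrate with a throwaway CA cert.
tmp=$(mktemp -d)
# create a self-signed "CA" certificate just for the demo
openssl req -x509 -newkey rsa:2048 -nodes -subj "/O=Grid/CN=Demo CA" \
  -keyout "$tmp/ca.key" -out "$tmp/ca.pem" -days 1 2>/dev/null
hash=$(openssl x509 -in "$tmp/ca.pem" -noout -hash)
cp "$tmp/ca.pem" "$tmp/$hash.0"
# re-derive the hash from the installed file and compare with its name
check=fail
[ "$(openssl x509 -in "$tmp/$hash.0" -noout -hash).0" = "$hash.0" ] && check=ok
echo "$check"
```

One related caveat: OpenSSL changed its subject-hash algorithm in 1.0.0, so the same CA certificate gets a different hash filename under different OpenSSL versions, which is one more way mismatched OpenSSL installs between two hosts can cause "invalid CA certificate" grief.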
Re: [gt-user] error :gridftp, globus-url-copy
And what's the output of grid-cert-diagnostics on hermione? Martin

cinyoung hur wrote: Martin, I ran the command $GLOBUS_LOCATION/bin/grid-cert-diagnostics. If X509_CERT_DIR is not set, could that cause the problem? Thanks. Regards, Cinyoung Hur.

[r...@harry ~]# $GLOBUS_LOCATION/bin/grid-cert-diagnostics
Checking Environment Variables
Checking if HOME is set... /root
Checking if GLOBUS_LOCATION is set... /usr/local/globus-4.2.1.1
Checking if X509_CERT_DIR is set... no
Checking if X509_USER_CERT is set... no
Checking if X509_USER_KEY is set... no
Checking if X509_USER_PROXY is set... no
Checking if GRIDMAP is set... no
Checking Security Directories
Determining trusted cert path... /usr/local/globus-4.2.1.1/share/certificates
Checking for cog.properties... not found
Checking for default gridmap location... /etc/grid-security/grid-mapfile
Checking if default gridmap exists... yes
Checking trusted certificates...
Getting trusted certificate list...
Checking CA file /usr/local/globus-4.2.1.1/share/certificates/45fb3f91.0... ok
Checking that certificate hash matches filename... ok
Checking CA certificate name for 45fb3f91.0... ok (/O=Grid/OU=GlobusTest/OU=simpleCA-harry..xx.xx/CN=Globus Simple CA)
Checking if signing policy exists for 45fb3f91.0... ok
Verifying certificate chain for 45fb3f91.0... ok

2010/4/9 Martin Feller fel...@mcs.anl.gov Cinyoung, In case that didn't help resolve the issue, you might want to run the command $GLOBUS_LOCATION/bin/grid-cert-diagnostics, which prints pretty helpful information about the grid security setup on a machine. Maybe that helps finding the golden snitch... ;) Martin

Lukasz Lacinski wrote: Do you have in the directory hermione:/etc/grid-security/certificates a certificate of the Certificate Authority you used to obtain your user certificate? Please compare /etc/grid-security/certificates on hermione and harry.
It looks like you can transfer files between harry and your local machine (file:///path_to_a_file), and only hermione makes problems. Regards, Lukasz

On Apr 8, 2010, at 8:22 AM, cinyoung hur wrote: Hello, list. I'm trying to make gridftp work on two nodes, called Hermione and Harry. I read about other problems on the mailing list; someone pointed out clock skew, so I fixed the clock skew problems. However, I don't know what my problem is. Could anyone help me with this problem, please? Thank you. Cheers, Cinyoung Hur.

[a...@hermione ~]$ globus-url-copy -dbg gsiftp://hermione..xx.xx/etc/group gsiftp://harry..xx.xx/tmp/from-a
debug: starting to size gsiftp://hermione..xx.xx/etc/group
debug: connecting to gsiftp://hermione..xx.xx/etc/group
debug: response from gsiftp://hermione..xx.xx/etc/group:
220 hermione..xx.xx GridFTP Server 3.15 (gcc32, 1222656151-78) [Globus Toolkit 4.2.1] ready.
debug: authenticating with gsiftp://hermione..xx.xx/etc/group
debug: response from gsiftp://hermione..xx.xx/etc/group:
530-globus_xio: Authentication Error
530-OpenSSL Error: s3_srvr.c:2490: in library: SSL routines, function SSL3_GET_CLIENT_CERTIFICATE: no certificate returned
530-globus_gsi_callback_module: Could not verify credential
530-globus_gsi_callback_module: Could not verify credential: invalid CA certificate
530 End.
debug: fault on connection to gsiftp://hermione..xx.xx/etc/group
debug: operation complete
debug: starting to transfer gsiftp://hermione..xx.xx/etc/group to gsiftp://harry..xx.xx/tmp/from-a
debug: connecting to gsiftp://harry..xx.xx/tmp/from-a
debug: response from gsiftp://harry..xx.xx/tmp/from-a:
220 harry..xx.xx GridFTP Server 3.15 (gcc32dbgpthr, 1222656151-78) [Globus Toolkit 4.2.1] ready.
debug: authenticating with gsiftp://harry..xx.xx/tmp/from-a
debug: response from gsiftp://harry..xx.xx/tmp/from-a:
230 User aero logged in.
debug: sending command to gsiftp://harry..xx.xx/tmp/from-a: SITE HELP
debug: response from gsiftp://harry..xx.xx/tmp/from-a:
214-The following commands are recognized:
ALLO APPE REST CWD CDUP DCAU EPSV FEAT ERET MDTM STAT ESTO HELP LIST MODE NLST MLSD PASV RNFR MLST NOOP OPTS STOR PASS PBSZ PORT PROT SITE EPRT RETR SPOR SCKS TREV PWD QUIT SBUF SIZE
Re: [gt-user] error :gridftp, globus-url-copy
/root/.globus/certificates/45fb3f91.0... ok
Checking that certificate hash matches filename... ok
Checking CA certificate name for 45fb3f91.0... ok (/O=Grid/OU=GlobusTest/OU=simpleCA-harry.xxx.xx.xx/CN=Globus Simple CA)
Checking if signing policy exists for 45fb3f91.0... ok
Verifying certificate chain for 45fb3f91.0... ok
[r...@harry myproxy]# exit
logout

Harry: user:
[a...@harry globus]$ $GLOBUS_LOCATION/bin/grid-cert-diagnostics
Checking Environment Variables
Checking if HOME is set... /home/aero
Checking if GLOBUS_LOCATION is set... /usr/local/globus-4.2.1.1
Checking if X509_CERT_DIR is set... no
Checking if X509_USER_CERT is set... no
Checking if X509_USER_KEY is set... no
Checking if X509_USER_PROXY is set... no
Checking if GRIDMAP is set... no
Checking Security Directories
Determining trusted cert path... /etc/grid-security/certificates
Checking for cog.properties... not found
Checking for default gridmap location... /home/aero/.gridmap
Checking if default gridmap exists... failed
globus_sysconfig: File does not exist: /home/aero/.gridmap is not a valid file
Checking trusted certificates...
Getting trusted certificate list...
Checking CA file /etc/grid-security/certificates/45fb3f91.0... ok
Checking that certificate hash matches filename... ok
Checking CA certificate name for 45fb3f91.0... ok (/O=Grid/OU=GlobusTest/OU=simpleCA-harry.sookmyung.ac.kr/CN=Globus Simple CA)
Checking if signing policy exists for 45fb3f91.0... ok
Verifying certificate chain for 45fb3f91.0... ok
[a...@harry globus]$

2010/4/9 Martin Feller fel...@mcs.anl.gov And what's the output of grid-cert-diagnostics on hermione? Martin

cinyoung hur wrote: Martin, I ran the command $GLOBUS_LOCATION/bin/grid-cert-diagnostics. If X509_CERT_DIR is not set, could that cause the problem? Thanks. Regards, Cinyoung Hur.
[r...@harry ~]# $GLOBUS_LOCATION/bin/grid-cert-diagnostics
Checking Environment Variables
Checking if HOME is set... /root
Checking if GLOBUS_LOCATION is set... /usr/local/globus-4.2.1.1
Checking if X509_CERT_DIR is set... no
Checking if X509_USER_CERT is set... no
Checking if X509_USER_KEY is set... no
Checking if X509_USER_PROXY is set... no
Checking if GRIDMAP is set... no
Checking Security Directories
Determining trusted cert path... /usr/local/globus-4.2.1.1/share/certificates
Checking for cog.properties... not found
Checking for default gridmap location... /etc/grid-security/grid-mapfile
Checking if default gridmap exists... yes
Checking trusted certificates...
Getting trusted certificate list...
Checking CA file /usr/local/globus-4.2.1.1/share/certificates/45fb3f91.0... ok
Checking that certificate hash matches filename... ok
Checking CA certificate name for 45fb3f91.0... ok (/O=Grid/OU=GlobusTest/OU=simpleCA-harry..xx.xx/CN=Globus Simple CA)
Checking if signing policy exists for 45fb3f91.0... ok
Verifying certificate chain for 45fb3f91.0... ok

2010/4/9 Martin Feller fel...@mcs.anl.gov Cinyoung, In case that didn't help resolve the issue, you might want to run the command $GLOBUS_LOCATION/bin/grid-cert-diagnostics, which prints pretty helpful information about the grid security setup on a machine. Maybe that helps finding the golden snitch... ;) Martin

Lukasz Lacinski wrote: Do you have in the directory hermione:/etc/grid-security/certificates a certificate of the Certificate Authority you used to obtain your user certificate? Please compare /etc/grid-security/certificates on hermione and harry. It looks like you can transfer files between harry and your local machine (file:///path_to_a_file), and only hermione makes problems. Regards, Lukasz

On Apr 8, 2010, at 8:22 AM, cinyoung hur wrote: Hello, list.
I'm trying to make gridftp work on two nodes, called Hermione and Harry. I read about other problems on the mailing list; someone pointed out clock skew, so I fixed the clock skew problems. However, I don't know what my problem is. Could anyone help me
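The different trusted-cert paths appearing in this thread (/usr/local/globus-4.2.1.1/share/certificates in one run, /etc/grid-security/certificates in another, /root/.globus/certificates in a third) follow GSI's search order for the trusted certificates directory. A sketch of that order as I understand it; the exact precedence is not stated in the thread, so treat it as an assumption:

```shell
# First match wins: X509_CERT_DIR, then ~/.globus/certificates,
# then /etc/grid-security/certificates, then $GLOBUS_LOCATION/share/certificates.
trusted_cert_dir() {
  for d in "$X509_CERT_DIR" \
           "$HOME/.globus/certificates" \
           /etc/grid-security/certificates \
           "$GLOBUS_LOCATION/share/certificates"; do
    if [ -n "$d" ] && [ -d "$d" ]; then
      echo "$d"
      return 0
    fi
  done
  return 1
}
```

This is why the advice earlier in the thread (unset all Globus security environment variables and put everything into /etc/grid-security) usually gives the most predictable setup: it removes the higher-precedence candidates.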
Re: [gt-user] Gridmap PDP for service
Just a guess: The interceptor element looks a bit different in the docs at http://www.globus.org/toolkit/docs/4.2/4.2.1/security/wsaajava/descriptor/ (there it's a containerSecurityDescriptor and not a serviceSecurityDescriptor, but still...). Does <interceptor name="gridmapAuthz:org.globus.wsrf.impl.security.GridMapPDP"> instead of <interceptor name="gridmap"> make it work? -Martin

Johannes Duschl wrote: Hello, I'm running gt-4.2.1.1 on Debian Lenny and want to use a separate gridmap file for a service. The security descriptor looks like this:

<serviceSecurityConfig xmlns="http://www.globus.org/security/descriptor/service"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.globus.org/security/descriptor name_value_type.xsd"
    xmlns:param="http://www.globus.org/security/descriptor">
  <auth-method>
    <GSISecureConversation/>
  </auth-method>
  <authzChain>
    <pdps>
      <interceptor name="gridmap">
        <parameter>
          <param:nameValueParam>
            <param:parameter name="gridmap-file" value="/home/globus/grid-mapfile"/>
          </param:nameValueParam>
        </parameter>
      </interceptor>
    </pdps>
  </authzChain>
</serviceSecurityConfig>

However, when I connect to the service I get the following error:

org.globus.wsrf.ResourceContextException: ; nested exception is:
javax.naming.NamingException: [JWSCORE-203] Bean security initialization failed
[Root exception is org.globus.wsrf.config.ConfigException: [JWSSEC-245] Error parsing file: etc/at_jku_tk_service_core/service-instance-security.xml
[Caused by: cvc-complex-type.2.4.c: The matching wildcard is strict, but no declaration can be found for element 'param:nameValueParam'.]]
Exception in thread main java.lang.NullPointerException

I assume there is something wrong with this schema location (xsi:schemaLocation="http://www.globus.org/security/descriptor name_value_type.xsd") but I have no idea what's causing the error. Anybody got a clue? Greetings, Johannes
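If Martin's guess is right, the pdps block would use the fully qualified interceptor name instead of the short alias. A sketch only; whether that class name resolves in this particular deployment is an assumption based solely on the text above:

```
<pdps>
  <interceptor name="gridmapAuthz:org.globus.wsrf.impl.security.GridMapPDP">
    <parameter>
      <param:nameValueParam>
        <param:parameter name="gridmap-file" value="/home/globus/grid-mapfile"/>
      </param:nameValueParam>
    </parameter>
  </interceptor>
</pdps>
```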
Re: [gt-user] error :gridftp, globus-url-copy
Cinyoung, In case that didn't help resolve the issue, you might want to run the command $GLOBUS_LOCATION/bin/grid-cert-diagnostics, which prints pretty helpful information about the grid security setup on a machine. Maybe that helps finding the golden snitch... ;) Martin

Lukasz Lacinski wrote: Do you have in the directory hermione:/etc/grid-security/certificates a certificate of the Certificate Authority you used to obtain your user certificate? Please compare /etc/grid-security/certificates on hermione and harry. It looks like you can transfer files between harry and your local machine (file:///path_to_a_file), and only hermione makes problems. Regards, Lukasz

On Apr 8, 2010, at 8:22 AM, cinyoung hur wrote: Hello, list. I'm trying to make gridftp work on two nodes, called Hermione and Harry. I read about other problems on the mailing list; someone pointed out clock skew, so I fixed the clock skew problems. However, I don't know what my problem is. Could anyone help me with this problem, please? Thank you. Cheers, Cinyoung Hur.

[a...@hermione ~]$ globus-url-copy -dbg gsiftp://hermione..xx.xx/etc/group gsiftp://harry..xx.xx/tmp/from-a
debug: starting to size gsiftp://hermione..xx.xx/etc/group
debug: connecting to gsiftp://hermione..xx.xx/etc/group
debug: response from gsiftp://hermione..xx.xx/etc/group:
220 hermione..xx.xx GridFTP Server 3.15 (gcc32, 1222656151-78) [Globus Toolkit 4.2.1] ready.
debug: authenticating with gsiftp://hermione..xx.xx/etc/group
debug: response from gsiftp://hermione..xx.xx/etc/group:
530-globus_xio: Authentication Error
530-OpenSSL Error: s3_srvr.c:2490: in library: SSL routines, function SSL3_GET_CLIENT_CERTIFICATE: no certificate returned
530-globus_gsi_callback_module: Could not verify credential
530-globus_gsi_callback_module: Could not verify credential: invalid CA certificate
530 End.
debug: fault on connection to gsiftp://hermione..xx.xx/etc/group
debug: operation complete
debug: starting to transfer gsiftp://hermione..xx.xx/etc/group to gsiftp://harry..xx.xx/tmp/from-a
debug: connecting to gsiftp://harry..xx.xx/tmp/from-a
debug: response from gsiftp://harry..xx.xx/tmp/from-a:
220 harry..xx.xx GridFTP Server 3.15 (gcc32dbgpthr, 1222656151-78) [Globus Toolkit 4.2.1] ready.
debug: authenticating with gsiftp://harry..xx.xx/tmp/from-a
debug: response from gsiftp://harry..xx.xx/tmp/from-a: 230 User aero logged in.
debug: sending command to gsiftp://harry..xx.xx/tmp/from-a: SITE HELP
debug: response from gsiftp://harry..xx.xx/tmp/from-a:
214-The following commands are recognized:
ALLO APPE REST CWD CDUP DCAU EPSV FEAT ERET MDTM STAT ESTO HELP LIST MODE NLST MLSD PASV RNFR MLST NOOP OPTS STOR PASS PBSZ PORT PROT SITE EPRT RETR SPOR SCKS TREV PWD QUIT SBUF SIZE SPAS STRU SYST RNTO TYPE USER LANG MKD RMD DELE CKSM
214 End
debug: sending command to gsiftp://harry..xx.xx/tmp/from-a: FEAT
debug: response from gsiftp://harry..xx.xx/tmp/from-a:
211-Extensions supported
AUTHZ_ASSERT
UTF8
LANG EN
DCAU
PARALLEL
SIZE
MLST Type*;Size*;Modify*;Perm*;Charset;UNIX.mode*;UNIX.owner*;UNIX.group*;Unique*;UNIX.slink*;
ERET
ESTO
SPAS
SPOR
REST STREAM
MDTM
PASV AllowDelayed;
211 End.
debug: sending command to gsiftp://harry..xx.xx/tmp/from-a: TYPE I
debug: response from gsiftp://harry..xx.xx/tmp/from-a: 200 Type set to I.
debug: sending command to gsiftp://harry..xx.xx/tmp/from-a: PBSZ 1048576
debug: response from gsiftp://harry..xx.xx/tmp/from-a: 200 PBSZ=1048576
debug: sending command to gsiftp://harry..xx.xx/tmp/from-a: PASV
debug: response from gsiftp://harry..xx.xx/tmp/from-a: 227 Entering Passive Mode (203,153,146,56,137,160)
debug: sending command to gsiftp://harry..xx.xx/tmp/from-a: STOR /tmp/from-a
debug: sending command to gsiftp://hermione..xx.xx/etc/group: TYPE I
debug: response from gsiftp://hermione..xx.xx/etc/group: 530 Must perform GSSAPI authentication.
debug: fault on connection to gsiftp://hermione..xx.xx/etc/group debug: operation complete error: globus_ftp_client: the server responded with an error 530 Must perform GSSAPI authentication. [a...@hermione ~]$ -
Re: [gt-user] Help regarding process ID..
Hi, You can only get the process id of your job with ws-gram from GT 4.2.x, but with the GT 4.0 series. We added this parameter as a resource property of a job resource in GT 4.2.0. Example of how to get the local job id: http://www-unix.globus.org/toolkit/docs/latest-stable/execution/gram4/user/#gram4-user-query-single I don't know about Gram 5.0 at the moment. Martin siddharth jain wrote: Hello, I'm able to submit a job to any resource of my choice using the globusrun-ws command. I need to know the process ID of the process executing on the resource for this Job. How can I get this information? Can I get this information at the time of Job submission? Thank you. Yours sincerely, Siddharth Jain
Re: [gt-user] Help regarding process ID..
I meant to say: You can only get the process id of your job with ws-gram from GT 4.2.x, but NOT with the GT 4.0 series. ... Martin Feller wrote: Hi, You can only get the process id of your job with ws-gram from GT 4.2.x, but with the GT 4.0 series. We added this parameter as a resource property of a job resource in GT 4.2.0. Example of how to get the local job id: http://www-unix.globus.org/toolkit/docs/latest-stable/execution/gram4/user/#gram4-user-query-single I don't know about Gram 5.0 at the moment. Martin siddharth jain wrote: Hello, I'm able to submit a job to any resource of my choice using the globusrun-ws command. I need to know the process ID of the process executing on the resource for this Job. How can I get this information? Can I get this information at the time of Job submission? Thank you. Yours sincerely, Siddharth Jain
Re: [gt-user] Doubt about job submition
Lucio, What if you submit the job in batch mode, like

globusrun-ws -submit -b -o myJob.epr -f job_medidas_sonares.xml

and poll for status via

globusrun-ws -status -j myJob.epr

instead of relying on notification messages? Do you get information about the status of your job then? If so: could there be a firewall on the client side that prevents the notifications sent by the server from reaching the client? Martin

Lucio Agostinho Rocha wrote: Hi, I'm using GT4.0.8. I'm trying to submit a job on LOCAL_IP and receive the response on REMOTE_IP. To do this, I create the following job:

<job>
  <factoryEndpoint xmlns:gram="http://www.globus.org/namespaces/2004/10/gram/job"
      xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/03/addressing">
    <wsa:Address>https://LOCAL_IP:8443/wsrf/services/ManagedJobFactoryService</wsa:Address>
    <wsa:ReferenceProperties>
      <gram:ResourceID>Fork</gram:ResourceID>
    </wsa:ReferenceProperties>
  </factoryEndpoint>
  <jobCredentialEndpoint xsi:type="ns1:EndpointReferenceType"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns:ns1="http://schemas.xmlsoap.org/ws/2004/03/addressing">
    <ns1:Address xsi:type="ns1:AttributedURI">https://LOCAL_IP:8443/wsrf/services/DelegationService</ns1:Address>
    <ns1:ReferenceProperties xsi:type="ns1:ReferencePropertiesType">
      <ns1:DelegationKey xmlns:ns1="http://www.globus.org/08/2004/delegationService">1de27040-0649-11db-a7d0-8fccdff3a60c</ns1:DelegationKey>
    </ns1:ReferenceProperties>
    <ns1:ReferenceParameters xsi:type="ns1:ReferenceParametersType"/>
  </jobCredentialEndpoint>
  <stagingCredentialEndpoint xsi:type="ns1:EndpointReferenceType"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns:ns1="http://schemas.xmlsoap.org/ws/2004/03/addressing">
    <ns1:Address xsi:type="ns1:AttributedURI">https://LOCAL_IP:8443/wsrf/services/DelegationService</ns1:Address>
    <ns1:ReferenceProperties xsi:type="ns1:ReferencePropertiesType">
      <ns1:DelegationKey xmlns:ns1="http://www.globus.org/08/2004/delegationService">1de27040-0649-11db-a7d0-8fccdff3a60c</ns1:DelegationKey>
    </ns1:ReferenceProperties>
    <ns1:ReferenceParameters xsi:type="ns1:ReferenceParametersType"/>
  </stagingCredentialEndpoint>
  <executable>globus_MedidasSonares</executable>
  <directory>/usr/local/HttpIpthru/API_HttpIpthru_01_03_2010/C++-API/src/</directory>
  <argument>SID_</argument>
  <argument>http://127.0.0.1:4951</argument>
  <stdout>/tmp/stdout</stdout>
  <fileStageIn>
    <transferCredentialEndpoint xsi:type="ns1:EndpointReferenceType"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns:ns1="http://schemas.xmlsoap.org/ws/2004/03/addressing">
      <ns1:Address xsi:type="ns1:AttributedURI">https://LOCAL_IP:8443/wsrf/services/DelegationService</ns1:Address>
      <ns1:ReferenceProperties xsi:type="ns1:ReferencePropertiesType">
        <ns1:DelegationKey xmlns:ns1="http://www.globus.org/08/2004/delegationService">1de27040-0649-11db-a7d0-8fccdff3a60c</ns1:DelegationKey>
      </ns1:ReferenceProperties>
      <ns1:ReferenceParameters xsi:type="ns1:ReferenceParametersType"/>
    </transferCredentialEndpoint>
    <transfer>
      <sourceUrl>gsiftp://LOCAL_IP/tmp/stdout</sourceUrl>
      <destinationUrl>gsiftp://REMOTE_IP/home/lucio/processed_job.txt</destinationUrl>
    </transfer>
  </fileStageIn>
</job>

Then I execute:

$ globusrun-ws -submit -f job_medidas_sonares.xml
Submitting job...Done.
Job ID: uuid:57d6ebf2-3b45-11df-8e50-001c23c0ceff
Termination time: 03/30/2010 15:11 GMT

After some time the message "Current job state: Unsubmitted" is shown, but no further processing happens after this. What am I doing wrong? I copied the DelegationKey from a forum; could that be the problem? Any suggestions? Thanks in advance, Lucio
[gt-user] test
please ignore
Re: [gt-user] job unsubmitted problem
Marco, No, this message is unrelated to the problem below. What this message indicates is that the corresponding resource of a job is reloaded in the GT4 container at startup time. The information about a job is not only held in memory but also persisted to disk. In case of a container shutdown the in-memory resource goes away, but when you restart the container it will be reloaded from the persisted data. This ensures that a job is still manageable for a user after a GT4 container restart, e.g. to query job status or delete the job. A fix for the problem described below is not in 4.0.8. If you think you are running into it, apply the fix as described. Martin

Marco Lackovic wrote: Hello, when starting the container I sometimes get the following message:

2010-03-24 11:44:11,609 INFO exec.ManagedExecutableJobHome [Thread-2,recover:207] Recovered resource with ID 3cb225a0-36b6-11df-a43d-bc4c724692c3.

I am wondering whether it is related to the following problem and whether it has been fixed in version 4.0.8.

On Wed, Aug 27, 2008 at 2:55 PM, Martin Feller fel...@mcs.anl.gov wrote: Ok, I think I see it now. You are hitting a combination of generous locking and a potential for an infinite loop in which your container happily cycles. This situation can happen if your job wants to fetch a non-existing credential (probably destroyed earlier) from the delegation service and then, because the credential does not exist anymore, tries to delete the user proxy file created from that credential earlier, which does not exist either, because it was probably deleted when the credential was destroyed. Not a completely uncommon situation, I guess, and we handle it badly. I'll have to check how this should best be fixed. The fix should then also find its way into the VDT. I'll open a bug for that. A quick fix for you to go on is: replace

    -delete)
        # proxyfile should exist
        exec rm $PROXYFILE
        exit $?
        ;;

by

    -delete)
        if [ -e $PROXYFILE ]; then
            exec rm $PROXYFILE
            exit $?
        else
            exit 0
        fi
        ;;

in $GLOBUS_LOCATION/libexec/globus-gram-local-proxy-tool. (A patch would have been nicer, but I don't know if our versions of that file are the same.) I'm quite sure that this solves your problem. Please let me know.
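The effect of the guard can be seen in isolation (PROXYFILE here is a throwaway temp path created for the demo, not the real proxy file):

```shell
# Simulate the failure mode: the proxy file was already removed earlier,
# e.g. when the delegated credential was destroyed.
PROXYFILE=$(mktemp)
rm "$PROXYFILE"
# An unguarded 'rm $PROXYFILE' would now fail (nonzero exit status),
# which is what sent the container into its retry loop; the guarded
# version treats "already gone" as success.
if [ -e "$PROXYFILE" ]; then
  rm "$PROXYFILE"
  status=$?
else
  status=0
fi
echo "$status"
```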
Re: [gt-user] Intermittent errors with GridFTP (GT4.2.1)
Arn wrote: Arn wrote: We've set up GridFTP (4.2.1) on several nodes across our WAN (2 sites) using the quickstart documentation. We are not seeing any issues while transferring large files, but when we do a batch transfer (globus-url-copy) with lots of small files (LOSF) we have problems. The debug/verbose output is the following:

error transferring: globus_ftp_client: the server responded with an error
500 500-Command failed. : globus_l_gfs_file_open failed.
500-globus_xio: Unable to open file /path/to/data/losf/small0aAEq8QsYSCJ
500-globus_xio: System error in open: Permission denied
500-globus_xio: A system call failed: Permission denied
500 End.
error: There was an error with one or more transfers.

Note that this error is intermittent, as the same transfer sometimes works. We would appreciate some advice on what could be the problem and how to investigate further.

On Tue, Feb 23, 2010 at 7:04 AM, Martin Feller fel...@mcs.anl.gov wrote: There is a way to transfer a directory as a single tar stream, like this: 1. tar up the source directory prior to transfer, 2. transfer the tar stream, 3. untar the archive on the destination, without manual tarring/untarring on the client and the server. We implemented this for a community that uses GridFTP heavily for transfers of 42GB directories containing 130,000 rather small files in a nested directory structure. It works very reliably this way. The only downside I know of is that you cannot use any of the advanced features of GridFTP then, like parallelism: the tar-stream transfers became unreliable. To do this you must enable the popen driver in GridFTP. I recommend the latest server from 5.0.0 plus a GridFTP patch. For the tarring on the client side you can use globus-url-copy with certain flags. We built on top of the jglobus Java API to get it running for Java clients. I could provide more details and instructions if you are interested in this approach. Martin

thanks. It looks like a reasonable solution.
I will check with my project lead if we can use your suggestion, but we do tend to be wary of using non-standard patches in our production environments. It is not a non-standard patch. It's just that the gridftp developers fixed a popen-driver related issue after 5.0.0 was out. It'll be in GT 5.0.1. Also, we do need to use parallelism but I suppose we can think of a way to turn it on/off depending on the situation. Or maybe we can specify -pp 1 (1 stream) if a LOSF situation is encountered. In any case, do send me the instructions on the method you suggested. Ok, I'll prepare some notes in a few days. Martin Thanks Arn
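Martin's three-step tar-stream idea can be sketched locally, with a plain pipe standing in for the GridFTP data channel (paths and file names below are illustrative; the popen-driver and globus-url-copy specifics are omitted):

```shell
# A local sketch of the tar-stream idea for lots-of-small-files (LOSF)
# transfers: one archive stream instead of one open/transfer/close per file.
# A plain pipe stands in for the GridFTP data channel; paths are illustrative.
set -e
src=$(mktemp -d); dst=$(mktemp -d)
mkdir -p "$src/nested"
printf 'payload-1' > "$src/a.txt"
printf 'payload-2' > "$src/nested/b.txt"
# 1. tar up the source, 2. ship it as a single stream, 3. untar at the sink
tar -cf - -C "$src" . | tar -xf - -C "$dst"
diff -r "$src" "$dst" && echo "round-trip OK"
```

The per-file protocol overhead disappears because the server sees a single data stream, which is exactly why parallelism (which splits one file across streams) no longer applies.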
Re: [gt-user] Default Jobmanager
Hi, There is no such thing as a default job manager in WS-GRAM in the 4.0 series, even if e.g. globusrun-ws seems to pretend there is one: if you don't specify one when you use globusrun-ws, globusrun-ws will use Fork as the job manager, and the factory endpoint in the call to the ManagedJobFactoryService will actually contain Fork as the resource key. So from the ManagedJobFactoryService's view all job managers are the same; the client has to specify one in all calls to the ManagedJobFactoryService. If the client does not specify a ResourceID element in the factory endpoint, the request cannot be handled. In the 4.2 series, a default job manager can be configured on the server side, and if the client does not specify a job manager in the factory endpoint used in the call to the ManagedJobFactoryService, this default job manager will be used. So if you wanted to backport this to the 4.0 series, you'd have to make changes to both clients and the server. Here's a bugzilla entry that describes the changes on the server side. It doesn't cover the multi-job-related code details very well though: http://bugzilla.globus.org/globus/show_bug.cgi?id=5744 Martin Löhnhardt, Benjamin wrote: Hi, is it possible to set another job manager than Fork as the default in Globus 4.0.8? I have read here http://www.mail-archive.com/gt-user@lists.globus.org/msg00981.html that it is only possible with code changes. Are these changes documented? Best regards, Benjamin -- Benjamin Löhnhardt UNIVERSITÄTSMEDIZIN GÖTTINGEN GEORG-AUGUST-UNIVERSITÄT Abteilung Medizinische Informatik Robert-Koch-Straße 40 37075 Göttingen Briefpost 37099 Göttingen Telefon +49-551 / 39-22842 benjamin.loehnha...@med.uni-goettingen.de www.mi.med.uni-goettingen.de
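For illustration, a 4.0-style factory endpoint carrying the ResourceID described above might look roughly like this (element layout and namespace prefixes are abbreviated and from memory; verify against the WSDL of your GT version):

```
<!-- Illustrative factory endpoint; namespaces abbreviated, not verbatim. -->
<wsa:EndpointReference>
  <wsa:Address>https://host:8443/wsrf/services/ManagedJobFactoryService</wsa:Address>
  <wsa:ReferenceProperties>
    <!-- the ResourceID selects the job manager (Fork, PBS, ...) -->
    <ResourceID>PBS</ResourceID>
  </wsa:ReferenceProperties>
</wsa:EndpointReference>
```

In 4.0 the client must always fill in that ResourceID; in 4.2 it may omit it and let the server-side default apply.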
Re: [gt-user] yet another Host key verification failed question
Thanks for the feedback, Brian! I'll add a hint about this to my notes. Martin Brian Pratt wrote: OK, I finally cracked the nut. It was indeed an ssh issue, and the missing piece was that the user had to be able to ssh to himself WITHIN THE SAME NODE (!?!). In my case the submitting user is labkey - it's understood that lab...@[clientnode] needs to be able to ssh to lab...@[headnode], but it turns out he also needs to be able to ssh to lab...@[clientnode]. This seems odd to me, but that's how it is. I suppose there might be a config tweak for that somewhere. Anyway, I just repeated the steps for establishing ssh trust between lab...@clientnode and lab...@headnode for lab...@clientnode and lab...@clientnode, and it's all good. One might have guessed that this trust relationship was implicit, but it isn't - you have to add labkey's rsa public key to ~labkey/.ssh/authorized_keys, and update ~labkey/.ssh/known_hosts to include our own hostname. strace -f on the client node was instrumental in figuring this out, as well as messing around in the perl scripts on the server node. ssldump was handy, too. Thanks to Martin and Jim for the pointers. If you're reading this in an effort to solve a similar problem you might be interested to see my scripts for configuring a simple globus+torque cluster on EC2 at https://hedgehog.fhcrc.org/tor/stedi/trunk/AWS_EC2 . Brian On Fri, Dec 4, 2009 at 8:42 AM, Brian Pratt brian.pr...@insilicos.com wrote: Martin, Thanks for that tip and the link to some very useful notes. I'd started poking around in that perl module last night and it looks like maybe the problem is actually to do with ssh between agents within the same globus node, so my ssh trust relationships are not yet quite as comprehensive as they need to be.
I will certainly post the solution here when I crack the nut. I've found lots of posts out there from folks with similar-sounding problems but no resolution; we'll try to fix that here. Of course there are as many ways to go afoul as there are clusters, but we must leave bread crumbs where we can... Brian On Thu, Dec 3, 2009 at 7:05 PM, Martin Feller fel...@mcs.anl.gov wrote: Brian, The PBS job manager module is $GLOBUS_LOCATION/lib/perl/Globus/GRAM/JobManager/pbs.pm I remember that I had this or a similar problem once too, but can't seem to find notes about it (sad, I know). Here's some information about the Perl code which is called by the Java pieces of ws-gram to submit the job to the local resource manager: http://www.mcs.anl.gov/~feller/Globus/technicalStuff/Gram/perl/ While this does not help directly, it may help in debugging. If I find my notes or have a good idea I'll let you know. Martin Brian Pratt wrote: Good plan, thanks. Now to figure out where that is.. I'm certainly learning a lot! On Thu, Dec 3, 2009 at 2:01 PM, Jim Basney jbas...@ncsa.uiuc.edu wrote: It's been a long time since I've debugged a problem like this, but the way I did it in the old days was to modify the Globus PBS glue script to dump what it's passing to qsub, so I could reproduce it manually. Brian Pratt wrote: Let me amend that - I do think that this is sniffing around the right tree, which is why I said this is in some ways more of a logging question. It does look very much like an ssh issue, so what I really need is to figure out exactly what connection parameters were in use for the failure. They seem to be different in some respect from those used in the qsub transactions. What I could really use is a hint at how to lay eyes on that.
Thanks, Brian On Thu, Dec 3, 2009 at 1:38 PM, Brian Pratt brian.pr...@insilicos.com wrote: Hi Jim, Thanks for the reply. Unfortunately the answer doesn't seem
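The same-node trust setup Brian describes boils down to two file edits. A minimal sketch, exercised against a scratch directory instead of the real ~/.ssh (the key material and host names are fake placeholders):

```shell
# Sketch of same-node ssh trust: the submitting user must be able to ssh to
# itself on the same host. Done against a scratch dir; keys/names are fake.
set -e
home=$(mktemp -d)
mkdir -p "$home/.ssh" && chmod 700 "$home/.ssh"
pub="ssh-rsa AAAAfakekey labkey@clientnode"   # stands in for ~/.ssh/id_rsa.pub
# 1. the user's own public key goes into its own authorized_keys:
echo "$pub" >> "$home/.ssh/authorized_keys"
chmod 600 "$home/.ssh/authorized_keys"
# 2. known_hosts must also list the node's own hostname (host key fake too):
echo "clientnode ssh-rsa AAAAfakehostkey" >> "$home/.ssh/known_hosts"
grep -q "labkey@clientnode" "$home/.ssh/authorized_keys" && echo "trust entry added"
```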
Re: [gt-user] WS-GRAM problem
Hi, Did you run make install during the installation? Please send the output of ls -l $GLOBUS_LOCATION/lib/perl/Globus/GRAM/ Martin jr-sim...@criticalsoftware.com wrote: Hi all, I'm having a problem sending jobs in a 4.0.8 Globus installation. I was following the quickstart guide available in the documentation and when trying the WS-GRAM installation using the following command globusrun-ws -submit -c /bin/true, I get the following error: Submitting job...Done. Job ID: uuid:0264967a-da23-11de-860f-ca16a2742cac Termination time: 11/27/2009 00:31 GMT Current job state: Failed Destroying job...Done. globusrun-ws: Job failed: Error code: 201 Script stderr: Can't locate Globus/GRAM/JobDescription.pm in @INC (@INC contains: /usr/lib/perl5/site_perl/5.8.8/i386-linux-thread-multi /usr/lib/perl5/site_perl/5.8.7/i386-linux-thread-multi /usr/lib/perl5/site_perl/5.8.6/i386-linux-thread-multi /usr/lib/perl5/site_perl/5.8.5/i386-linux-thread-multi /usr/lib/perl5/site_perl/5.8.8 /usr/lib/perl5/site_perl/5.8.7 /usr/lib/perl5/site_perl/5.8.6 /usr/lib/perl5/site_perl/5.8.5 /usr/lib/perl5/site_perl /usr/lib/perl5/vendor_perl/5.8.8/i386-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.7/i386-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.6/i386-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.5/i386-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.8 /usr/lib/perl5/vendor_perl/5.8.7 /usr/lib/perl5/vendor_perl/5.8.6 /usr/lib/perl5/vendor_perl/5.8.5 /usr/lib/perl5/vendor_perl /usr/lib/perl5/5.8.8/i386-linux-thread-multi /usr/lib/perl5/5.8.8 . /usr/local/globus/lib/perl) at /usr/local/globus/libexec/globus-job-manager-script.pl line 32. BEGIN failed--compilation aborted at /usr/local/globus/libexec/globus-job-manager-script.pl line 32. Everything in the sudoers file is apparently correct. Has anyone ever come across this error? I've done some Globus installations and never seen this. Can someone help me? I've already lost a day on this and am not seeing any solution.
As a note, I am using Scientific Linux 5.3 and must use GT 4.0.8. If I forgot to mention something you might find useful, please ask. ;) Cheers Zé Rui
Re: [gt-user] GridFTP: Cannot find gridftp.conf after make gridftp
It doesn't exist by default, I think; the server just uses default values if it's absent. You can create it in $GLOBUS_LOCATION/etc/ though, and populate it with your parameters, and they should be considered when the server is started. Martin Raffaele Forgione wrote: Hi everyone. I'm installing GridFTP from the Globus Toolkit 4.0.8. After running make gridftp everything goes well, but when I search the directory $GLOBUS_LOCATION/etc I don't find gridftp.conf. It doesn't exist in /etc/grid-security either. Why???
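If you do create $GLOBUS_LOCATION/etc/gridftp.conf, it is a plain text file of option/value pairs. A small illustrative sketch (option names and values here are examples and should be verified against globus-gridftp-server -help for your version):

```
port 2811
log_single /var/log/gridftp.log
log_level ERROR,WARN,INFO
```

Any option not listed keeps its built-in default, which is why a missing file is harmless.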
Re: [gt-user] JGlobus GridFTP Problems
Steffen, I tried it with a simple 2-party transfer using the jglobus API and I can't reproduce what you describe. Try the attached code. Does it show the same results for you? Adjust the values of the variables in TwoPartyTransferMain.java. If you don't have a MyProxy server available, replace the MyProxy code. Compile and run: source ${GLOBUS_LOCATION}/etc/globus-devel-env.sh javac TwoPartyTransferMain.java java TwoPartyTransferMain Martin Steffen Limmer wrote: Hello Martin, thanks for your answer. Does setting the type to binary (Session.TYPE_IMAGE) help? Unfortunately this doesn't help. Regards, Steffen Martin Steffen Limmer wrote: Hello, I want to transfer files with the put method of the org.globus.ftp.GridFTPClient class. Some of the files are self-extracting tar archives that contain newline characters in the form of ^M. When I transfer such a file, for some reason the ^M will be erased or replaced by \n, and so the archive becomes useless. I tried to copy the files locally with Java and everything works fine, so it should not be a problem with the Java file I/O. Also with globus-url-copy everything works as expected. The problem appears only with the GridFTPClient. Does anybody have an idea what I can do to fix this? Thanks in advance and regards, Steffen import java.io.File; import java.util.List; import java.util.LinkedList; import java.util.Map; import java.util.HashMap; import java.util.Vector; import org.ietf.jgss.GSSCredential; import org.apache.commons.logging.Log; import org.apache.commons.logging.LogFactory; import org.globus.ftp.FileInfo; import org.globus.ftp.GridFTPClient; import org.globus.ftp.Session; public class TwoPartyTransfer { private Log log = LogFactory.getLog(TwoPartyTransfer.class); /** * Download a file from a GridFTP server.
*/ public void downloadFile( String host, int port, GSSCredential credential, String serverFile, String localFile) throws Exception { GridFTPClient client = null; try { log.debug("Creating GridFTP client"); client = this.createClient(host, port, credential); log.debug("Downloading file " + serverFile + " from " + host); this.downloadFile(client, serverFile, localFile); log.debug("Downloaded file " + localFile); } finally { this.closeClient(client); } } /** * Upload a file to a GridFTP server. */ public void uploadFile( String host, int port, GSSCredential credential, String serverFile, String localFile) throws Exception { GridFTPClient client = null; try { log.debug("Creating GridFTP client"); client = this.createClient(host, port, credential); log.debug("Uploading file " + localFile); this.uploadFile(client, localFile, serverFile); log.debug("Uploaded file to " + serverFile + " on " + host); } catch (Exception e) { log.error("Error uploading file.", e); throw e; } finally { this.closeClient(client); } } /** * Recursively download the content of a remote directory into a local directory. * The local directory will be created if it does not exist.
*/ public void downloadDir( String host, int port, GSSCredential credential, String serverDir, String localDir) throws Exception { GridFTPClient listClient = null; GridFTPClient transferClient = null; LinkedList<String> dirs = new LinkedList<String>(); HashMap<String,String> files = new HashMap<String,String>(); try { log.debug("Create GridFTP client for listing"); listClient = this.createClient(host, port, credential); log.debug("Get information of dirs and files from server"); this.createInfoFromRemote(listClient, serverDir, localDir, dirs, files); log.debug("Create local directories"); this.createLocalDirs(dirs); } finally { this.closeClient(listClient); } try { log.debug("Create GridFTP client for file transfers"); transferClient = this.createClient(host, port, credential); log.debug("Download files from server"); this.downloadFiles(transferClient, files); } finally { this.closeClient(transferClient); } } /** * Recursively upload the content of a local directory into a remote directory. * The remote directory will be created if it does not exist. */ public void uploadDir( String host, int port, GSSCredential credential, String serverDir, String localDir) throws Exception {
Re: [gt-user] JGlobus GridFTP Problems
Does setting the type to binary (Session.TYPE_IMAGE) help? Like: ... import org.ietf.jgss.GSSCredential; import org.globus.ftp.GridFTPClient; import org.globus.ftp.Session; ... GridFTPClient client = ... client.authenticate(credential); client.setType(Session.TYPE_IMAGE); // do something with the client Martin Steffen Limmer wrote: Hello, I want to transfer files with the put method of the org.globus.ftp.GridFTPClient class. Some of the files are self-extracting tar archives that contain newline characters in the form of ^M. When I transfer such a file, for some reason the ^M will be erased or replaced by \n, and so the archive becomes useless. I tried to copy the files locally with Java and everything works fine, so it should not be a problem with the Java file I/O. Also with globus-url-copy everything works as expected. The problem appears only with the GridFTPClient. Does anybody have an idea what I can do to fix this? Thanks in advance and regards, Steffen
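Whatever the transfer path, a quick way to tell whether the ^M (\r) bytes survived is to compare checksums on both ends. A local sketch, with cp standing in for the GridFTP put/get:

```shell
# Integrity check for line-ending corruption: an ASCII-mode transfer that
# rewrites \r\n would change the checksum. cp stands in for the transfer.
set -e
src=$(mktemp); dst=$(mktemp)
printf 'line1\r\nline2\r\n' > "$src"
cp "$src" "$dst"
[ "$(cksum < "$src")" = "$(cksum < "$dst")" ] && echo "bytes intact"
```

Running the same comparison against a file fetched through GridFTPClient would show immediately whether binary (image) mode is in effect.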
Re: [gt-user] problem in the fileStageIn
Please send your job description, or at least the fileStageIn element of your job description. -Martin globus world wrote: I tried telnet 192.168.12.1 57468 The following error came: Trying 192.168.12.1... telnet: connect to address 192.168.12.1: Connection refused telnet: Unable to connect to remote host: Connection refused When I tried telnet hostname 8443 it connected. On Wed, Oct 14, 2009 at 7:51 PM, Martin Feller fel...@mcs.anl.gov wrote: Hi, There could be several reasons for the connection problem, I think: Is there a gridftp server up and running at 192.168.12.1:57468? You can test that e.g. by telnet 192.168.12.1 57468 192.168 is a private network. Is the GT server you submit the job to located in the same private network, and can it access the gridftp server on 192.168.12.1:57468? Martin globus world wrote: Hi all, I am submitting a job to cluster txc.edu through the command globusrun-ws -s -S -submit -f RSLsubmit.xml It's giving the following error: Delegating user credentials...Done. Submitting job...Done. Job ID: uuid:f30c0a14-b8a8-11de-a6dc-001109bc47f2 Termination time: 10/15/2009 10:04 GMT Current job state: StageIn Current job state: Failed Destroying job...Done. Cleaning up any delegated credentials...Done. globusrun-ws: Job failed: Staging error for RSL element fileStageIn. Non-Extended 3rd-party transfer .txc.org.in:2811/home/sagar/CG_new_lin-- txc.edu:2811/home/griduser/CG_new_lin failed [Caused by: Server refused performing the request. Custom message: (error code 1) [Nested exception message: Custom message: Unexpected reply: 500-Command failed. : callback failed. 500-globus_xio: Unable to connect to 192.168.12.1:57468 500-globus_xio: System error in connect: Connection refused 500-globus_xio: A system call failed: Connection refused 500 End.]] Non-Extended 3rd-party transfer txc.org.in:2811/home/sagar/CG_new_lin-- txc.edu:2811/home/griduser/CG_new_lin failed [Caused by: Server refused performing the request.
Custom message: (error code 1) [Nested exception message: Custom message: Unexpected reply: 500-Command failed. : callback failed. 500-globus_xio: Unable to connect to 192.168.12.1:57468 500-globus_xio: System error in connect: Connection refused 500-globus_xio: A system call failed: Connection refused 500 End.]] please help me Thanks and Regards sagar
Re: [gt-user] problem in the fileStageIn
Hi, There could be several reasons for the connection problem, I think: Is there a gridftp server up and running at 192.168.12.1:57468? You can test that e.g. by telnet 192.168.12.1 57468 192.168 is a private network. Is the GT server you submit the job to located in the same private network, and can it access the gridftp server on 192.168.12.1:57468? Martin globus world wrote: Hi all, I am submitting a job to cluster txc.edu through the command globusrun-ws -s -S -submit -f RSLsubmit.xml It's giving the following error: Delegating user credentials...Done. Submitting job...Done. Job ID: uuid:f30c0a14-b8a8-11de-a6dc-001109bc47f2 Termination time: 10/15/2009 10:04 GMT Current job state: StageIn Current job state: Failed Destroying job...Done. Cleaning up any delegated credentials...Done. globusrun-ws: Job failed: Staging error for RSL element fileStageIn. Non-Extended 3rd-party transfer .txc.org.in:2811/home/sagar/CG_new_lin -- txc.edu:2811/home/griduser/CG_new_lin failed [Caused by: Server refused performing the request. Custom message: (error code 1) [Nested exception message: Custom message: Unexpected reply: 500-Command failed. : callback failed. 500-globus_xio: Unable to connect to 192.168.12.1:57468 500-globus_xio: System error in connect: Connection refused 500-globus_xio: A system call failed: Connection refused 500 End.]] Non-Extended 3rd-party transfer txc.org.in:2811/home/sagar/CG_new_lin -- txc.edu:2811/home/griduser/CG_new_lin failed [Caused by: Server refused performing the request. Custom message: (error code 1) [Nested exception message: Custom message: Unexpected reply: 500-Command failed. : callback failed. 500-globus_xio: Unable to connect to 192.168.12.1:57468 500-globus_xio: System error in connect: Connection refused 500-globus_xio: A system call failed: Connection refused 500 End.]] please help me. Thanks and Regards, sagar
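Where telnet is not installed, the same reachability probe can be done from bash alone (the host and port below are illustrative; a refused connection simply makes the open fail):

```shell
# A telnet-free port probe using bash's /dev/tcp pseudo-device.
check_port() {
  # opening /dev/tcp/HOST/PORT fails when nothing is listening there
  (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null && echo "$1:$2 open" || echo "$1:$2 closed"
}
check_port 127.0.0.1 1   # no listener expected on port 1: reports closed
```

Run it from both the client and the GT server host: a port that is "open" from one network but "closed" from the other points at exactly the private-network problem described above.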
Re: [gt-user] Regarding security configuration in globus toolkit
Simar, Please reply to the list, and not just to me. Below you pasted this: r...@simar-laptop:~# ls ~/.globus/ simpleCA usercert.pem usercert_request.pem userkey.pem And an ls -l on the same directory gives just simpleCA? That would be quite an atypical output for ls -l. So do you actually have a signed user certificate with a corresponding private key in the .globus directory in the home of the user who ran grid-proxy-init? Martin simar gill wrote: Hi, simpleCA root home is /home/simar Thanks On Mon, Sep 14, 2009 at 3:55 AM, Martin Feller fel...@mcs.anl.gov wrote: Hi, What's the output of ls -l ~/.globus? Is the home of root /home/simar? -Martin simar gill wrote: Hi All, I am setting up security using certificates and a proxy. The following errors are shown: r...@simar-laptop:~# $GLOBUS_LOCATION/bin/grid-proxy-init -debug Error: Couldn't find valid credentials to generate a proxy. grid_proxy_init.c:549: globus_sysconfig: Error with certificate filename: The user cert could not be found in: 1) env. var.
X509_USER_CERT 2) $HOME/.globus/usercert.pem 3) $HOME/.globus/usercred.p12 r...@simar-laptop:~# ls ~/.globus/ simpleCA usercert.pem usercert_request.pem userkey.pem r...@simar-laptop:~# ls ~/.globus/usercert.pem /home/simar/.globus/usercert.pem r...@simar-laptop:/etc/grid-security# ls -l total 48 drwxr-xr-x 2 root root 4096 2009-09-13 13:34 certificates -rw-r--r-- 1 globus globus 2670 2009-09-03 15:10 containercert.pem -r-------- 1 globus globus 887 2009-09-03 15:10 containerkey.pem lrwxrwxrwx 1 root root 62 2009-09-13 21:34 globus-host-ssl.conf -> /etc/grid-security/certificates//globus-host-ssl.conf.b2bc8b3f lrwxrwxrwx 1 root root 62 2009-09-13 21:34 globus-user-ssl.conf -> /etc/grid-security/certificates//globus-user-ssl.conf.b2bc8b3f -rw-r--r-- 1 root root 70 2009-09-07 21:31 grid-mapfile -rw-r--r-- 1 root root 16 2009-09-07 21:31 grid-mapfile.old lrwxrwxrwx 1 root root 60 2009-09-13 21:34 grid-security.conf -> /etc/grid-security/certificates//grid-security.conf.b2bc8b3f -rw-r--r-- 1 root root 2670 2009-09-03 15:08 hostcert.pem -rw-r--r-- 1 root root 1363 2009-09-03 15:08 hostcert_request.pem -rw------- 1 root root 887 2009-09-03 15:08 hostkey.pem -rw-r--r-- 1 root root 2683 2009-09-13 20:05 hostsigned.pem Please tell me the reason for these. Thanks, regards, Simar Virk
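As the error text above shows, grid-proxy-init looks at the X509_USER_CERT/X509_USER_KEY environment variables first and then falls back to the default file names under $HOME/.globus. A sketch of the expected layout, built in a scratch directory for illustration (file contents omitted):

```shell
# The default credential layout grid-proxy-init falls back to, sketched in a
# scratch directory. In real use these would be the signed cert and its key.
set -e
home=$(mktemp -d)
mkdir -p "$home/.globus"
touch "$home/.globus/usercert.pem" "$home/.globus/userkey.pem"
chmod 644 "$home/.globus/usercert.pem"
chmod 600 "$home/.globus/userkey.pem"   # private key must be user-readable only
for f in usercert.pem userkey.pem; do
  [ -f "$home/.globus/$f" ] && echo "$f present"
done
```

Note that a usercert_request.pem alone is not enough: it is the unsigned request, and grid-proxy-init needs the signed usercert.pem plus the matching userkey.pem.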
Re: [gt-user] GramJob Premature End Of File
Hm, can you please try the attached (simple) client and tell me if it fails for you with the same error message, too? It works for me with GT 4.2.1. Replace HOST and PORT with appropriate values before you compile it. Build and run (bash): source $GLOBUS_LOCATION/etc/globus-devel-env.sh javac GramClient42.java grid-proxy-init java -DGLOBUS_LOCATION=$GLOBUS_LOCATION GramClient42 -Martin Mosoi Stefan wrote: Hello, I have a problem when trying to launch a GRAM job in Globus Toolkit 4.2.1 using the code: JobDescriptionType type = new JobDescriptionType(); type.setExecutable("/bin/echo"); type.setArgument(new String[]{"test"}); type.setDirectory("/tmp"); type.setStdout("/home/stefan/std.out"); type.setStderr("/home/stefan/std.err"); type.setJobType(JobTypeEnumeration.single); GramJob crtJob = new GramJob(type); this.crtJob.setCredentials(proxy); this.crtJob.addListener(this); this.crtJob.setAuthorization(NoAuthorization.getInstance()); this.crtJob.submit(factoryEPR, false, true, jobID); I get the following errors: AxisFault faultCode: {http://schemas.xmlsoap.org/soap/envelope/}Server.userException faultSubcode: faultString: java.io.IOException: java.io.IOException: java.io.IOException: Non nillable element 'consumerReference' is null. faultActor: faultNode: faultDetail: {http://xml.apache.org/axis/}stackTrace:java.io.IOException: java.io.IOException: java.io.IOException: Non nillable element 'consumerReference' is null.
at org.apache.axis.encoding.ser.BeanSerializer.serialize(BeanSerializer.java:288) at org.apache.axis.encoding.SerializationContext.serializeActual(SerializationContext.java:1518) at org.apache.axis.encoding.SerializationContext.serialize(SerializationContext.java:994) at org.apache.axis.encoding.SerializationContext.serialize(SerializationContext.java:815) at org.apache.axis.message.RPCParam.serialize(RPCParam.java:208) at org.apache.axis.message.RPCElement.outputImpl(RPCElement.java:433) at org.apache.axis.message.MessageElement.output(MessageElement.java:1208) at org.apache.axis.message.SOAPBody.outputImpl(SOAPBody.java:139) at org.apache.axis.message.SOAPEnvelope.outputImpl(SOAPEnvelope.java:478) at org.apache.axis.message.MessageElement.output(MessageElement.java:1208) at org.apache.axis.SOAPPart.writeTo(SOAPPart.java:314) at org.apache.axis.SOAPPart.writeTo(SOAPPart.java:268) at org.apache.axis.Message.writeTo(Message.java:539) at org.apache.axis.transport.http.CommonsHTTPSender$MessageRequestEntity.writeRequest(CommonsHTTPSender.java:878) at org.apache.commons.httpclient.methods.EntityEnclosingMethod.writeRequestBody(EntityEnclosingMethod.java:495) at org.apache.commons.httpclient.HttpMethodBase.writeRequest(HttpMethodBase.java:1973) at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:993) at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:397) at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:170) at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:396) at org.apache.axis.transport.http.CommonsHTTPSender.invoke(CommonsHTTPSender.java:224) at org.apache.axis.strategies.InvocationStrategy.visit(InvocationStrategy.java:32) at org.apache.axis.SimpleChain.doVisiting(SimpleChain.java:118) at org.apache.axis.SimpleChain.invoke(SimpleChain.java:83) at org.apache.axis.client.AxisClient.invokeTransport(AxisClient.java:150) at 
org.apache.axis.client.AxisClient.invoke(AxisClient.java:289) at org.apache.axis.client.Call.invokeEngine(Call.java:2838) at org.apache.axis.client.Call.invoke(Call.java:2824) at org.apache.axis.client.Call.invoke(Call.java:2501) at org.apache.axis.client.Call.invoke(Call.java:2424) at org.apache.axis.client.Call.invoke(Call.java:1835) at org.globus.exec.generated.bindings.ManagedJobFactoryPortTypeSOAPBindingStub.createManagedJob(ManagedJobFactoryPortTypeSOAPBindingStub.java:1644) at org.globus.exec.client.GramJob.createJobEndpoint(GramJob.java:1565) at org.globus.exec.client.GramJob.submit(GramJob.java:495) at jobManagement.impl.JobManager.processCrtJob(JobManager.java:161) at jobManagement.impl.JobManager.run(JobManager.java:103) AxisFault faultCode: {http://schemas.xmlsoap.org/soap/envelope/}Server.userException faultSubcode: faultString: org.xml.sax.SAXParseException: Premature end of file. faultActor: faultNode: faultDetail: {http://xml.apache.org/axis/}stackTrace:org.xml.sax.SAXParseException: Premature end of file. at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source) at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown
Re: [gt-user] gridftp issues (connection refused on control channel)
I don't know what it might be, but I remember that I had it too in large-scale ws-gram tests. Having retries is in general a good idea. If you want to get down to the root of the problem though, I'd recommend sending gridftp server logs in debug mode and a detailed description to gridftp-u...@globus.org (https://lists.globus.org/mailman/listinfo/gridftp-user) Maybe worth testing: does the same happen if you push the GridFTP servers using globus-url-copy commands with a comparable level of concurrency? Martin Andre Charbonneau wrote: Hi, I was thinking more about this and I was wondering what could be the cause of the failed control channel connections we are seeing when there are 10 concurrent jobs? Maybe if I can track down the source of the connection failures and fix this, then my job throughput will be better, since the file transfers would not need to be retried. Any thoughts about this? Thanks, Andre Martin Feller wrote: Hi, RFT has a retry mechanism for failing transfers. If you didn't specify a maxAttempts element in the staging elements of your job description, you can try to add it and see if it helps. maxAttempts specifies how often RFT will retry a transfer in case of (transient) transfer errors. It defaults to no retries. You can add this element to fileStageIn, fileStageOut and fileCleanUp:
<fileStageIn>
  <maxAttempts>10</maxAttempts>
  <transfer>
    <sourceUrl>gsiftp://.../</sourceUrl>
    <destinationUrl>gsiftp://.../</destinationUrl>
  </transfer>
</fileStageIn>
-Martin Andre Charbonneau wrote: Hello, Lately I've been running some benchmarks against a globus resource (gt 4.0.8) here and we are noticing some rft issues when multiple jobs are submitted concurrently. The jobs are simple /bin/hostname jobs, with a small stagein and stageout file in order to involve rft. The jobs are submitted concurrently (to the Fork factory) by a small python script that forks a thread per globusrun-ws command and then waits for all the threads to return.
Everything looks ok when I submit the jobs one after the other, but when I submit a number of jobs concurrently (10), then I start seeing some of the globusrun-ws commands return with an exit code of 255 and the following error message at the client side: globusrun-ws: Job failed: Staging error for RSL element fileStageOut. Connection creation error [Caused by: java.io.EOFException] Connection creation error [Caused by: java.io.EOFException] I could not find anything in the server-side container.log. So I enabled debugging at the gridftp level on the server side and I found the following: 2009-08-06 15:08:01,118 DEBUG vanilla.FTPControlChannel [Thread-47,createSocketDNSRR:153] opening control channel to /xxx : 2811 (...) 2009-08-06 15:08:01,180 DEBUG vanilla.Reply [Thread-47,init:65] read 1st line 2009-08-06 15:08:01,807 DEBUG vanilla.Reply [Thread-47,init:68] 1st line: null 2009-08-06 15:08:01,809 DEBUG vanilla.FTPControlChannel [Thread-47,write:363] Control channel sending: QUIT 2009-08-06 15:08:01,810 DEBUG vanilla.FTPControlChannel [Thread-47,close:260] ftp socket closed 2009-08-06 15:08:01,812 DEBUG vanilla.FTPServerFacade [Thread-47,close:340] close data channels 2009-08-06 15:08:01,813 DEBUG vanilla.FTPServerFacade [Thread-47,close:343] close server socket 2009-08-06 15:08:01,813 DEBUG vanilla.FTPServerFacade [Thread-47,stopTaskThread:369] stop master thread 2009-08-06 15:08:01,814 ERROR cache.ConnectionManager [Thread-47,createNewConnection:345] Can't create connection: java.io.EOFException 2009-08-06 15:08:01,820 ERROR service.TransferWork [Thread-47,run:408] Transient transfer error Connection creation error [Caused by: java.io.EOFException] Connection creation error. Caused by java.io.EOFException I'm not 100% sure that these errors are related, but the "Connection creation error. Caused by java.io.EOFException" error string makes me think they are.
From the gridftp log above, it looks like the control channel connection (port 2811) back to the submit machine (probably for the stageout step) fails. In order to debug this, we have tried making the gridftp connection limit much higher in the /etc/inetd.d/gridftp script, but that didn't seem to help. We have a port range of 200, which I think should be enough to handle 10 or so concurrent jobs with one stagein and 2 stageout elements per job. We also experimented with that port range, but with no success. Is this something that anyone has experienced before? Maybe there's some other configuration that I can change that might fix this issue? Any help or feedback about this is much appreciated. Best regards, Andre
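RFT's maxAttempts retries happen server-side, but the same idea can be wrapped around any client-side transfer command. A self-contained sketch in which flaky() simulates a transfer that fails twice before succeeding (a real globus-url-copy invocation would take its place):

```shell
# A client-side retry loop mirroring RFT's maxAttempts behavior.
state=$(mktemp)
flaky() {
  # simulated transfer: fails on the first two calls, succeeds on the third
  n=$(cat "$state"); n=$((n+1)); echo "$n" > "$state"
  [ "$n" -ge 3 ]
}
retry() {  # $1 = maximum number of attempts, like RFT's maxAttempts
  i=0
  while [ "$i" -lt "$1" ]; do
    if flaky; then echo "succeeded on attempt $((i+1))"; return 0; fi
    i=$((i+1))   # a real loop would pause briefly here before retrying
  done
  echo "gave up after $1 attempts"; return 1
}
retry 5
```

This masks transient EOFException-style failures the same way maxAttempts does, at the cost of hiding their root cause, which is why Martin also suggests debug-level server logs.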
Re: [gt-user] gridway error(donot want gridway with toolkit)
Did you run make install? But when I started setting security certificates after export command I run command to configure ca You need to describe more precisely what you did. From the above one can unfortunately only guess what you did. Martin simar gill wrote: Hi, I have installed Globus Toolkit 4.2.1 without GridWay. But when I started setting up security certificates, after the export command I ran the command to configure the CA. Then the following error was shown: glo...@simar-laptop:~/gt4.2.1-all-source-installer$ cat gt-server-ca.log ERROR: Your globus install has not been setup correctly /home/globus/libexec/globus-script-initializer not found You most likely need to run gpt-postinstall for this globus install Due to this I can't work further. Please help. Regards Simar Virk M.Tech final yr. (doing thesis work) Computer Sci Tech. GNE, Ludhiana
Re: [gt-user] gridway error and javac is not in JAVA_HOME
Try setting the environment variable JAVA_HOME to /usr and not to /path/to/java. Then retry the build. If this does not help: do you need GridWay? Martin simar gill wrote: Hi, I'm sending the javac details:
which java
/usr/bin/java
which javac
/usr/bin/javac
echo $JAVA_HOME
/path/to/java
whereis java
java: /usr/bin/java /etc/java /usr/lib/java /usr/share/java /usr/share/man/manl/java.l.gz
whereis javac
javac: /usr/bin/java /usr/share/man/manl/javac.l.gz
Regards Simar Virk
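The rule of thumb behind Martin's suggestion: JAVA_HOME must be the install prefix whose bin/ directory contains javac (so /usr when javac is /usr/bin/javac), not a literal placeholder like /path/to/java. Demonstrated against a scratch layout standing in for a real JDK install:

```shell
# Validate the JAVA_HOME convention: $JAVA_HOME/bin/javac must exist and be
# executable. A scratch directory stands in for an actual JDK prefix here.
set -e
prefix=$(mktemp -d)
mkdir -p "$prefix/bin"
touch "$prefix/bin/javac" && chmod +x "$prefix/bin/javac"
JAVA_HOME="$prefix"
[ -x "$JAVA_HOME/bin/javac" ] && echo "JAVA_HOME looks sane"
```

Running the same one-line check against a real machine's JAVA_HOME quickly reveals a placeholder or wrong prefix before a long build fails on it.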
Re: [gt-user] FailureFileCleanUp problem
Hi, The problem you describe, which is summarized in the bug you mention, is an architectural problem in WS-GRAM in 4.0. We fixed it in the 4.2 branch. We had to change the interface for this fix, which is why we can't port it back to the 4.0 branch. If you can upgrade to the 4.2 series, I'd recommend that. With 4.0.x there is currently no other way than: 1. Stop the container 2. Delete the problematic job from the persistence directory (by default ~/.globus of the user who runs the container). In your case: remove the file ~containeruser/.globus/hostname-port/ManagedExecutableJobResourceStateType/1748b3d0-8c4b-11de-8543-b8f655c16264.xml 3. Restart the container. -Martin Hazlewood, Victor Gene wrote: Hey GTers, Running WSRF v4.0.8-r2 on a Cray XT5. We have a user job that looks like it has gone into an unresolvable state, and the log file is filling up with messages about not being able to resolve the FailureFileCleanUp state. Anyone have any suggestions on how to get rid of this? I have looked at the documentation (nothing I found covers this) and at bugzilla (http://bugzilla.mcs.anl.gov/globus/show_bug.cgi?id=5247 is close but says it will be fixed in a future release, and gives no instructions on how to resolve it currently). I'm running out of ideas. The recurring messages are 2009-08-29 12:40:02,267 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,getInternalState:1666] getting resource datum internalState 2009-08-29 12:40:02,267 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,remove:285] Waiting to be Done or Failed. Current state: FailureFileCleanUp Any help on how to resolve this would be appreciated (besides the "it is fixed in the next release" type of resolution). Below are the complete job entries for the job.
-Victor Victor Hazlewood, CISSP Senior HPC Systems Analyst National Institute for Computational Science University of Tennessee http://www.nics.tennessee.edu/ http://www.nics.utk.edu/ Complete log file entry: 2009-08-28 20:13:32,174 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,initialize:142] Entering initialize() 2009-08-28 20:13:32,175 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,initialize:147] at super.initialize() 2009-08-28 20:13:32,180 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,initialize:153] at initSecurity() 2009-08-28 20:13:32,180 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,initSecurity:316] Entering initSecurity() 2009-08-28 20:13:32,182 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,initSecurity:338] resource credential subject: 2009-08-28 20:13:32,183 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,initSecurity:346] setting resource securty grid map... 
2009-08-28 20:13:32,183 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,initSecurity:356] Leaving initSecurity() 2009-08-28 20:13:32,186 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,initVariableMap:704] GLOBUS_SCRATCH_DIR:${GLOBUS_USER_HOME}/.globus/scratch 2009-08-28 20:13:32,370 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1290] resolving variables in attribute environment 2009-08-28 20:13:32,370 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1295] looking at string ${GLOBUS_USER_HOME} 2009-08-28 20:13:32,370 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1296] found $ at index 0 2009-08-28 20:13:32,371 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1302] found '{'---looks like a reference 2009-08-28 20:13:32,371 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1348] looking up GLOBUS_USER_HOME in {GLOBUS_SCRATCH_DIR=${GLOBUS_USER_HOME}/.globus/scratch, GLOBUS_LOCATION=/usr/local/globus-wsrf-4.0.8-r2, GLOBUS_JOB_ID=1748b3d0-8c4b-11de-8543-b8f655c16264, GLOBUS_USER_HOME=/nics/c/home/turuncu, GLOBUS_USER_NAME=turuncu} 2009-08-28 20:13:32,371 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1353] mapped GLOBUS_USER_HOME to value /nics/c/home/turuncu 2009-08-28 20:13:32,371 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1392] Final string is /nics/c/home/turuncu 2009-08-28 20:13:32,372 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
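The three-step 4.0.x workaround above can be sketched as a script. This is a dry-run against a scratch directory; on a real system the persistence directory lives under ~containeruser/.globus/<hostname>-<port>/ManagedExecutableJobResourceStateType, and the job id below is the one from this thread:

```shell
# Dry-run of the 4.0.x workaround described above, using a scratch
# directory in place of the container user's real ~/.globus tree.
JOB_ID="1748b3d0-8c4b-11de-8543-b8f655c16264"
PERSIST_DIR="$(mktemp -d)/ManagedExecutableJobResourceStateType"
mkdir -p "$PERSIST_DIR"
touch "$PERSIST_DIR/$JOB_ID.xml"   # stand-in for the stuck job's state file

# 1. stop the container (e.g. globus-stop-container) -- not run here
# 2. delete the problematic job's persisted state
rm -f "$PERSIST_DIR/$JOB_ID.xml"
# 3. restart the container -- not run here

remaining=$(ls -1 "$PERSIST_DIR" | wc -l)
echo "state files remaining: $remaining"
```

Deleting the state file while the container is still running risks the job manager re-writing it, hence step 1 before step 2.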
Re: [gt-user] gt-user Digest, Vol 11, Issue 26
: expected declaration specifiers or '...' before 'size_t' /usr/include/stdlib.h:762: error: expected declaration specifiers or '...' before 'size_t' /usr/include/stdlib.h:766: error: expected declaration specifiers or '...' before 'size_t' /usr/include/stdlib.h:770: error: expected declaration specifiers or '...' before 'size_t' /usr/include/stdlib.h:779: error: expected declaration specifiers or '...' before 'size_t' /usr/include/stdlib.h:782: error: expected ')' before '*' token /usr/include/stdlib.h:786: error: expected declaration specifiers or '...' before 'wchar_t' /usr/include/stdlib.h:790: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'mbstowcs' /usr/include/stdlib.h:793: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'wcstombs' ./common/gw_file_parser.c: In function 'gw_parse_line': ./common/gw_file_parser.c:32: warning: implicit declaration of function 'strdup' ./common/gw_file_parser.c:32: warning: incompatible implicit declaration of built-in function 'strdup' ./common/gw_file_parser.c:34: warning: implicit declaration of function 'strtok_r' ./common/gw_file_parser.c:34: warning: assignment makes pointer from integer without a cast ./common/gw_file_parser.c:38: warning: implicit declaration of function 'strcasecmp' ./common/gw_file_parser.c:40: warning: implicit declaration of function 'strchr' ./common/gw_file_parser.c:40: warning: incompatible implicit declaration of built-in function 'strchr' ./common/gw_file_parser.c: In function 'gw_parse_file': ./common/gw_file_parser.c:74: warning: incompatible implicit declaration of built-in function 'strchr' make[2]: *** [common/__srcdir__drmaa_libdrmaa___GLOBUS_FLAVOR_NAME__la-gw_file_parser.lo] Error 1 make[2]: Leaving directory `/home/nimbus/work/nimbus/gt4.2.1-all-source-installer/source-trees/gridway/src' make[1]: *** [all-recursive] Error 1 make[1]: Leaving directory `/home/nimbus/work/nimbus/gt4.2.1-all-source-installer/source-trees/gridway' ERROR: Build has failed 
make: *** [gridway] Error 2 Regards Simar Virk -- Martin Feller The Globus Alliance Computation Institute at University of Chicago Mathematics Computer Science Division at Argonne National Laboratory Phone: 630 252-4826
Re: [gt-user] problem in the filestageOut
Hi, What version of the GT do you use? If it's 4.0.7: get http://www.mcs.anl.gov/~feller/heller/globus_wsrf_rft.jar, back up $GLOBUS_LOCATION/lib/globus_wsrf_rft.jar, drop the downloaded file into $GLOBUS_LOCATION/lib, restart the GT server and retry your job. Does that fix it? Martin

globus world wrote: Hi, I am submitting a job to host fgwu.xtc.in and have a problem in the fileStageOut step. In my job description I mention my stdout and stderr files, but in the fileStageOut step my stderr file is staged out while my stdout file is not. What may be the reason? My job description file is as follows:

<job>
  <executable>Myexe</executable>
  <stdout>${GLOBUS_USER_HOME}/stdout</stdout>
  <stderr>${GLOBUS_USER_HOME}/stderr</stderr>
  <fileStageIn>
    <transfer>
      <sourceUrl>gsiftp://mypc.org.in:2811/home/sagar/Myexe</sourceUrl>
      <destinationUrl>file:///${GLOBUS_USER_HOME}/Myexe</destinationUrl>
    </transfer>
  </fileStageIn>
  <fileStageOut>
    <!-- Here in fileStageOut only stderr is staged out; stdout is not -->
    <transfer>
      <sourceUrl>gsiftp://fgwu.xtc.in:2811/${GLOBUS_USER_HOME}/stdout</sourceUrl>
      <destinationUrl>gsiftp://mypc.org.in:2811/home/sagar/stdout</destinationUrl>
    </transfer>
    <transfer>
      <sourceUrl>gsiftp://fgwu.xtc.in:2811/${GLOBUS_USER_HOME}/stderr</sourceUrl>
      <destinationUrl>gsiftp://mypc.org.in:2811/home/sagar/stderr</destinationUrl>
    </transfer>
  </fileStageOut>
  <fileCleanUp>
    <deletion>
      <file>file:///${GLOBUS_USER_HOME}/stdout</file>
    </deletion>
    <deletion>
      <file>file:///${GLOBUS_USER_HOME}/stderr</file>
    </deletion>
    <deletion>
      <file>file:///${GLOBUS_USER_HOME}/Myexe</file>
    </deletion>
  </fileCleanUp>
</job>

The error is:

Current job state: StageOut
Current job state: Failed
Destroying job...Done.
Cleaning up any delegated credentials...Done.
globusrun-ws: Job failed: Staging error for RSL element fileStageOut. Can't do MLST on non-existing file/dir /home/sagar/stdout on server mypc.org.in [Caused by: Server refused performing the request. Custom message: Server refused MLST command (error code 1) [Nested exception message: Custom message: Unexpected reply: 500-Command failed : System error in stat: No such file or directory 500-A system call failed: No such file or directory 500 End.]] Can't do MLST on non-existing file/dir /home/sagar/stdout on server mypc.org.in [Caused by: Server refused performing the request. Custom message: Server refused MLST command (error code 1) [Nested exception message: Custom message: Unexpected reply: 500-Command failed : System error in stat: No such file or directory 500-A system call failed: No such file or directory 500 End.]]

Thanks and Regards, sagar
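Martin's jar-swap fix can be scripted roughly like this. It is a dry-run against a scratch directory standing in for the real $GLOBUS_LOCATION; the download URL is the one from this thread, and the wget step is left commented out:

```shell
# Dry-run of the RFT jar swap suggested above, using a scratch directory
# in place of the real $GLOBUS_LOCATION.
GLOBUS_LOCATION="$(mktemp -d)"
mkdir -p "$GLOBUS_LOCATION/lib"
touch "$GLOBUS_LOCATION/lib/globus_wsrf_rft.jar"   # stand-in for the installed jar

# Back up the original before replacing it.
cp "$GLOBUS_LOCATION/lib/globus_wsrf_rft.jar" \
   "$GLOBUS_LOCATION/lib/globus_wsrf_rft.jar.bak"
# wget http://www.mcs.anl.gov/~feller/heller/globus_wsrf_rft.jar \
#   -O "$GLOBUS_LOCATION/lib/globus_wsrf_rft.jar"   # real download, not run here
# ...then restart the GT container and retry the job.

ls -1 "$GLOBUS_LOCATION/lib"
```

Keeping the .bak copy makes it trivial to roll back if the patched jar misbehaves.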
Re: [gt-user] gt-user Digest, Vol 11, Issue 20
Hi Simar,

simar gill wrote: Hi All, I have Ubuntu 32-bit installed in dual boot with Windows Vista. I have installed all the prerequisite software. Now I want to set the Globus location environment variable, but it does not work. If I type the following in a terminal:

$ export GLOBUS_LOCATION=path/to/install
$ cd $GLOBUS_LOCATION
Error: directory does not exist.

In Vol 7, Issue 13 it says: If the directory does not exist, create it.

$ ./configure --prefix=/home/globus/globus-4.2.1.1
bash: command not found

If I run it as the globus user it gives "Error: permission denied".

And further down the page: Make sure the directory pointed to by the environment variable $GLOBUS_LOCATION belongs to user globus and has the right permissions, e.g. drwxr-xr-x. Then try configure again as user globus. Does it work better then? If not: more detailed descriptions of the directories and error messages would be helpful. -Martin
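Martin's advice can be sketched as a short setup script. A temp directory stands in here for the real prefix /home/globus/globus-4.2.1.1 from the thread, and the chown is shown commented out because it needs root:

```shell
# Sketch of preparing $GLOBUS_LOCATION with the ownership and permissions
# Martin describes, before running configure as user globus.
GLOBUS_LOCATION="$(mktemp -d)/globus-4.2.1.1"
export GLOBUS_LOCATION
mkdir -p "$GLOBUS_LOCATION"
# chown globus:globus "$GLOBUS_LOCATION"   # as root, on a real system
chmod 755 "$GLOBUS_LOCATION"               # i.e. drwxr-xr-x

cd "$GLOBUS_LOCATION" && echo "GLOBUS_LOCATION is usable"
# ...then, as user globus, from the unpacked installer directory:
# ./configure --prefix="$GLOBUS_LOCATION"
```

Note the "bash: command not found" in the thread usually means ./configure was run outside the unpacked installer directory, which is a separate issue from permissions.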
Re: [gt-user] Globus 4.0.7 with Loadleveler Integration
We don't officially support LL, but a group on TeraGrid used LoadLeveler too. I assume it's a problem in $GLOBUS_LOCATION/etc/gram-service-Loadleveler/jndi-config.xml. Please send the content of that file. -Martin

Asish M Madhu wrote: Dear All, I was trying to integrate Globus 4.0.7 with LoadLeveler (version 3) on an AIX 5.2 64-bit machine. I installed Globus 4.0.7 successfully using the 64-bit flavor. For integrating with LoadLeveler I used the llgrid.tar file from the already installed LoadLeveler, untarred it and ran the deploy.sh executable, which automatically integrates Globus with the existing LoadLeveler. The installation of the integration package didn't throw any error. But after the integration package is installed I can't start the Globus container. I get the below error in the container.log file:

vi $GLOBUS_LOCATION/var/container.log
/usr/local/GARUDA/GLOBUS-4.0.7/var/container.log: 2 lines, 264 characters
Failed to start container: Failed to initialize 'ManagedJobFactoryService' service [Caused by: ; nested exception is: javax.naming.NamingException: Bean initialization failed [Root exception is java.lang.RuntimeException: java.lang.NumberFormatException: null]]

What could be the problem? The integration package creates the below files in the Globus path:

$GLOBUS_LOCATION/lib/perl/Globus/GRAM/JobManager/loadleveler.pm
$GLOBUS_LOCATION/etc/gram-service/globus_gram_fs_map_config.xml
$GLOBUS_LOCATION/etc/gram-service-Loadleveler/jndi-config.xml
$GLOBUS_LOCATION/etc/grid-services/jobmanager-loadleveler
$GLOBUS_LOCATION/etc/globus-loadleveler.conf
$GLOBUS_LOCATION/libexec/globus-scheduler-provider-loadleveler

But when I remove these files the container starts without any problem. How can I integrate Globus with LoadLeveler? Kindly help me. Thanks in advance. Regards, Asish M Madhu asis...@gmail.com
Re: [gt-user] Globus 4.0.7 with Loadleveler Integration
Hi, Replace it with this, but substitute ${GLOBUS_LOCATION} with the value of the environment variable ${GLOBUS_LOCATION}:

<?xml version="1.0" encoding="UTF-8"?>
<jndiConfig xmlns="http://wsrf.globus.org/jndi/config">
  <!-- Configuration delta (addition) for a Local Resource Manager -->
  <!-- Configuration for Managed Job *Factory* Service -->
  <service name="ManagedJobFactoryService">
    <!-- LRM configuration: Fork -->
    <resource name="ForkResourceConfiguration"
              type="org.globus.exec.service.factory.FactoryResourceConfiguration">
      <resourceParams>
        <parameter>
          <name>factory</name>
          <value>org.globus.wsrf.jndi.BeanFactory</value>
        </parameter>
        <parameter>
          <name>localResourceManagerName</name>
          <value>Loadleveler</value>
        </parameter>
        <!-- Site-specific scratchDir. Default: ${GLOBUS_USER_HOME}/.globus/scratch
        <parameter>
          <name>scratchDirectory</name>
          <value>${GLOBUS_USER_HOME}/.globus/scratch</value>
        </parameter>
        -->
        <parameter>
          <name>substitutionDefinitionsFile</name>
          <value>${GLOBUS_LOCATION}/etc/gram-service-Fork/substitution-definition.properties</value>
        </parameter>
        <parameter>
          <name>substitutionDefinitionsRefreshPeriod</name>
          <!-- MINUTES -->
          <value>480</value>
        </parameter>
        <parameter>
          <name>enableDefaultSoftwareEnvironment</name>
          <value>false</value>
        </parameter>
      </resourceParams>
    </resource>
  </service>
</jndiConfig>

-Martin

Asish M Madhu wrote: Hello Martin, Please find the content of jndi-config.xml ($GLOBUS_LOCATION/etc/gram-service-Loadleveler/jndi-config.xml):

<?xml version="1.0" encoding="UTF-8"?>
<jndiConfig xmlns="http://wsrf.globus.org/jndi/config">
  <!-- Configuration delta (addition) for a Local Resource Manager -->
  <!-- Configuration for Managed Job *Factory* Service -->
  <service name="ManagedJobFactoryService">
    <!-- LRM configuration: Loadleveler -->
    <resource name="LoadlevelerResourceConfiguration"
              type="org.globus.exec.service.factory.FactoryResourceConfiguration">
      <resourceParams>
        <parameter>
          <name>factory</name>
          <value>org.globus.wsrf.jndi.BeanFactory</value>
        </parameter>
        <parameter>
          <name>localResourceManagerName</name>
          <value>Loadleveler</value>
        </parameter>
        <parameter>
          <name>scratchDirectory</name>
          <value>/.globus/scratch</value>
        </parameter>
      </resourceParams>
    </resource>
  </service>
</jndiConfig>

Thanking you, Regards, Asish

On Tue, Aug 25, 2009 at 5:03 PM, Martin Feller fel...@mcs.anl.gov wrote: We don't officially support LL, but a group on TeraGrid used LoadLeveler too. I assume it's a problem in $GLOBUS_LOCATION/etc/gram-service-Loadleveler/jndi-config.xml. Please send the content of that file. -Martin Asish M Madhu wrote: Dear All, I was trying to integrate Globus 4.0.7 with LoadLeveler (version 3) on an AIX 5.2 64-bit machine. I installed Globus 4.0.7 successfully using the 64-bit flavor. For integrating with LoadLeveler I used the llgrid.tar file from the already installed LoadLeveler, untarred it and ran the deploy.sh executable, which automatically integrates Globus with the existing LoadLeveler. The installation of the integration package didn't throw any error. But after the integration package is installed I can't start the Globus container. I get the below error
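Since a malformed jndi-config.xml is enough to stop the container from starting, it's worth checking the edited file is well-formed before restarting. A sketch using python3's stdlib parser (standing in for xmllint, which may not be installed); the here-document below is just a placeholder config for demonstration:

```shell
# Well-formedness check for an edited jndi-config.xml before restarting
# the container. python3's stdlib XML parser stands in for xmllint here.
CONF="$(mktemp)"
cat > "$CONF" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<jndiConfig xmlns="http://wsrf.globus.org/jndi/config">
  <service name="ManagedJobFactoryService"/>
</jndiConfig>
EOF
if python3 -c "import sys, xml.dom.minidom; xml.dom.minidom.parse(sys.argv[1])" "$CONF" 2>/dev/null; then
  verdict="well-formed"
else
  verdict="broken"
fi
echo "jndi-config: $verdict"
```

On a real install, point CONF at $GLOBUS_LOCATION/etc/gram-service-Loadleveler/jndi-config.xml instead of the here-document.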
Re: [gt-user] Globus 4.0.7 with Loadleveler Integration
you, Asish

On Tue, Aug 25, 2009 at 5:18 PM, Martin Feller fel...@mcs.anl.gov wrote: Hi, Replace it with this, but substitute ${GLOBUS_LOCATION} with the value of the environment variable ${GLOBUS_LOCATION}:

<?xml version="1.0" encoding="UTF-8"?>
<jndiConfig xmlns="http://wsrf.globus.org/jndi/config">
  <!-- Configuration delta (addition) for a Local Resource Manager -->
  <!-- Configuration for Managed Job *Factory* Service -->
  <service name="ManagedJobFactoryService">
    <!-- LRM configuration: Fork -->
    <resource name="ForkResourceConfiguration"
              type="org.globus.exec.service.factory.FactoryResourceConfiguration">
      <resourceParams>
        <parameter>
          <name>factory</name>
          <value>org.globus.wsrf.jndi.BeanFactory</value>
        </parameter>
        <parameter>
          <name>localResourceManagerName</name>
          <value>Loadleveler</value>
        </parameter>
        <!-- Site-specific scratchDir. Default: ${GLOBUS_USER_HOME}/.globus/scratch
        <parameter>
          <name>scratchDirectory</name>
          <value>${GLOBUS_USER_HOME}/.globus/scratch</value>
        </parameter>
        -->
        <parameter>
          <name>substitutionDefinitionsFile</name>
          <value>${GLOBUS_LOCATION}/etc/gram-service-Fork/substitution-definition.properties</value>
        </parameter>
        <parameter>
          <name>substitutionDefinitionsRefreshPeriod</name>
          <!-- MINUTES -->
          <value>480</value>
        </parameter>
        <parameter>
          <name>enableDefaultSoftwareEnvironment</name>
          <value>false</value>
        </parameter>
      </resourceParams>
    </resource>
  </service>
</jndiConfig>

-Martin

Asish M Madhu wrote: Hello Martin, Please find the content of jndi-config.xml ($GLOBUS_LOCATION/etc/gram-service-Loadleveler/jndi-config.xml):

<?xml version="1.0" encoding="UTF-8"?>
<jndiConfig xmlns="http://wsrf.globus.org/jndi/config">
  <!-- Configuration delta (addition) for a Local Resource Manager -->
  <!-- Configuration for Managed Job *Factory* Service -->
  <service name="ManagedJobFactoryService">
    <!-- LRM configuration: Loadleveler -->
    <resource name="LoadlevelerResourceConfiguration"
              type="org.globus.exec.service.factory.FactoryResourceConfiguration">
      <resourceParams>
        <parameter>
          <name>factory</name>
          <value>org.globus.wsrf.jndi.BeanFactory</value>
        </parameter>
        <parameter>
          <name>localResourceManagerName</name>
          <value>Loadleveler</value>
        </parameter>
        <parameter>
          <name>scratchDirectory</name>
          <value>/.globus/scratch</value>
        </parameter>
      </resourceParams>
    </resource>
  </service>
</jndiConfig>

Thanking you, Regards, Asish

On Tue, Aug 25, 2009 at 5:03 PM, Martin Feller fel...@mcs.anl.gov wrote: We don't officially support LL, but a group on TeraGrid used LoadLeveler too. I assume it's a problem in $GLOBUS_LOCATION/etc/gram-service-Loadleveler/jndi-config.xml. Please send the content of that file. -Martin Asish M Madhu wrote: Dear All, I was trying to integrate Globus 4.0.7 with LoadLeveler (version 3) on an AIX 5.2 64-bit machine. I installed Globus 4.0.7 successfully using the 64-bit flavor. For integrating with LoadLeveler I used the llgrid.tar file from the already installed LoadLeveler, untarred it and ran the deploy.sh executable, which automatically integrates Globus with the existing LoadLeveler. The installation of the integration package didn't throw any error. But after the integration package is installed I can't start
Re: [gt-user] gridftp issues (connection refused on control channel)
Hi, RFT has a retry mechanism for failing transfers. If you didn't specify a maxAttempts element in the staging elements of your job description, you can try adding it and see if it helps. maxAttempts specifies how often RFT will retry a transfer in case of (transient) transfer errors. It defaults to no retries. You can add this element to fileStageIn, fileStageOut and fileCleanUp:

...
<fileStageIn>
  <maxAttempts>10</maxAttempts>
  <transfer>
    <sourceUrl>gsiftp://...</sourceUrl>
    <destinationUrl>gsiftp://...</destinationUrl>
  </transfer>
</fileStageIn>
...

-Martin

Andre Charbonneau wrote: Hello, Lately I've been running some benchmarks against a Globus resource (GT 4.0.8) here and we are noticing some RFT issues when multiple jobs are submitted concurrently. The jobs are simple /bin/hostname jobs, with a small stage-in and stage-out file in order to involve RFT. The jobs are submitted concurrently (to the Fork factory) by a small Python script that forks a thread per globusrun-ws command and then waits for all the threads to return. Everything looks OK when I submit the jobs one after the other, but when I submit a number of jobs concurrently (10), I start seeing some of the globusrun-ws commands return with an exit code of 255 and the following error message on the client side:

globusrun-ws: Job failed: Staging error for RSL element fileStageOut. Connection creation error [Caused by: java.io.EOFException] Connection creation error [Caused by: java.io.EOFException]

I could not find anything in the server-side container.log. So I enabled debugging at the gridftp level on the server side and I found the following:

2009-08-06 15:08:01,118 DEBUG vanilla.FTPControlChannel [Thread-47,createSocketDNSRR:153] opening control channel to /xxx : 2811 (...)
2009-08-06 15:08:01,180 DEBUG vanilla.Reply [Thread-47,init:65] read 1st line
2009-08-06 15:08:01,807 DEBUG vanilla.Reply [Thread-47,init:68] 1st line: null
2009-08-06 15:08:01,809 DEBUG vanilla.FTPControlChannel [Thread-47,write:363] Control channel sending: QUIT
2009-08-06 15:08:01,810 DEBUG vanilla.FTPControlChannel [Thread-47,close:260] ftp socket closed
2009-08-06 15:08:01,812 DEBUG vanilla.FTPServerFacade [Thread-47,close:340] close data channels
2009-08-06 15:08:01,813 DEBUG vanilla.FTPServerFacade [Thread-47,close:343] close server socket
2009-08-06 15:08:01,813 DEBUG vanilla.FTPServerFacade [Thread-47,stopTaskThread:369] stop master thread
2009-08-06 15:08:01,814 ERROR cache.ConnectionManager [Thread-47,createNewConnection:345] Can't create connection: java.io.EOFException
2009-08-06 15:08:01,820 ERROR service.TransferWork [Thread-47,run:408] Transient transfer error Connection creation error [Caused by: java.io.EOFException] Connection creation error. Caused by java.io.EOFException

I'm not 100% sure that these errors are related, but the "Connection creation error. Caused by java.io.EOFException" error string makes me think they are. From the gridftp log above, it looks like the control channel connection (port 2811) back to the submit machine (probably for the stage-out step) fails. In order to debug this, we tried raising the gridftp connection limit in the /etc/inetd.d/gridftp script, but that didn't seem to help. We have a port range of 200 ports, which I think should be enough to handle 10 or so concurrent jobs with one stage-in and 2 stage-out elements per job. We also experimented with that port range, but with no success. Is this something that anyone has experienced before? Maybe there is some other configuration I can change that might fix this issue? Any help or feedback is much appreciated. Best regards, Andre
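When control-channel connections fail like this, a quick reachability probe against the GridFTP port can separate plain network/inetd refusals from RFT-level problems. A sketch using bash's built-in /dev/tcp so no Globus tools are needed; HOST and PORT are placeholders, not hosts from this thread:

```shell
# Probe a GridFTP control channel with bash's /dev/tcp.
# HOST/PORT are placeholders -- point them at the failing server.
HOST=localhost
PORT=2811
if timeout 2 bash -c "exec 3<>/dev/tcp/$HOST/$PORT" 2>/dev/null; then
  result="reachable"
else
  result="refused or timed out"
fi
echo "control channel $HOST:$PORT: $result"
```

Running this in a loop during a concurrent submission burst would show whether inetd starts refusing connections under load.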
Re: [gt-user] GRAM jobs dying after 24 hours
Yuriy wrote: Cannot reproduce it anymore... I submitted jobs with/without delegation, with/without streaming, with globus-delegate for the credential and without, and none of them were killed... In fact I cannot see any user jobs dying for about a week now. Maybe it is related to the state of the container? Is there anything in the logs that could indicate the moment that some credential was removed and the reason for it?

By default, no. You can set the log level for the delegation service to debug (log4j.category.org.globus.delegation.service=DEBUG in $GLOBUS_LOCATION/container-log4j.properties), and the log then tells you that a delegation resource is being destroyed, but unfortunately it does not tell you the id/name of the resource. As far as I know, the reasons for removal can be: an explicit call to destroy by a client, or a client/service tries to access the credential and it is expired. I think there's no general periodic "sweep and destroy if expired" for persisted delegation resources.

The persisted/../DelegationResource/ folder (this is where credentials are stored, right?)

Right.

contains 1200 files, most of the related jobs are probably dead. Is there any way to decipher those files and see what is inside?

Delegated credentials are serialized Java objects (DelegationResource objects). I attached a small program that reads all serialized delegated credentials from the persistence directory and prints information about each. Point the variable persistenceDirName to the persistence directory of the delegated credentials before you compile it.

Compile it:
- source ${GLOBUS_LOCATION}/etc/globus-devel-env.sh (assuming bash/bourne shell)
- javac CheckDelegationResources.java (assuming Java 1.4+)
Run it:
- java CheckDelegationResources

This program won't win a beauty contest; extend it as you need. Hope this helps.
-Martin Cheers, Yuriy On Mon, Aug 10, 2009 at 08:24:35AM -0500, Martin Feller wrote: What very probably happens is that a credential being delegated to the server expired. It's being removed on the server-side in that case and jobs that still refer to such a (no longer existing) credential fail with the error message you pasted. How do you delegate the credential that is being used by jobs: * Do you let globusrun-ws delegate for you? * Do you delegate a credential, e.g. using globus-credential-delegate and refer to the credential in your job description or let globusrun-ws pick up the epr of the manually delegated credential? You can debug this e.g. like this: * Submit jobs that do not require a delegated credential and see if the same problem still occurs. From your description I'd say that those jobs will not fail. * Delegate a credential that is valid for, say, 60h, using globus-credential-delegate and refer to that credential in your jobs. (globusrun-ws options: -Jf, -Sf) and check if the jobs still fail after 24h. Maybe worth noting: sometimes people delegate although they don't really need to delegate, i.e. the job does not need a job credential and no staging is performed. -Martin Yuriy wrote: Hi, Some of the jobs submitted to torque via GRAM are killed after about 24 hours in the queue, all with the similar message in globus logs: 2009-07-10 11:32:16,052 INFO exec.StateMachine [RunQueueThread_5,logJobFailed:3250] Job 74bd3c60-6c17-11de-9a06-9ba1d1ebd14a failed. Description: Couldn't obtain a delegated credential. Cause: org.globus.exec.generated.FaultType: Couldn't obtain a delegated credential. caused by [0: org.oasis.wsrf.faults.BaseFaultType: Error getting delegation resource [Caused by: org.globus.wsrf.NoSuchResourceException]] torque reports exit status = 271 (exceeds resource limit or killed by user), none of the problematic jobs seem to exceed any limits. 
Moreover we had a lot of jobs that ran for longer than 24 hours and completed successfully (sometimes users just re-submitted jobs with the same description, using exactly the same tools, and they completed without any problems). All problematic jobs were submitted with the globusrun-ws tool. Could anyone explain what is going on here? Currently we use the Globus version from VDT 1.10; we started with VDT 1.6. From looking in the logs, we have had the same problem for over a year, but not many people are affected and most just re-submit without reporting. Cheers, Yuriy

import java.io.File;
import java.io.FileInputStream;
import java.io.ObjectInputStream;
import java.util.Calendar;
import java.util.Date;

public class CheckDelegationResources {
    public static void main(String[] args) throws Exception {
        // Fill in path to persistence directory of delegated credentials
        String persistenceDirName = "";
        File persistenceDir = new File(persistenceDirName);
        if (persistenceDir.exists()) {
            String[] resources
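Before deserializing anything, a filesystem-level triage of the 1200 files mentioned above can flag resources untouched for a long time (likely dead jobs). A sketch against a throwaway directory with fake entries; on a real system, point PERSIST_DIR at the DelegationResource persistence directory:

```shell
# Rough triage of a DelegationResource persistence directory: count all
# files, and count those not modified in over 30 days. A temp dir with
# fake entries stands in for the real directory here.
PERSIST_DIR="$(mktemp -d)"          # stand-in for .../persisted/.../DelegationResource
touch "$PERSIST_DIR/res-1" "$PERSIST_DIR/res-2"
touch -d "60 days ago" "$PERSIST_DIR/res-old"   # GNU touch; simulates a stale entry

total=$(ls -1 "$PERSIST_DIR" | wc -l)
stale=$(find "$PERSIST_DIR" -type f -mtime +30 | wc -l)
echo "total=$total stale=$stale"
```

The stale list gives candidates to inspect with the attached Java program before deciding what is safe to remove.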
Re: [gt-user] Installation problem of GT 4.2.1
or '...' before 'size_t' /usr/include/stdlib.h:766: error: expected declaration specifiers or '...' before 'size_t' /usr/include/stdlib.h:770: error: expected declaration specifiers or '...' before 'size_t' /usr/include/stdlib.h:779: error: expected declaration specifiers or '...' before 'size_t' /usr/include/stdlib.h:782: error: expected ')' before '*' token /usr/include/stdlib.h:786: error: expected declaration specifiers or '...' before 'wchar_t' /usr/include/stdlib.h:790: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'mbstowcs' /usr/include/stdlib.h:793: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'wcstombs' ./common/gw_file_parser.c: In function 'gw_parse_line': ./common/gw_file_parser.c:32: warning: implicit declaration of function 'strdup' ./common/gw_file_parser.c:32: warning: incompatible implicit declaration of built-in function 'strdup' ./common/gw_file_parser.c:34: warning: implicit declaration of function 'strtok_r' ./common/gw_file_parser.c:34: warning: assignment makes pointer from integer without a cast ./common/gw_file_parser.c:38: warning: implicit declaration of function 'strcasecmp' ./common/gw_file_parser.c:40: warning: implicit declaration of function 'strchr' ./common/gw_file_parser.c:40: warning: incompatible implicit declaration of built-in function 'strchr' ./common/gw_file_parser.c: In function 'gw_parse_file': ./common/gw_file_parser.c:74: warning: incompatible implicit declaration of built-in function 'strchr' make[2]: *** [common/__srcdir__drmaa_libdrmaa___GLOBUS_FLAVOR_NAME__la-gw_file_parser.lo] Error 1 make[2]: Leaving directory `/home/nimbus/work/nimbus/gt4.2.1-all-source-installer/source-trees/gridway/ src' make[1]: *** [all-recursive] Error 1 make[1]: Leaving directory `/home/nimbus/work/nimbus/gt4.2.1-all-source-installer/source-trees/gridway' ERROR: Build has failed make: *** [gridway] Error 2 nim...@ubuntu:~/work/nimbus/gt4.2.1-all-source-installer$ kindest regards Wei -- Martin Feller The 
Globus Alliance Computation Institute at University of Chicago Mathematics Computer Science Division at Argonne National Laboratory Phone: 630 252-4826
Re: [gt-user] GRAM jobs dying after 24 hours
What very probably happens is that a credential being delegated to the server expired. It's being removed on the server-side in that case and jobs that still refer to such a (no longer existing) credential fail with the error message you pasted. How do you delegate the credential that is being used by jobs: * Do you let globusrun-ws delegate for you? * Do you delegate a credential, e.g. using globus-credential-delegate and refer to the credential in your job description or let globusrun-ws pick up the epr of the manually delegated credential? You can debug this e.g. like this: * Submit jobs that do not require a delegated credential and see if the same problem still occurs. From your description I'd say that those jobs will not fail. * Delegate a credential that is valid for, say, 60h, using globus-credential-delegate and refer to that credential in your jobs. (globusrun-ws options: -Jf, -Sf) and check if the jobs still fail after 24h. Maybe worth noting: sometimes people delegate although they don't really need to delegate, i.e. the job does not need a job credential and no staging is performed. -Martin Yuriy wrote: Hi, Some of the jobs submitted to torque via GRAM are killed after about 24 hours in the queue, all with the similar message in globus logs: 2009-07-10 11:32:16,052 INFO exec.StateMachine [RunQueueThread_5,logJobFailed:3250] Job 74bd3c60-6c17-11de-9a06-9ba1d1ebd14a failed. Description: Couldn't obtain a delegated credential. Cause: org.globus.exec.generated.FaultType: Couldn't obtain a delegated credential. caused by [0: org.oasis.wsrf.faults.BaseFaultType: Error getting delegation resource [Caused by: org.globus.wsrf.NoSuchResourceException]] torque reports exit status = 271 (exceeds resource limit or killed by user), none of the problematic jobs seem to exceed any limits. 
Moreover we had a lot of jobs that ran for longer than 24 hours and completed successfully (sometimes users just re-submitted jobs with the same description, using exactly the same tools, and they completed without any problems). All problematic jobs were submitted with the globusrun-ws tool. Could anyone explain what is going on here? Currently we use the Globus version from VDT 1.10; we started with VDT 1.6. From looking in the logs, we have had the same problem for over a year, but not many people are affected and most just re-submit without reporting. Cheers, Yuriy
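The failure mode Martin describes is simple arithmetic: if queue wait plus run time exceeds the delegated credential's lifetime, the job dies with the NoSuchResourceException above. A sketch of that check — the hour values are made-up examples, not measurements from this thread:

```shell
# Back-of-envelope credential-lifetime check, per the advice above:
# delegate a credential that outlives queue wait + run time.
PROXY_HOURS=24    # lifetime of the delegated credential (example)
QUEUE_HOURS=20    # expected time in the torque queue (example)
RUN_HOURS=8       # expected run time (example)
NEEDED=$((QUEUE_HOURS + RUN_HOURS))
if [ "$NEEDED" -gt "$PROXY_HOURS" ]; then
  echo "credential too short: need >= ${NEEDED}h, have ${PROXY_HOURS}h"
else
  echo "credential lifetime is sufficient"
fi
```

With these example numbers the credential falls short, which matches Martin's suggestion to delegate something like a 60h credential via globus-credential-delegate for long-queued jobs.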
Re: [gt-user] WS_GRAM Stage-out problem
Hi Helmut, Uh, it's been a while, but I think I remember this issue. I *thought* it was fixed in 4.0.8, but I created a jar from globus_4_0_branch. It's built using Java 1.4 and you can get it from here: http://www.mcs.anl.gov/~feller/heller/globus_wsrf_rft.jar Can you give it a try by dropping it into ${GLOBUS_LOCATION}/lib, and tell us if it works for you with that jar? -Martin

Helmut Heller wrote: Hello Martin, We run GT 4.0.8 but we are encountering the same error that Sergey describes. Unfortunately, the link you give below no longer works. Can you please point me to a globus_wsrf_rft.jar for GT 4.0.8? Thanks a lot in advance, Helmut

On 14.05.2008, at 15:12, Martin Feller wrote: Sergey, You are probably using GT 4.0.6 or GT 4.0.7, right? Unfortunately we don't have an update package for that problem yet, but you can download an updated RFT jar from here: http://www-unix.mcs.anl.gov/~feller/calebe/globus_wsrf_rft.jar To install it, copy it to $GLOBUS_LOCATION/lib. After a GT server restart the problem should go away. Martin

----- Original Message ----- From: S.Kulanov s.kula...@mail.ru To: Globus gt-u...@globus.org Sent: Wednesday, May 14, 2008 1:44:43 AM GMT -06:00 US/Canada Central Subject: [gt-user] WS_GRAM Stage-out problem

Good day, I have some problems with StageOut while using the example from http://www.globus.org/toolkit/docs/4.0/execution/wsgram/user-index.html#s-wsgram-user-usagescenarios I have hosts: CA and hosta. Checking a GridFTP copy CA -> hosta:

[kula...@ca ~]$ globus-url-copy -dbg gsiftp://ca.kulanov.org.ua:2811/bin/echo gsiftp://hosta.kulanov.org.ua:2811/tmp/my_echo
.
debug: response from gsiftp://ca.kulanov.org.ua:2811/bin/echo: 150 Begining transfer.
debug: response from gsiftp://hosta.kulanov.org.ua:2811/tmp/my_echo: 150 Begining transfer.
debug: response from gsiftp://ca.kulanov.org.ua:2811/bin/echo: 226 Transfer Complete.
debug: response from gsiftp://hosta.kulanov.org.ua:2811/tmp/my_echo: 226 Transfer Complete.
debug: operation complete
[kula...@ca ~]$

Everything works fine. Now I'd like to test WS-GRAM. Here is the job description file:

==BEGIN===
<job>
  <executable>my_echo</executable>
  <directory>${GLOBUS_USER_HOME}</directory>
  <argument>Hello</argument>
  <argument>World!</argument>
  <stdout>${GLOBUS_USER_HOME}/stdout</stdout>
  <stderr>${GLOBUS_USER_HOME}/stderr</stderr>
  <fileStageIn>
    <transfer>
      <sourceUrl>gsiftp://hosta.kulanov.org.ua:2811/bin/echo</sourceUrl>
      <destinationUrl>gsiftp://ca.kulanov.org.ua:2811/${GLOBUS_USER_HOME}/my_echo</destinationUrl>
    </transfer>
  </fileStageIn>
  <fileStageOut>
    <transfer>
      <sourceUrl>gsiftp://ca.kulanov.org.ua:2811/${GLOBUS_USER_HOME}/stdout</sourceUrl>
      <destinationUrl>gsiftp://ca.kulanov.org.ua:2811/tmp/stdout</destinationUrl>
    </transfer>
  </fileStageOut>
  <fileCleanUp>
    <deletion>
      <file>file:///${GLOBUS_USER_HOME}/my_echo</file>
    </deletion>
  </fileCleanUp>
</job>
=END=

As you can see, I just stage out on the same host, CA:

[kula...@ca ~]$ globusrun-ws -submit -S -f test.xml
Delegating user credentials...Done.
Submitting job...Done.
Job ID: uuid:ca1f3cc2-2199-11dd-b418-000c29da9a67
Termination time: 05/15/2008 09:40 GMT
Current job state: StageIn
Current job state: Active
Current job state: StageOut
Current job state: CleanUp
Current job state: Done
Destroying job...Done.
Cleaning up any delegated credentials...Done.
[kula...@ca ~]$

Everything works fine. Now we just change the fileStageOut section so that it points to hosta:

<fileStageOut>
  <transfer>
    <sourceUrl>gsiftp://ca.kulanov.org.ua:2811/${GLOBUS_USER_HOME}/stdout</sourceUrl>
    <destinationUrl>gsiftp://hosta.kulanov.org.ua:2811/tmp/stdout</destinationUrl>
  </transfer>
</fileStageOut>

[kula...@ca ~]$ globusrun-ws -submit -S -f test.xml
Delegating user credentials...Done.
Submitting job...Done.
Job ID: uuid:94307f62-219a-11dd-aeb6-000c29da9a67
Termination time: 05/15/2008 09:46 GMT
Current job state: StageIn
Current job state: Active
Current job state: StageOut
Current job state: Failed
Destroying job...Done.
Cleaning up any delegated credentials...Done.
globusrun-ws: Job failed: Staging error for RSL element fileStageOut. Can't do MLST on non-existing file/dir /home/kulanov/stdout on server hosta.kulanov.org.ua [Caused by: Server refused performing the request. Custom message: Server refused MLST command (error code 1) [Nested exception message: Custom message: Unexpected reply: 500-Command failed : System error in stat: No such file or directory 500-A system call failed: No such file or directory 500 End.]] Can't do MLST on non-existing file/dir /home/kulanov/stdout on server hosta.kulanov.org.ua [Caused by: Server refused performing the request. Custom message: Server refused MLST
Re: [gt-user] Error while starting the container
By default GT listens on port 8443. It seems that this port is already taken by another application (maybe Tomcat, or another instance of GT already running?). You can specify another port using the -p option, like

globus-start-container -p 8445

-Martin Manisha Lakra wrote: Hello, For installing WS-GRAM, I am starting the globus container as described in the System Administrator's Guide for WS-GRAM, using the following command: $GLOBUS_LOCATION/bin/globus-start-container I get the following error for the above command: [JWSCORE-114] Failed to start container: [JWSCORE-200] Container failed to initialize [Caused by: Address already in use] I tried to change the IP address of my system, but the same error persists. Can anyone tell me what the problem is? Thanks Regards, Manisha Lakra
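Not part of the original thread, but before restarting the container it can help to check whether something is already bound to the port. A minimal sketch using bash's built-in /dev/tcp redirection (the port number 8443 is just the GT default; no Globus tooling is assumed):

```shell
#!/usr/bin/env bash
# Probe a TCP port on localhost: if a connection is accepted,
# something is already listening there. Port is an example value.
port=8443
if (exec 3<>"/dev/tcp/127.0.0.1/${port}") 2>/dev/null; then
  exec 3>&-
  echo "port ${port} in use"
else
  echo "port ${port} free"
fi
```

If the port is in use, either stop the other service or pick a free port with globus-start-container -p, as above.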
Re: [gt-user] How to set default local resource manager in gt4.0?
Hi, This feature is not supported in the 4.0 series. It's not something that can be configured; it would require code changes. -Martin Prashanth Chengi wrote: Dear all, On our site, we are running gt4.0.8. We want to disable fork and set PBS as the default local resource manager. We were able to find documentation to do so in gt4.2 but not gt4.0. Any suggestions on how we can implement it on gt4.0? Thanks and Regards, Prashanth Chengi National PARAM SuperComputing Facility, System Administration and Networking Group, C-DAC Pune. Ext-183 Mob: 09766044870 Courage is the resistance to fear, mastery of fear, Not the absence of fear -Mark Twain
Re: [gt-user] How to set default local resource manager in gt4.0?
Hi, If it is only about disabling Fork: http://tinyurl.com/czo6kb All jobs that go to Fork will then result in an error: 'The Managed Job Factory Service at https://dadada:8443/wsrf/services/ManagedJobFactoryService does not have a resource with key Fork.' This does not yet set PBS as the default local resource manager though. Is this enough? Otherwise we'd need to hack. -Martin Prashanth Chengi wrote: Thanks for the info! I was hunting high and low for documentation for that! Any crude hacks that could help us achieve that? We want PBS and not fork for two reasons: 1) We don't want jobs running on the headnode. 2) Our accounting mechanism is integrated with PBS. Migrating to 4.2 is not an option at the moment as it's not backward-compatible with 4.0.x, which our sister sites are currently using. Thanks and Regards, Prashanth Chengi National PARAM SuperComputing Facility System Administration and Networking Group C-DAC Pune. Courage is the resistance to fear, mastery of fear, Not the absence of fear -Mark Twain On Tue, 28 Apr 2009, Martin Feller wrote: Hi, This feature is not supported in the 4.0 series. It's not something that can be configured; it would require code changes. -Martin Prashanth Chengi wrote: Dear all, On our site, we are running gt4.0.8. We want to disable fork and set PBS as the default local resource manager. We were able to find documentation to do so in gt4.2 but not gt4.0. Any suggestions on how we can implement it on gt4.0? Thanks and Regards, Prashanth Chengi National PARAM SuperComputing Facility, System Administration and Networking Group, C-DAC Pune. Ext-183 Mob: 09766044870 Courage is the resistance to fear, mastery of fear, Not the absence of fear -Mark Twain -- Martin Feller The Globus Alliance Computation Institute at University of Chicago Mathematics Computer Science Division at Argonne National Laboratory Phone: 630 252-4826
Re: [gt-user] choice of db
The GT as a whole does not work with a common DB system; it's the individual services (like RFT and GRAM) that may make use of a DB system, and the choice can differ between services. As far as I know, in the 4.0 series MySQL and PostgreSQL are supported by the services that use a DB system; in 4.2.x possibly Derby in addition. The online documentation of the individual services should tell you more precisely what is supported. -Martin jebin wrote: I just wanted to know if the globus toolkit works with mysql or do I have to use postgresql rgds jebin cherian
Re: [gt-user] Problem with RFT configuration: No suitable driver found error
The connectionString in the dbConfiguration section of your jndi-config.xml is wrong: it must not be $GLOBUS_LOCATION/var/rftDatabase, but should be of the form jdbc:postgresql://host[:port]/rftDatabase Also check http://www.globus.org/toolkit/docs/latest-stable/data/rft/admin/#rft-postgresql -Martin Sergei Smolov wrote: Hello, List! I've installed Globus Toolkit 4.2.1 and PostgreSQL 7.3.2 for RFT testing. Then I execute the following commands:

./postmaster -D <data directory address> -o -i
$GLOBUS_LOCATION/sbin/globus-gridftp-server -p 2811
$GLOBUS_LOCATION/bin/globus-start-container

When I try to start the container, I get the following error: Unable to connect to database.No suitable driver found for /home/ssedai/GlobusToolkit/var/rftDatabase. Caused by java.sql.SQLException: No suitable driver found for /home/ssedai/GlobusToolkit/var/rftDatabase at java.sql.DriverManager.getConnection(DriverManager.java:602) at java.sql.DriverManager.getConnection(DriverManager.java:185) at org.apache.commons.dbcp.DriverManagerConnectionFactory.createConnection(DriverManagerConnectionFactory.java:48) at org.apache.commons.dbcp.PoolableConnectionFactory.makeObject(PoolableConnectionFactory.java:290) at org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:771) at org.apache.commons.dbcp.PoolingDriver.connect(PoolingDriver.java:175) at java.sql.DriverManager.getConnection(DriverManager.java:582) at java.sql.DriverManager.getConnection(DriverManager.java:207) at org.globus.transfer.reliable.service.database.RFTDatabaseSetup.getDBConnection(RFTDatabaseSetup.java:261) at org.globus.transfer.reliable.service.database.ReliableFileTransferDbAdapter.setSchemaVersion(ReliableFileTransferDbAdapter.java:441) at org.globus.transfer.reliable.service.database.ReliableFileTransferDbAdapter.setup(ReliableFileTransferDbAdapter.java:155) at org.globus.transfer.reliable.service.ReliableFileTransferImpl.init(ReliableFileTransferImpl.java:78) at 
sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at java.lang.Class.newInstance0(Class.java:355) at java.lang.Class.newInstance(Class.java:308) at org.globus.axis.providers.RPCProvider.getNewServiceInstance(RPCProvider.java:120) at org.globus.axis.description.ServiceDescUtil.initializeProviders(ServiceDescUtil.java:214) at org.globus.axis.description.ServiceDescUtil.initializeService(ServiceDescUtil.java:163) at org.globus.wsrf.container.ServiceManager$InitPrivilegedAction.initialize(ServiceManager.java:384) at org.globus.wsrf.container.ServiceManager$InitPrivilegedAction.run(ServiceManager.java:396) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.globus.gsi.jaas.GlobusSubject.runAs(GlobusSubject.java:60) at org.globus.gsi.jaas.JaasSubject.doAs(JaasSubject.java:100) at org.globus.wsrf.container.ServiceManager.initializeService(ServiceManager.java:271) at org.globus.wsrf.container.ServiceManager.start(ServiceManager.java:177) at org.globus.wsrf.container.ServiceDispatcher.startServices(ServiceDispatcher.java:799) at org.globus.wsrf.container.ServiceDispatcher.init(ServiceDispatcher.java:435) at org.globus.wsrf.container.ServiceContainer.start(ServiceContainer.java:252) at org.globus.wsrf.container.ServiceContainer.init(ServiceContainer.java:212) at org.globus.wsrf.container.GSIServiceContainer.init(GSIServiceContainer.java:42) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at 
java.lang.reflect.Constructor.newInstance(Constructor.java:513) at org.globus.wsrf.container.ServiceContainer.createContainer(ServiceContainer.java:168) at org.globus.wsrf.container.ServiceContainer.startSecurityContainer(ServiceContainer.java:606) at org.globus.wsrf.container.ServiceContainer.main(ServiceContainer.java:539) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.globus.bootstrap.BootstrapBase.launch(BootstrapBase.java:114) at org.globus.bootstrap.ContainerBootstrap.main(ContainerBootstrap.java:40) 2009-04-09T16:01:14.200+04:00 ERROR service.ReliableFileTransferImpl [main,oldLog:179] Unable to setup database driver with pooling.Unable to connect to database.No suitable driver found for
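For illustration (not part of the original message): with a PostgreSQL backend, the connectionString parameter in the dbConfiguration section of $GLOBUS_LOCATION/etc/globus_wsrf_rft/jndi-config.xml would look roughly like the sketch below. Host, port, and database name are placeholders and must match your actual PostgreSQL setup.

```xml
<parameter>
  <name>connectionString</name>
  <value>jdbc:postgresql://myhost.example.org:5432/rftDatabase</value>
</parameter>
```

A plain filesystem path in this parameter (as in the error above) gives exactly the "No suitable driver found" failure, because JDBC cannot map it to a driver.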
Re: [gt-user] Output file problem
Ritesh, In Gram4 you can use the fileStageOut element in the job description to get files. Check http://tinyurl.com/c8mpj3 Note: For that to work a GridFTP server must be running on the Gram4 machine (or on some other machine that has access to the data written by your job), and on your client machine (or wherever you want the data to be transferred to). If you want to get files to your client machine and don't want to run a GridFTP server on your client, you have to get the data manually after the job, e.g. using a GridFTP client like globus-url-copy. For this to work you still have to have a GridFTP server running on the Gram4 machine (or on some other machine that has access to the data written by your job). But you can also fetch the data by some other transfer mechanism (e.g. scp, ftp, floppy disk). -Martin Ritesh Badwaik wrote: Hi, If a job submitted to globus produces an output file, how do I retrieve that output file from the GLOBUS_USER_HOME directory where the job is executed? Thanks and regards Ritesh
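To make the note above concrete, a minimal fileStageOut fragment for a Gram4 job description might look like the sketch below. The hostname client.example.org and the file names are made up; a GridFTP server must actually be listening on that host, and the file:/// URL is resolved relative to the Gram4 service side:

```xml
<fileStageOut>
  <transfer>
    <sourceUrl>file:///${GLOBUS_USER_HOME}/output.dat</sourceUrl>
    <destinationUrl>gsiftp://client.example.org:2811/tmp/output.dat</destinationUrl>
  </transfer>
</fileStageOut>
```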
Re: [gt-user] problem with globus+condor-g
Hm, I never saw this. The problem seems to be this: 3/30 20:00:55 [18840] GAHP[18841] (stderr) - faultCode: {http://schemas.xmlsoap.org/soap/envelope/}Server.userException 3/30 20:00:55 [18840] GAHP[18841] (stderr) - faultSubcode: 3/30 20:00:55 [18840] GAHP[18841] (stderr) - faultString: org.globus.common.ChainedIOException: Authentication failed [Caused by: Miscellaneous failure. [Caused by: Bad certificate (java.security.SignatureException: SHA-1/RSA/PKCS#1: Not initialized)]] 3/30 20:00:55 [18840] GAHP[18841] (stderr) - faultActor: 3/30 20:00:55 [18840] GAHP[18841] (stderr) - faultNode: 3/30 20:00:55 [18840] GAHP[18841] (stderr) - faultDetail: 3/30 20:00:55 [18840] GAHP[18841] (stderr) - {http://xml.apache.org/axis/}stackTrace:Authentication failed. Caused by Miscellaneous failure. Caused by COM.claymoresystems.ptls.SSLThrewAlertException: Bad certificate (java.security.SignatureException: SHA-1/RSA/PKCS#1: Not initialized) Did you actually create a user proxy certificate before the condor submission? Does a job submission using globusrun-ws work from this client to the same server? -Martin induru hemanth wrote: Martin Sir, Thanks for your response I am attaching the following files GridmanagerLog.globus , conatiner.log, containeLog Thanking You Hemanth BITMESRA
Re: [gt-user] problem with globus+condor-g
You should get more detailed GridManager logging on the client-side by setting the parameter GRIDMANAGER_DEBUG = D_FULLDEBUG in your Condor configuration. Please do this and send the Gridmanager log again, and also send the server-side GT4 container logfile for more information. -Martin induru hemanth wrote: Hi, I am using globus-4.2.1 condor-7.2.0 I have a problem while submmitting jobs from condor-G to globus [glo...@g1 ~]$ vi xyy_cond Executable =/home/globus/xyy.sh universe = grid grid_resource = gt4 https://g1:8443/wsrf/services/ManagedJobFactoryService Condor output = xyy.out error = xyy.error Log = xyy.log Queue [glo...@g1 ~]$ condor_submit xyy_cond Submitting job(s). Logging submit event(s). 1 job(s) submitted to cluster 105. [glo...@g1 ~]$ vi xyy.log 000 (105.000.000) 03/28 10:01:23 Job submitted from host: 172.16.40.200:51114 ... 012 (105.000.000) 03/28 10:01:40 Job was held. Failed to create proxy delegation Code 0 Subcode 0 [glo...@g1 ~]$ cd /home/condor/log [glo...@g1 log]$ vi GridmanagerLog.globus 3/28 10:01:23 ** 3/28 10:01:23 ** condor_gridmanager (CONDOR_GRIDMANAGER) STARTING UP 3/28 10:01:23 ** /home/condor/condor-7.2.0/sbin/condor_gridmanager 3/28 10:01:23 ** SubsystemInfo: name=GRIDMANAGER type=DAEMON(10) class=DAEMON(1) 3/28 10:01:23 ** Configuration: subsystem:GRIDMANAGER local:NONE class:DAEMON 3/28 10:01:23 ** $CondorVersion: 7.2.0 Dec 19 2008 BuildID: 121001 $ 3/28 10:01:23 ** $CondorPlatform: X86_64-LINUX_RHEL5 $ 3/28 10:01:23 ** PID = 17288 3/28 10:01:23 ** Log last touched 3/28 09:18:31 3/28 10:01:23 ** 3/28 10:01:23 Using config source: /home/condor/condor-7.2.0/etc/condor_config 3/28 10:01:23 Using local config sources: 3/28 10:01:23/home/condor/condor_config.local 3/28 10:01:23 DaemonCore: Command Socket at 172.16.40.200:35368 3/28 10:01:26 [17288] JEF: ConfigureGahp() 3/28 10:01:26 [17288] Found job 105.0 --- inserting 3/28 10:01:26 [17288] gahp server not up yet, delaying ping 3/28 10:01:26 [17288] gahp server not up yet, delaying 
checkDelegation 3/28 10:01:28 [17288] (105.0) doEvaluateState called: gmState GM_INIT, globusState 3/28 10:01:28 [17288] GAHP server pid = 17292 3/28 10:01:38 [17288] (105.0) doEvaluateState called: gmState GM_UNSUBMITTED, globusState 3/28 10:01:40 [17288] resource https://g1:8443/wsrf/services/ManagedJobFactoryService is now up 3/28 10:01:40 [17288] (105.0) doEvaluateState called: gmState GM_DELEGATE_PROXY, globusState 3/28 10:01:40 [17288] delegate_credentials(https://g1:8443/wsrf/services/DelegationFactoryService) failed! 3/28 10:01:40 [17288] (105.0) doEvaluateState called: gmState GM_DELEGATE_PROXY, globusState 3/28 10:01:43 [17288] No jobs left, shutting down 3/28 10:01:43 [17288] Got SIGTERM. Performing graceful shutdown. 3/28 10:01:43 [17288] condor_gridmanager (condor_GRIDMANAGER) pid 17288 EXITING WITH STATUS 0 ___ CAN ANY ONE HELP ME Thanking You Hemanth, BIT Mesra.
Re: [gt-user] Problem with Ganglia IP
I can't tell you what the issue is at the moment. Why do you prefer this version? It's very old, and had problems that have been solved in newer versions. -Martin cmasmas cmasmas wrote: I would prefer to use this version. Any temporary solution to the problem? 2009/3/27 Martin Feller fel...@mcs.anl.gov I'd highly recommend to pick a later version of the GT, if that's doable for you. 4.2.1 for the 4.2 series, or 4.0.8 for the 4.0 series. -Martin cmasmas cmasmas wrote: Hi, I'm trying to use Globus 4.0.1 with Ganglia IP. When I start the globus container I get the following error: 2009-03-27 16:37:54,202 ERROR usefulrp.GLUEResourceProperty [GLUE refresher 0,runScript:372] Could not deserialize output of producer org.globus.mds.usefulrp.glue.GangliaElementProducer to an instance of class org.globus.mds.glue.batchprovider.ClusterCollectionType I have read that the problem can be in the ganglia_to_glue.xslt. Can anyone help me with this? Thanks in advance.
Re: [gt-user] how to enable WS-GRAM-CONDOR
Inderpreet Chopra wrote: I have globus GT4.0 installed on my machine without any configuration for a scheduler. But now I want to use the Condor scheduler. In the quickstart guide, it is mentioned to use *--enable-wsgram-condor* while configuring. *Is there any way to enable this option without affecting my present installation?* No, it will affect your installation. You'd have to set up a second installation if you don't want to modify your current one. *Also, where can I get the instructions for further configuring the Condor scheduler with globus?* There should not be any further configuration required. If submission to Condor does not work, you can check $GLOBUS_LOCATION/lib/perl/Globus/GRAM/JobManager. It's here where the Condor-specific job description is created. -Martin I have used the all-source installer for installing globus. Regards, Inderpreet
Re: [gt-user] No 'pem' file created after installing simpleCA !!
Hm, how did you install simpleCA? Did you follow http://www.globus.org/toolkit/docs/latest-stable/admin/quickstart/#q-security? The CA should show up in ~/.globus/simpleCA/ and not in /globus/simpleCA. Does it work if you follow the quickstart guide? -Martin Manisha Lakra wrote: Hello, I have successfully installed globus toolkit 4.2.0 on the first machine. After that I tried to install simpleCA on that machine, but no file with the extension '.pem' was created in the folder '/globus/simpleCA'. Only the following files and directories are available: certs crl grid-ca-ssl.conf index.txt newcerts private serial Now, how can I proceed with my installation? I tried to overwrite the existing simpleCA by reinstalling it, but the same thing happened: no file with the '.pem' extension. Kindly guide me how to proceed now. Thank you Regards, Manisha Lakra
Re: [gt-user] Error in -start-container
It's $GLOBUS_LOCATION/etc/globus_wsrf_rft/jndi-config.xml Check the userName and password parameters in section dbConfiguration -Martin Danilo Delizia wrote: Hi, I'm trying to install globus toolkit 4.0.8 on kubuntu 8.10 i followed the guide to install it and the simpleCA guide to configure the security system. When i try to start the container i got this error: globus-start-container 2009-03-14 23:30:31,592 ERROR service.ReliableFileTransferImpl [main,init:76] Unable to setup database driver with pooling.A connection error has occurred: FATAL: password authentication failed for user danilo 2009-03-14 23:30:32,321 WARN service.ReliableFileTransferHome [main,initialize:97] All RFT requests will fail and all GRAM jobs that require file staging will fail.A connection error has occurred: FATAL: password authentication failed for user danilo Starting SOAP server at: https://127.0.0.1:8443/wsrf/services/ With the following services: [1]: https://127.0.0.1:8443/wsrf/services/AdminService [2]: https://127.0.0.1:8443/wsrf/services/AuthzCalloutTestService [3]: https://127.0.0.1:8443/wsrf/services/CASService [4]: https://127.0.0.1:8443/wsrf/services/ContainerRegistryEntryService [5]: https://127.0.0.1:8443/wsrf/services/ContainerRegistryService [6]: https://127.0.0.1:8443/wsrf/services/CounterService [7]: https://127.0.0.1:8443/wsrf/services/DefaultIndexService [8]: https://127.0.0.1:8443/wsrf/services/DefaultIndexServiceEntry [9]: https://127.0.0.1:8443/wsrf/services/DefaultTriggerService [10]: https://127.0.0.1:8443/wsrf/services/DefaultTriggerServiceEntry [11]: https://127.0.0.1:8443/wsrf/services/DelegationFactoryService [12]: https://127.0.0.1:8443/wsrf/services/DelegationService [13]: https://127.0.0.1:8443/wsrf/services/DelegationTestService [14]: https://127.0.0.1:8443/wsrf/services/InMemoryServiceGroup [15]: https://127.0.0.1:8443/wsrf/services/InMemoryServiceGroupEntry [16]: https://127.0.0.1:8443/wsrf/services/InMemoryServiceGroupFactory [17]: 
https://127.0.0.1:8443/wsrf/services/IndexFactoryService [18]: https://127.0.0.1:8443/wsrf/services/IndexService [19]: https://127.0.0.1:8443/wsrf/services/IndexServiceEntry [20]: https://127.0.0.1:8443/wsrf/services/JWSCoreVersion [21]: https://127.0.0.1:8443/wsrf/services/ManagedExecutableJobService [22]: https://127.0.0.1:8443/wsrf/services/ManagedJobFactoryService [23]: https://127.0.0.1:8443/wsrf/services/ManagedMultiJobService [24]: https://127.0.0.1:8443/wsrf/services/ManagementService [25]: https://127.0.0.1:8443/wsrf/services/NotificationConsumerFactoryService [26]: https://127.0.0.1:8443/wsrf/services/NotificationConsumerService [27]: https://127.0.0.1:8443/wsrf/services/NotificationTestService [28]: https://127.0.0.1:8443/wsrf/services/PersistenceTestSubscriptionManager [29]: https://127.0.0.1:8443/wsrf/services/ReliableFileTransferFactoryService [30]: https://127.0.0.1:8443/wsrf/services/ReliableFileTransferService [31]: https://127.0.0.1:8443/wsrf/services/RendezvousFactoryService [32]: https://127.0.0.1:8443/wsrf/services/ReplicationService [33]: https://127.0.0.1:8443/wsrf/services/SampleAuthzService [34]: https://127.0.0.1:8443/wsrf/services/SecureCounterService [35]: https://127.0.0.1:8443/wsrf/services/SecurityTestService [36]: https://127.0.0.1:8443/wsrf/services/ShutdownService [37]: https://127.0.0.1:8443/wsrf/services/SubscriptionManagerService [38]: https://127.0.0.1:8443/wsrf/services/TestAuthzService [39]: https://127.0.0.1:8443/wsrf/services/TestRPCService [40]: https://127.0.0.1:8443/wsrf/services/TestService [41]: https://127.0.0.1:8443/wsrf/services/TestServiceRequest [42]: https://127.0.0.1:8443/wsrf/services/TestServiceWrongWSDL [43]: https://127.0.0.1:8443/wsrf/services/TriggerFactoryService [44]: https://127.0.0.1:8443/wsrf/services/TriggerService [45]: https://127.0.0.1:8443/wsrf/services/TriggerServiceEntry [46]: https://127.0.0.1:8443/wsrf/services/Version [47]: https://127.0.0.1:8443/wsrf/services/WidgetNotificationService [48]: 
https://127.0.0.1:8443/wsrf/services/WidgetService [49]: https://127.0.0.1:8443/wsrf/services/gsi/AuthenticationService [50]: https://127.0.0.1:8443/wsrf/services/mds/test/execsource/IndexService [51]: https://127.0.0.1:8443/wsrf/services/mds/test/execsource/IndexServiceEntry [52]:
Re: [gt-user] Help: fileStageIn owner problem
Does the same happen if you use globus-url-copy to transfer a file, instead of using GridFTP via ws-gram? (globus-url-copy \ gsiftp://client.mydomain.com:2811/home/griduser1/grid/myhello \ gsiftp://cm.mydomain.com:2811/tmp/myhello) I assume so, and this would help narrowing it down. -Martin Le Trung Kien wrote: Hi, In my job description, I define a fileStageIn like this fileStageIn transfer sourceUrlgsiftp:// client.mydomain.com:2811/home/griduser1/grid/myhello/sourceUrl destinationUrlgsiftp://cm.mydomain.com:2811/tmp/myhello /destinationUrl /transfer /fileStageIn After submitting my job, I got the file delivered, but it's strange that on cm.mydomain.com gridus...@cm #] ls -l /tmp/myhello -rwxr-xr-x1 root root 147 Mar 9 16:02 /tmp/myhello We see that this file is owned by root. In fact, with this problem I couldn't copy files and execute the files with right permission on my user's directories. Additional information : In my grid-mapfile, I have only one mapping from grid user to local user (this local user in my case is a NIS account). Help me, please.
Re: [gt-user] Failed to initialize GAHP
AFAIK GAHP initialization is pure Condor, so I think this question is for the Condor group. -Martin Samir Khanal wrote: Hi All, I don't know where to ask this question (Condor or Globus). I had set up a Globus/Condor-G grid: A had the gatekeeper and B submitted jobs to A. Everything was going smoothly and I could submit both PBS and Condor jobs. Then I was asked to reverse the situation: B had to be the gatekeeper (as it had larger resources) and A now had to submit jobs to B's resources. I used the GT4 quickstart guide and the setup went well, except that now when I submit grid jobs via Condor-G the jobs get held.

executable = /bin/date
Transfer_Executable = false
globusscheduler = B.xx.xx.xx/jobmanager-fork
universe = grid
output = date.out
error = date.error
log = date.log
queue

The same script worked the other way around. The myproxy login and all other stuff works, besides this problem. When I looked into submit.log it says 012 (086.000.000) 03/05 18:28:53 Job was held. Failed to initialize GAHP Code 0 Subcode 0 ... I then tried [~]$ /opt/condor/sbin/gt4_gahp $GahpVersion: 1.7.1 Apr 23 2008 GT4\ GAHP\ (GT-4.0.4) $ and it does start (Java is set up correctly). What seems to be the problem? I am a bit stuck with this. I am using Rocks 5.1, GT 4.2.1, and the Condor roll that came with Rocks 5.1. Thanks Samir
Re: [gt-user] globusrun-ws: Job failed: Staging error for RSL element fileStageIn
I'm a bit confused about this error. It seems that RFT does not find the delegated credential delegated by globusrun-ws. Does the GT container logfile give more information? Does the same happen if you do job delegation? (globusrun-ws -submit -J -c /bin/date) Does a job with streaming give the same error? globusrun-ws -submit -s -c /bin/date -Martin Ritesh Badwaik wrote: I am using gt4.2.1 Martin Feller wrote: Hi, What GT version is that? Martin Ritesh Badwaik wrote: hi, After giving the command globusrun-ws -submit -S -f a.rsl I am getting following error __ Delegating user credentials...Done. Submitting job...Done. Job ID: uuid:5ffcc398-0878-11de-b882-0004796723fc Termination time: 03/04/3009 04:53 GMT Current job state: StageIn Current job state: Failed Destroying job...Done. Cleaning up any delegated credentials...Done. globusrun-ws: Job failed: Staging error for RSL element fileStageIn. Unable to create RFT resource; nested exception is: org.globus.transfer.reliable.service.exception.RftException: Error processing delegated credentialError getting delegation resource [Caused by: org.globus.wsrf.NoSuchResourceException] [Caused by: Error getting delegation resource [Caused by: org.globus.wsrf.NoSuchResourceException]] globusrun-ws: Job failed: Staging error for RSL element fileCleanUp. 
Unable to create RFT resource; nested exception is: org.globus.transfer.reliable.service.exception.RftException: Error processing delegated credentialError getting delegation resource [Caused by: org.globus.wsrf.NoSuchResourceException] [Caused by: Error getting delegation resource [Caused by: org.globus.wsrf.NoSuchResourceException]]
__
My rsl file is as follows:

<job>
  <executable>my_echo</executable>
  <directory>${GLOBUS_USER_HOME}</directory>
  <stdout>${GLOBUS_USER_HOME}/stdout</stdout>
  <stderr>${GLOBUS_USER_HOME}/stderr</stderr>
  <fileStageIn>
    <transfer>
      <sourceUrl>gsiftp://vsundar-fc8.corp.cdac.in:2811/home/ritesh/s</sourceUrl>
      <destinationUrl>file:///${GLOBUS_USER_HOME}/my_echo</destinationUrl>
    </transfer>
  </fileStageIn>
  <fileCleanUp>
    <deletion>
      <file>file:///${GLOBUS_USER_HOME}/my_echo</file>
    </deletion>
  </fileCleanUp>
</job>
_
I have attached the container.log file. vsundar-fc8.corp.cdac.in is the same machine on which I am submitting the rsl file. Can anyone give me a solution for this error? Thanks and Regards Ritesh
Re: [gt-user] gt-4.2.1
Hm, something must go wrong with $PATH, I think: Can you actually call the Condor command-line tools as user globus? What's the output of "which condor_submit" and "echo $PATH"? What's the output of $GLOBUS_LOCATION/setup/globus/find-condor-tools? Martin induru hemanth wrote: Martin Sir, Thanks for your response. ./configure --enable-wsgram-condor and make are working fine, but while running make install it shows this error:

[glo...@g3 gt4.2.1-x86_64_rhas_4-installer]$ make install
ln -sf /usr/local/globus-4.2.1.1/etc/gpt/packages /usr/local/globus-4.2.1.1/etc/globus_packages
/usr/local/globus-4.2.1.1/sbin/gpt-postinstall
running /usr/local/globus-4.2.1.1/setup/globus/setup-globus-job-manager-condor.pl..[ Changing to /usr/local/globus-4.2.1.1/setup/globus ]
find-condor-tools: error: Cannot locate condor_submit
checking for condor_submit... no
Error locating condor commands, aborting!
ERROR: Command failed
make: *** [postinstall] Error 9
[glo...@g3 gt4.2.1-x86_64_rhas_4-installer]$
__
[glo...@g3 gt4.2.1-x86_64_rhas_4-installer]$ export PATH=/home/condor/condor-7.2.0/sbin:/home/condor/condor-7.2.0/bin:$PATH
After setting PATH it still shows the same message.
__
The condor commands work properly when run as root, but condor_submit works with the condor user only.
_
Please help me, Hemanth, BIT MESRA On 3/2/09, Martin Feller fel...@mcs.anl.gov wrote: If you didn't already build support for Condor in ws-gram you have to do so: Go into the GT installer directory (source or binary installer) and do ./configure --enable-wsgram-condor (and whatever other options you provided), make, make install. After a GT server restart you should be able to submit jobs to Condor like globusrun-ws -submit -Ft Condor -c /bin/date If the job does not get through or keeps staying in state IDLE in Condor, come back to the list. 
-Martin induru hemanth wrote: Hi, I am using GT-4.2.1 (default resource manager: Fork). I just installed and configured condor-7.2.0. How can I access the Condor pool through Globus 4.2.1? Thanking you, Hemanth, B.I.T Mesra.
Re: [gt-user] globusrun-ws: Job failed: Staging error for RSL element fileStageIn
Hi, What GT version is that? Martin Ritesh Badwaik wrote: hi, After giving the command globusrun-ws -submit -S -f a.rsl I am getting the following error:
__
Delegating user credentials...Done.
Submitting job...Done.
Job ID: uuid:5ffcc398-0878-11de-b882-0004796723fc
Termination time: 03/04/3009 04:53 GMT
Current job state: StageIn
Current job state: Failed
Destroying job...Done.
Cleaning up any delegated credentials...Done.
globusrun-ws: Job failed: Staging error for RSL element fileStageIn. Unable to create RFT resource; nested exception is: org.globus.transfer.reliable.service.exception.RftException: Error processing delegated credentialError getting delegation resource [Caused by: org.globus.wsrf.NoSuchResourceException] [Caused by: Error getting delegation resource [Caused by: org.globus.wsrf.NoSuchResourceException]]
globusrun-ws: Job failed: Staging error for RSL element fileCleanUp. Unable to create RFT resource; nested exception is: org.globus.transfer.reliable.service.exception.RftException: Error processing delegated credentialError getting delegation resource [Caused by: org.globus.wsrf.NoSuchResourceException] [Caused by: Error getting delegation resource [Caused by: org.globus.wsrf.NoSuchResourceException]]
__
My rsl file is as follows:

<job>
  <executable>my_echo</executable>
  <directory>${GLOBUS_USER_HOME}</directory>
  <stdout>${GLOBUS_USER_HOME}/stdout</stdout>
  <stderr>${GLOBUS_USER_HOME}/stderr</stderr>
  <fileStageIn>
    <transfer>
      <sourceUrl>gsiftp://vsundar-fc8.corp.cdac.in:2811/home/ritesh/s</sourceUrl>
      <destinationUrl>file:///${GLOBUS_USER_HOME}/my_echo</destinationUrl>
    </transfer>
  </fileStageIn>
  <fileCleanUp>
    <deletion>
      <file>file:///${GLOBUS_USER_HOME}/my_echo</file>
    </deletion>
  </fileCleanUp>
</job>
_
I have attached the container.log file. vsundar-fc8.corp.cdac.in is the same machine on which I am submitting the rsl file. Can anyone give me a solution for this error? Thanks and Regards Ritesh
Re: [gt-user] gt-4.2.1
If you didn't already build support for Condor in ws-gram you have to do so: Go into the GT installer directory (source or binary installer) and do

./configure --enable-wsgram-condor (and whatever other options you provided)
make
make install

After a GT server restart you should be able to submit jobs to Condor like

globusrun-ws -submit -Ft Condor -c /bin/date

If the job does not get through or keeps staying in state IDLE in Condor, come back to the list. -Martin induru hemanth wrote: Hi, I am using GT-4.2.1 (default resource manager: Fork). I just installed and configured condor-7.2.0. How can I access the Condor pool through Globus 4.2.1? Thanking you, Hemanth, B.I.T Mesra.
Re: [gt-user] problem in transferring file
In a job description you can use 'file:///...' only for the GridFTP server associated with (or local to) the ws-gram server; file:/// will always be interpreted as local to the ws-gram server. That means: for a fileStageIn element you can use it only in the destinationUrl element, and for a fileStageOut element you can use it only in the sourceUrl element. In the sourceUrl of a fileStageIn element and in the destinationUrl of a fileStageOut element you must provide 'gridftp urls'. ws-gram will substitute 'file://' by 'gsiftp://gridftp-server:port', according to the gram-gridftp file system mappings defined by the admin. Check http://www.globus.org/toolkit/docs/4.2/4.2.1/execution/gram4/admin/#gram4-nondefaultgridftp and http://www.globus.org/toolkit/docs/4.2/4.2.1/execution/gram4/admin/#gram4-Interface_Config_Frag-filesysmap for more detailed information about the gram-gridftp mappings. If you want to transfer files as part of a job between two GridFTP servers that are completely unrelated to the ws-gram server, you can do so, but you have to specify gridftp urls then. Martin Ufuk Utku Turuncoglu wrote: Hi, I changed the order and got the following error:

at org.globus.exec.service.exec.RunThread.run(RunThread.java:85)
Can't do MLST on non-existing file/dir /Users/xyz/Desktop/dummy01.dat on server fr0103ge.ncar.teragrid.org. Caused by org.globus.ftp.exception.ServerException: Server refused performing the request. Custom message: Server refused MLST command (error code 1) [Nested exception message: Custom message: Unexpected reply: 500-Command failed : System error in stat: No such file or directory 500-A system call failed: No such file or directory 500- 500 End.]. Nested exception is org.globus.ftp.exception.UnexpectedReplyCodeException: Custom message: Unexpected reply: 500-Command failed : System error in stat: No such file or directory 500-A system call failed: No such file or directory 500- 500 End.

I don't understand. The file /Users/xyz/Desktop/dummy01.dat is on my local machine. 
Why does it try to find it on the remote server? I just want to copy a local file to the remote server. I also tried to copy the file using globus-url-copy:

globus-url-copy file:///Users/xyz/Desktop/dummy01.dat gsiftp://gridftp.frost.ncar.teragrid.org/ptmp/xyz/dummy01.dat

and it works. The newer RSL file is:

...
<fileStageOut>
  <transfer>
    <sourceUrl>file:///Users/xyz/Desktop/dummy01.dat</sourceUrl>
    <destinationUrl>gsiftp://gridftp.frost.ncar.teragrid.org/ptmp/xyz/dummy01.dat</destinationUrl>
  </transfer>
</fileStageOut>
...

Thanks, --ufuk

Stuart Martin wrote: It looks to me like you may have the source and dest mixed up. For stage out, the source would typically be the file:/// URL, which would get replaced by ws-gram with the service-side GridFTP server host and port. Then the gsiftp URL would be to the GridFTP server running on the client side.

<fileStageOut>
  <transfer>
    <sourceUrl>gsiftp://gridftp.frost.ncar.teragrid.org/ptmp/xyz/dummy01.dat</sourceUrl>
    <destinationUrl>file:///Users/xyz/Desktop/dummy01.dat</destinationUrl>
  </transfer>
</fileStageOut>

-Stu

On Feb 27, 2009, at 10:58 AM, Ufuk Utku Turuncoglu wrote: Hi, I tried to submit a globus job with a file transfer, but I got the following error. The local file appears as null in the log.

Submission ID: uuid:af0a91f0-0494-11de-bdbe-fda4d1871b6e
delegation level: gsilimited
delegation level: gsifull
WAITING FOR JOB TO FINISH:
== State Notification ==
State : Failed
Holding: false
Exit Code: 0
Failed
Failed Fault:
fault type: org.globus.exec.generated.StagingFaultType:
attribute: fileStageOut
description: Staging error for RSL element fileStageOut, from gsiftp://gridftp.frost.ncar.teragrid.org:2811/ptmp/xyz/dummy01.dat to null.
destination: null
faultReason:
faultString:
gt2ErrorCode: 0
originator:
Address: https://fr0103ge.ncar.teragrid.org:8443/wsrf/services/ManagedJobFactoryService
Reference property[0]: <ns6:ResourceID xmlns:ns6="http://www.globus.org/namespaces/2004/10/gram/job">cd2d9100-0494-11de-97de-d94967efe41a</ns6:ResourceID>
source: gsiftp://gridftp.frost.ncar.teragrid.org:2811/ptmp/xyz/dummy01.dat
stackTrace: org.globus.exec.generated.StagingFaultType: Staging error for RSL element fileStageOut, from gsiftp://gridftp.frost.ncar.teragrid.org:2811/ptmp/xyz/dummy01.dat to null.
Timestamp: Thu Feb 26 23:06:41 MST 2009
Originator:
Address: https://fr0103ge.ncar.teragrid.org:8443/wsrf/services/ManagedJobFactoryService
Reference property[0]: <ns6:ResourceID xmlns:ns6="http://www.globus.org/namespaces/2004/10/gram/job">cd2d9100-0494-11de-97de-d94967efe41a</ns6:ResourceID>

I also checked the delegation level and it seems to be full. I am using the gt4.0.8 Java libraries. The jobs work correctly without the data transfer part. Any suggestion will be helpful, --ufuk

RSL Script ---
<?xml version="1.0" encoding="UTF-8"?>
<job
Re: [gt-user] Possible container bug
I think I see the problem: http://bugzilla.globus.org/globus/show_bug.cgi?id=6350 I opened that bug a while ago but didn't get to it yet. Looks like it's time for that now. If I prepare a fix, can you try it out? Martin

Kay Dörnemann wrote: Hi, you will find attached the full container.log from today. The problem occurred quickly today; I guess it was between 3 pm and 11 pm (CET). Thanks. Cheers, Kay

Martin Feller wrote, on 21.02.2009 20:52: What about all of this: http://lists.globus.org/pipermail/gt-user/2009-February/007772.html ?

Kay Dörnemann wrote: Hi, we tried dumping the RFT database, but within one day the CPU usage of the globus process suddenly jumped to 100% again. As usual. Anyone have an idea? Thank you. Cheers, Kay

Patrick Armstrong schrieb: I realize it has been almost a month since your post in reply to gt-user, but are you still having the problem described? Martin Feller (who fixed this bug) suggested some things in gt-user on the 6th, specifically deleting your persisted directory. I've also found that dumping your rft database helps. --patrick
Re: [gt-user] is running a container for both globus 4.2 and 4.0 possible?
You can run more than one container on one machine; I do it all the time. AFAIK the installations just have to be located in different directories. Say you have two gt installs: /opt/gt408 and /opt/gt421. I personally then have a ~/.bashrc408 and a ~/.bashrc421, setting up paths, GLOBUS_LOCATION (and maybe CLASSPATH) for the different gt installs. Corresponding to each bashrc file I have an alias which sources the appropriate bashrc file:

alias 408='cp ~/.bashrc408 ~/.bashrc; source ~/.bashrc'
alias 421='cp ~/.bashrc421 ~/.bashrc; source ~/.bashrc'

Switching contexts this way, you can easily start different containers; they have to listen on different ports though. Not sure if this is the smartest way, but it works for me. -Martin

Cole Uhlman wrote: Hello, all. I would like to set up machines that can accept jobs from either globus 4.2 or 4.0. Globus doesn't want me running two containers (if I try to run the second: ERROR: A container with pid 2177 is already running). On a machine with both installed, would it even be theoretically possible to run two containers? Could there be another way for one machine to serve both 4.2 and 4.0? Thanks. -Cole
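The same environment-swapping idea can also be written as a small shell function instead of per-version aliases. This is only a sketch of the approach described above: the function name use_gt and the ~/.bashrcNNN naming scheme are illustrative, and each ~/.bashrcNNN file is assumed to export GLOBUS_LOCATION, PATH, etc. for the matching install.

```shell
# Minimal sketch of switching between two Globus environments by
# swapping in a per-install bashrc (hypothetical naming scheme).
use_gt() {
  # usage: use_gt 408   or   use_gt 421
  cp "$HOME/.bashrc$1" "$HOME/.bashrc"   # swap in the chosen profile
  . "$HOME/.bashrc"                      # re-source it in this shell
}
```

After `use_gt 421`, commands like globus-start-container resolve against the 4.2.1 install; remember that the two containers still need distinct ports.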
Re: [gt-user] is running a container for both globus 4.2 and 4.0 possible?
Alexander Beck-Ratzka wrote: On Wednesday, February 25th 2009 07:04:42 Martin Feller wrote: You can run more than one container on one machine; I do it all the time. AFAIK the installations just have to be located in different directories. Say you have two gt installs: /opt/gt408 and /opt/gt421. I personally then have a ~/.bashrc408 and a ~/.bashrc421, setting up paths, GLOBUS_LOCATION (and maybe CLASSPATH) for the different gt installs. Corresponding to each bashrc file I have an alias which sources the appropriate bashrc file: alias 408='cp ~/.bashrc408 ~/.bashrc; source ~/.bashrc' alias 421='cp ~/.bashrc421 ~/.bashrc; source ~/.bashrc' Switching contexts this way, you can easily start different containers; they have to listen on different ports though. Not sure if this is the smartest way, but it works for me.

I am not sure this will work just by putting globus 4.0 and 4.2 in different directories. The wsgram service creates a listening port, namely 8443. If this is really a listening port, the second wsgram service won't come up, because it will try to open the same listening port. This will lead to a Unix / Linux system error. Therefore I think you also need to change those ports in the configuration files for the second container. Cheers, Alexander

I think that's what I wanted to say with "they have to listen on different ports though". Or do you mean something else here? Martin