[ https://issues.apache.org/jira/browse/AIRAVATA-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16101759#comment-16101759 ]
Marcus Christie commented on AIRAVATA-2492: ------------------------------------------- Example of a fast file transfer in seagrid: 6BG archive.tar file transferred in about 7 minutes. Experiment ID: Clone_of_polya_15_nt_unprot_am_prod_02_f762fa5f-b039-4352-812b-11cde23a7732 gfac logs on scigap03: {noformat} 2017-07-26 09:13:29,653 [pool-11-thread-145] INFO o.a.airavata.gfac.impl.Factory - Channel creation test succeeded, key :ccguser_stamp ede.tacc.xsede.org_22_3d65bf6d-2c9f-4166-a51b-e76e0022bd3b 2017-07-26 09:13:29,653 [pool-11-thread-145] INFO o.a.airavata.gfac.impl.Factory - Reuse SSH session for :ccguser_stampede.tacc.xsede. org_22_3d65bf6d-2c9f-4166-a51b-e76e0022bd3b 2017-07-26 09:13:29,656 [pool-11-thread-145] INFO o.a.airavata.gfac.impl.Factory - SSH Session validation succeeded, key :pga_gf4.ucs. indiana.edu_22_3d65bf6d-2c9f-4166-a51b-e76e0022bd3b 2017-07-26 09:13:29,676 [pool-11-thread-145] INFO o.a.airavata.gfac.impl.Factory - Channel creation test succeeded, key :pga_gf4.ucs.i ndiana.edu_22_3d65bf6d-2c9f-4166-a51b-e76e0022bd3b 2017-07-26 09:13:29,676 [pool-11-thread-145] INFO o.a.airavata.gfac.impl.Factory - Reuse SSH session for :pga_gf4.ucs.indiana.edu_22_3 d65bf6d-2c9f-4166-a51b-e76e0022bd3b 2017-07-26 09:13:29,676 [pool-11-thread-145] INFO o.a.airavata.gfac.impl.Factory - SSH Session validation succeeded, key :ccguser_stam pede.tacc.xsede.org_22_3d65bf6d-2c9f-4166-a51b-e76e0022bd3b 2017-07-26 09:13:30,086 [pool-11-thread-145] INFO o.a.airavata.gfac.impl.Factory - Channel creation test succeeded, key :ccguser_stamp ede.tacc.xsede.org_22_3d65bf6d-2c9f-4166-a51b-e76e0022bd3b 2017-07-26 09:13:30,086 [pool-11-thread-145] INFO o.a.airavata.gfac.impl.Factory - Reuse SSH session for :ccguser_stampede.tacc.xsede. org_22_3d65bf6d-2c9f-4166-a51b-e76e0022bd3b 2017-07-26 09:13:30,122 [pool-11-thread-145] INFO o.a.a.g.impl.HPCRemoteCluster - Executing command cd /scratch/00421/ccguser/seagrid- workdirs/PROCESS_ba393353-611c-45fb-a27f-0e8c00c8535e && tar -cvf /scratch/00421/ccguser/seagrid-workdirs/PROCESS_ba393353-611c-45fb-a27 f-0e8c00c8535e/archive.tar ./* 2017-07-26 09:13:30,680 [Thread-5] INFO o.a.a.g.m.e.EmailBasedMonitor - [EJM]: 161 new email/s received 2017-07-26 09:14:06,669 [pool-11-thread-145] INFO o.a.a.g.impl.HPCRemoteCluster - Transferring from:/scratch/00421/ccguser/seagrid-wor kdirs/PROCESS_ba393353-611c-45fb-a27f-0e8c00c8535e/archive.tar To: /var/www/portals/gateway-user-data/seagrid/nastia/polya_15_nt_ful_AMB ER/Clone_of_polya_15_nt_unprot_am_prod_021500987115/archive.tar 2017-07-26 09:14:06,741 [pool-11-thread-145] INFO o.a.a.gfac.impl.SSHUtils - Initiating transfer from:/scratch/00421/ccguser/seagrid-w orkdirs/PROCESS_ba393353-611c-45fb-a27f-0e8c00c8535e/archive.tar To: /var/www/portals/gateway-user-data/seagrid/nastia/polya_15_nt_ful_A MBER/Clone_of_polya_15_nt_unprot_am_prod_021500987115/archive.tar, Ignore Empty file : true 2017-07-26 09:14:07,064 [pool-11-thread-145] INFO o.a.a.gfac.impl.SSHUtils - Transferring from:/scratch/00421/ccguser/seagrid-workdirs/PROCESS_ba393353-611c-45fb-a27f-0e8c00c8535e/archive.tar To: /var/www/portals/gateway-user-data/seagrid/nastia/polya_15_nt_ful_AMBER/Clone_of_polya_15_nt_unprot_am_prod_021500987115/archive.tar, File Size : 6057000960, Ignore Empty file : true ... 2017-07-26 09:21:05,861 [pool-11-thread-145] INFO o.a.airavata.gfac.impl.Factory - SSH Session validation succeeded, key :ccguser_stampede.tacc.xsede.org_22_3d65bf6d-2c9f-4166-a51b-e76e0022bd3b 2017-07-26 09:21:06,843 [pool-11-thread-145] INFO o.a.airavata.gfac.impl.Factory - Channel creation test succeeded, key :ccguser_stampede.tacc.xsede.org_22_3d65bf6d-2c9f-4166-a51b-e76e0022bd3b 2017-07-26 09:21:06,843 [pool-11-thread-145] INFO o.a.airavata.gfac.impl.Factory - Reuse SSH session for :ccguser_stampede.tacc.xsede.org_22_3d65bf6d-2c9f-4166-a51b-e76e0022bd3b 2017-07-26 09:21:06,879 [pool-11-thread-145] INFO o.a.a.g.impl.HPCRemoteCluster - Executing command rm /scratch/00421/ccguser/seagrid-workdirs/PROCESS_ba393353-611c-45fb-a27f-0e8c00c8535e/archive.tar 2017-07-26 09:21:07,818 [pool-11-thread-145] INFO o.a.a.g.impl.task.ArchiveTask - Executing command cd /var/www/portals/gateway-user-data/seagrid/nastia/polya_15_nt_ful_AMBER/Clone_of_polya_15_nt_unprot_am_prod_021500987115 && mkdir ARCHIVE && tar -xvf archive.tar -C ARCHIVE && rm archive.tar && chmod 755 -R ARCHIVE/* 2017-07-26 09:23:33,645 [pool-11-thread-145] INFO o.a.a.g.c.context.TaskContext - expId: Clone_of_polya_15_nt_unprot_am_prod_02_f762fa5f-b039-4352-812b-11cde23a7732, processId: PROCESS_ba393353-611c-45fb-a27f-0e8c00c8535e, taskId: TASK_ca02a6a5-4460-4bc4-ba73-4b0000a1be7e, type: DATA_STAGING : Task status changed EXECUTING -> COMPLETED 2017-07-26 09:23:33,656 [pool-11-thread-145] INFO o.a.a.g.c.c.ProcessContext - expId: Clone_of_polya_15_nt_unprot_am_prod_02_f762fa5f-b039-4352-812b-11cde23a7732, processId: PROCESS_ba393353-611c-45fb-a27f-0e8c00c8535e :- Process status changed OUTPUT_DATA_STAGING -> COMPLETED 2017-07-26 09:23:33,667 [pool-11-thread-145] INFO o.a.a.gfac.impl.GFacWorker - expId: Clone_of_polya_15_nt_unprot_am_prod_02_f762fa5f-b039-4352-812b-11cde23a7732, processId: PROCESS_ba393353-611c-45fb-a27f-0e8c00c8535e :- Sent ack for deliveryTag 168 {noformat} File archive.tar with size 6057000960 transfered from stampede to gf4.ucs.indiana.edu in just under 7 minutes ({{2017-07-26 09:14:07,064}} - {{2017-07-26 09:21:05,861}}). > File transfer slow from alamo.uthscsa.edu to uslims3.uthscsa.edu > ---------------------------------------------------------------- > > Key: AIRAVATA-2492 > URL: https://issues.apache.org/jira/browse/AIRAVATA-2492 > Project: Airavata > Issue Type: Bug > Reporter: Marcus Christie > Assignee: Marcus Christie > > Experiment ids in ultrascan.scigap.org: > US3-AIRA_f200b78e-5d86-499c-b2e6-c3b6fbb3bfa3 > US3-AIRA_b69482ba-b34f-4809-9274-bcda71b8dccc > Hipchat discussion: > {quote} > [11:27 PM] Gary Gorbet: On Alamo, 20 minutes are more elapse from end of job > until Airavata reports a job status of COMPLETE. Can't *something* be done to > speed that up! > ---- Wednesday July 26, 2017 ---- > [9:31 AM] Eroma Abeysinghe: @gary sorry for this. Currently all experiments > are completed, and two running in alamo. but we found two which took rather > unusually long time to transfers the files. we are investigating. will keep > you posted. > [9:51 AM] Marcus Christie: @gary here's what we've been able to confirm: gfac > is getting the completed email from the scheduler and processing it > immediately. Airavata doesn't mark an experiment as being completed until the > output data staging completes. The output data staging for the jobs you gave > us to look at took a long time to complete. > [9:52 AM] Marlon Pierce: You can test network speed in various ways (like > https://askubuntu.com/questions/7976/how-do-you-test-the-network-speed-betwen-two-boxes, > which I googled). > [9:53 AM] Marlon Pierce: It won't fix the problem, but it may help network > admins diagnose the problem > [9:53 AM] Marlon Pierce: I'm assuming (maybe incorrectly) that this is an > issue between Alamo and our servers > [9:54 AM] Marcus Christie: This is partly because transfer times from alamo > to lims are happening very slowly, by my rough calculations at about 200kb/s. > But also the analysis-results.tar file is much larger in these experiments. > One example we looked at it was about 200 MB and another was about 1.13 GB. > Looking through the logs back to April this is historically very large for an > analysis-results.tar file. > [9:54 AM] Marlon Pierce: But we can confirm this by looking at other servers > [9:54 AM] Marcus Christie: @marlon this is between alamo and lims. Do we have > a login on lims? > [9:56 AM] Marlon Pierce: Oh...where is the LIMS server? At UTHSCSA? > [9:57 AM] Marlon Pierce: I think it is. > [9:58 AM] Marcus Christie: @marlon uslims3.uthscsa.edu > [9:59 AM] Gary Gorbet: That other job completed. But now there is new, > critical, job that FINISHED 40 minutes ago. Airavata job status is EXECUTING. > [9:59 AM] Marcus Christie: It would be nice if we could run a command line > scp transfer test from alamo to uslims3 just to see if it is fundamentally > limited or this is a slowness in GFac. > [10:00 AM] Gary Gorbet: That is easy to do. Will do a test and report. > [10:00 AM] Marcus Christie: Thanks @gary > [10:01 AM] Gary Gorbet: Transferred a 1.6M file from uslims3 to alamo in > under a second. > [10:02 AM] Marcus Christie: Actually I realized that GFac will do the > transfer from alamo to uslims3 via GFac. It's not a true third party transfer > but rather the data is streamed through the GFac server. So @marlon was > right we need to test between alamo -> Gfac and GFac -> uslims3. > [10:03 AM] Gary Gorbet: This would be alamo/uslims3 to gw153; right? > [10:03 AM] Marcus Christie: Thanks @gary. So that's our theoretical upper > bound. > [10:04 AM] Marcus Christie: @gary that's right > [10:08 AM] Sudhakar Pamidighantam: I think If it is reasonably secure GFac > should do a true third party transfer to avoid this delay. > [10:08 AM] Gary Gorbet: The LIMS work directory for the job has all the > output files. So, it seems to me that gw153 to uslims3 transfers completed. > So, why the EXECUTING gfac status? > {quote} -- This message was sent by Atlassian JIRA (v6.4.14#64029)