[ 
https://issues.apache.org/jira/browse/AIRAVATA-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16101759#comment-16101759
 ] 

Marcus Christie edited comment on AIRAVATA-2492 at 7/26/17 2:54 PM:
--------------------------------------------------------------------

Example of a fast file transfer in seagrid:  6BG archive.tar file transferred 
in about 7 minutes.

Experiment ID: 
Clone_of_polya_15_nt_unprot_am_prod_02_f762fa5f-b039-4352-812b-11cde23a7732

gfac logs on scigap03:
{noformat}
2017-07-26 09:13:29,653 [pool-11-thread-145] INFO  
o.a.airavata.gfac.impl.Factory  - Channel creation test succeeded, key 
:ccguser_stamp
ede.tacc.xsede.org_22_3d65bf6d-2c9f-4166-a51b-e76e0022bd3b
2017-07-26 09:13:29,653 [pool-11-thread-145] INFO  
o.a.airavata.gfac.impl.Factory  - Reuse SSH session for 
:ccguser_stampede.tacc.xsede.
org_22_3d65bf6d-2c9f-4166-a51b-e76e0022bd3b
2017-07-26 09:13:29,656 [pool-11-thread-145] INFO  
o.a.airavata.gfac.impl.Factory  - SSH Session validation succeeded, key 
:pga_gf4.ucs.
indiana.edu_22_3d65bf6d-2c9f-4166-a51b-e76e0022bd3b
2017-07-26 09:13:29,676 [pool-11-thread-145] INFO  
o.a.airavata.gfac.impl.Factory  - Channel creation test succeeded, key 
:pga_gf4.ucs.i
ndiana.edu_22_3d65bf6d-2c9f-4166-a51b-e76e0022bd3b
2017-07-26 09:13:29,676 [pool-11-thread-145] INFO  
o.a.airavata.gfac.impl.Factory  - Reuse SSH session for 
:pga_gf4.ucs.indiana.edu_22_3
d65bf6d-2c9f-4166-a51b-e76e0022bd3b
2017-07-26 09:13:29,676 [pool-11-thread-145] INFO  
o.a.airavata.gfac.impl.Factory  - SSH Session validation succeeded, key 
:ccguser_stam
pede.tacc.xsede.org_22_3d65bf6d-2c9f-4166-a51b-e76e0022bd3b
2017-07-26 09:13:30,086 [pool-11-thread-145] INFO  
o.a.airavata.gfac.impl.Factory  - Channel creation test succeeded, key 
:ccguser_stamp
ede.tacc.xsede.org_22_3d65bf6d-2c9f-4166-a51b-e76e0022bd3b
2017-07-26 09:13:30,086 [pool-11-thread-145] INFO  
o.a.airavata.gfac.impl.Factory  - Reuse SSH session for 
:ccguser_stampede.tacc.xsede.
org_22_3d65bf6d-2c9f-4166-a51b-e76e0022bd3b
2017-07-26 09:13:30,122 [pool-11-thread-145] INFO  
o.a.a.g.impl.HPCRemoteCluster  - Executing command cd 
/scratch/00421/ccguser/seagrid-
workdirs/PROCESS_ba393353-611c-45fb-a27f-0e8c00c8535e && tar -cvf 
/scratch/00421/ccguser/seagrid-workdirs/PROCESS_ba393353-611c-45fb-a27
f-0e8c00c8535e/archive.tar ./* 
2017-07-26 09:13:30,680 [Thread-5] INFO  o.a.a.g.m.e.EmailBasedMonitor  - 
[EJM]: 161 new email/s received
2017-07-26 09:14:06,669 [pool-11-thread-145] INFO  
o.a.a.g.impl.HPCRemoteCluster  - Transferring 
from:/scratch/00421/ccguser/seagrid-wor
kdirs/PROCESS_ba393353-611c-45fb-a27f-0e8c00c8535e/archive.tar To: 
/var/www/portals/gateway-user-data/seagrid/nastia/polya_15_nt_ful_AMB
ER/Clone_of_polya_15_nt_unprot_am_prod_021500987115/archive.tar
2017-07-26 09:14:06,741 [pool-11-thread-145] INFO  o.a.a.gfac.impl.SSHUtils  - 
Initiating transfer from:/scratch/00421/ccguser/seagrid-w
orkdirs/PROCESS_ba393353-611c-45fb-a27f-0e8c00c8535e/archive.tar To: 
/var/www/portals/gateway-user-data/seagrid/nastia/polya_15_nt_ful_A
MBER/Clone_of_polya_15_nt_unprot_am_prod_021500987115/archive.tar, Ignore Empty 
file : true
2017-07-26 09:14:07,064 [pool-11-thread-145] INFO  o.a.a.gfac.impl.SSHUtils  - 
Transferring 
from:/scratch/00421/ccguser/seagrid-workdirs/PROCESS_ba393353-611c-45fb-a27f-0e8c00c8535e/archive.tar
 To: 
/var/www/portals/gateway-user-data/seagrid/nastia/polya_15_nt_ful_AMBER/Clone_of_polya_15_nt_unprot_am_prod_021500987115/archive.tar,
 File Size : 6057000960, Ignore Empty file : true
...
2017-07-26 09:21:05,861 [pool-11-thread-145] INFO  
o.a.airavata.gfac.impl.Factory  - SSH Session validation succeeded, key 
:ccguser_stampede.tacc.xsede.org_22_3d65bf6d-2c9f-4166-a51b-e76e0022bd3b
2017-07-26 09:21:06,843 [pool-11-thread-145] INFO  
o.a.airavata.gfac.impl.Factory  - Channel creation test succeeded, key 
:ccguser_stampede.tacc.xsede.org_22_3d65bf6d-2c9f-4166-a51b-e76e0022bd3b
2017-07-26 09:21:06,843 [pool-11-thread-145] INFO  
o.a.airavata.gfac.impl.Factory  - Reuse SSH session for 
:ccguser_stampede.tacc.xsede.org_22_3d65bf6d-2c9f-4166-a51b-e76e0022bd3b
2017-07-26 09:21:06,879 [pool-11-thread-145] INFO  
o.a.a.g.impl.HPCRemoteCluster  - Executing command rm 
/scratch/00421/ccguser/seagrid-workdirs/PROCESS_ba393353-611c-45fb-a27f-0e8c00c8535e/archive.tar
2017-07-26 09:21:07,818 [pool-11-thread-145] INFO  
o.a.a.g.impl.task.ArchiveTask  - Executing command cd 
/var/www/portals/gateway-user-data/seagrid/nastia/polya_15_nt_ful_AMBER/Clone_of_polya_15_nt_unprot_am_prod_021500987115
 && mkdir ARCHIVE && tar -xvf archive.tar -C ARCHIVE && rm archive.tar && chmod 
755 -R ARCHIVE/*
2017-07-26 09:23:33,645 [pool-11-thread-145] INFO  
o.a.a.g.c.context.TaskContext  - expId: 
Clone_of_polya_15_nt_unprot_am_prod_02_f762fa5f-b039-4352-812b-11cde23a7732, 
processId: PROCESS_ba393353-611c-45fb-a27f-0e8c00c8535e, taskId: 
TASK_ca02a6a5-4460-4bc4-ba73-4b0000a1be7e, type: DATA_STAGING : Task status 
changed EXECUTING -> COMPLETED
2017-07-26 09:23:33,656 [pool-11-thread-145] INFO  o.a.a.g.c.c.ProcessContext  
- expId: 
Clone_of_polya_15_nt_unprot_am_prod_02_f762fa5f-b039-4352-812b-11cde23a7732, 
processId: PROCESS_ba393353-611c-45fb-a27f-0e8c00c8535e :- Process status 
changed OUTPUT_DATA_STAGING -> COMPLETED
2017-07-26 09:23:33,667 [pool-11-thread-145] INFO  o.a.a.gfac.impl.GFacWorker  
- expId: 
Clone_of_polya_15_nt_unprot_am_prod_02_f762fa5f-b039-4352-812b-11cde23a7732, 
processId: PROCESS_ba393353-611c-45fb-a27f-0e8c00c8535e :- Sent ack for 
deliveryTag 168
{noformat}


File archive.tar with size 6057000960 transfered from stampede to 
gf4.ucs.indiana.edu in just under 7 minutes ({{2017-07-26 09:14:07,064}} - 
{{2017-07-26 09:21:05,861}}).

This works out to about 14 MB/sec.  So GFac can transfer at least this fast.  
That puts the transfer from alamo to uslims3 at about 2 orders of magnitude 
slower.

This points to there being a network issue between alamo->gw153->uslims3 as the 
underlying problem.



was (Author: marcuschristie):
Example of a fast file transfer in seagrid:  6BG archive.tar file transferred 
in about 7 minutes.

Experiment ID: 
Clone_of_polya_15_nt_unprot_am_prod_02_f762fa5f-b039-4352-812b-11cde23a7732

gfac logs on scigap03:
{noformat}
2017-07-26 09:13:29,653 [pool-11-thread-145] INFO  
o.a.airavata.gfac.impl.Factory  - Channel creation test succeeded, key 
:ccguser_stamp
ede.tacc.xsede.org_22_3d65bf6d-2c9f-4166-a51b-e76e0022bd3b
2017-07-26 09:13:29,653 [pool-11-thread-145] INFO  
o.a.airavata.gfac.impl.Factory  - Reuse SSH session for 
:ccguser_stampede.tacc.xsede.
org_22_3d65bf6d-2c9f-4166-a51b-e76e0022bd3b
2017-07-26 09:13:29,656 [pool-11-thread-145] INFO  
o.a.airavata.gfac.impl.Factory  - SSH Session validation succeeded, key 
:pga_gf4.ucs.
indiana.edu_22_3d65bf6d-2c9f-4166-a51b-e76e0022bd3b
2017-07-26 09:13:29,676 [pool-11-thread-145] INFO  
o.a.airavata.gfac.impl.Factory  - Channel creation test succeeded, key 
:pga_gf4.ucs.i
ndiana.edu_22_3d65bf6d-2c9f-4166-a51b-e76e0022bd3b
2017-07-26 09:13:29,676 [pool-11-thread-145] INFO  
o.a.airavata.gfac.impl.Factory  - Reuse SSH session for 
:pga_gf4.ucs.indiana.edu_22_3
d65bf6d-2c9f-4166-a51b-e76e0022bd3b
2017-07-26 09:13:29,676 [pool-11-thread-145] INFO  
o.a.airavata.gfac.impl.Factory  - SSH Session validation succeeded, key 
:ccguser_stam
pede.tacc.xsede.org_22_3d65bf6d-2c9f-4166-a51b-e76e0022bd3b
2017-07-26 09:13:30,086 [pool-11-thread-145] INFO  
o.a.airavata.gfac.impl.Factory  - Channel creation test succeeded, key 
:ccguser_stamp
ede.tacc.xsede.org_22_3d65bf6d-2c9f-4166-a51b-e76e0022bd3b
2017-07-26 09:13:30,086 [pool-11-thread-145] INFO  
o.a.airavata.gfac.impl.Factory  - Reuse SSH session for 
:ccguser_stampede.tacc.xsede.
org_22_3d65bf6d-2c9f-4166-a51b-e76e0022bd3b
2017-07-26 09:13:30,122 [pool-11-thread-145] INFO  
o.a.a.g.impl.HPCRemoteCluster  - Executing command cd 
/scratch/00421/ccguser/seagrid-
workdirs/PROCESS_ba393353-611c-45fb-a27f-0e8c00c8535e && tar -cvf 
/scratch/00421/ccguser/seagrid-workdirs/PROCESS_ba393353-611c-45fb-a27
f-0e8c00c8535e/archive.tar ./* 
2017-07-26 09:13:30,680 [Thread-5] INFO  o.a.a.g.m.e.EmailBasedMonitor  - 
[EJM]: 161 new email/s received
2017-07-26 09:14:06,669 [pool-11-thread-145] INFO  
o.a.a.g.impl.HPCRemoteCluster  - Transferring 
from:/scratch/00421/ccguser/seagrid-wor
kdirs/PROCESS_ba393353-611c-45fb-a27f-0e8c00c8535e/archive.tar To: 
/var/www/portals/gateway-user-data/seagrid/nastia/polya_15_nt_ful_AMB
ER/Clone_of_polya_15_nt_unprot_am_prod_021500987115/archive.tar
2017-07-26 09:14:06,741 [pool-11-thread-145] INFO  o.a.a.gfac.impl.SSHUtils  - 
Initiating transfer from:/scratch/00421/ccguser/seagrid-w
orkdirs/PROCESS_ba393353-611c-45fb-a27f-0e8c00c8535e/archive.tar To: 
/var/www/portals/gateway-user-data/seagrid/nastia/polya_15_nt_ful_A
MBER/Clone_of_polya_15_nt_unprot_am_prod_021500987115/archive.tar, Ignore Empty 
file : true
2017-07-26 09:14:07,064 [pool-11-thread-145] INFO  o.a.a.gfac.impl.SSHUtils  - 
Transferring 
from:/scratch/00421/ccguser/seagrid-workdirs/PROCESS_ba393353-611c-45fb-a27f-0e8c00c8535e/archive.tar
 To: 
/var/www/portals/gateway-user-data/seagrid/nastia/polya_15_nt_ful_AMBER/Clone_of_polya_15_nt_unprot_am_prod_021500987115/archive.tar,
 File Size : 6057000960, Ignore Empty file : true
...
2017-07-26 09:21:05,861 [pool-11-thread-145] INFO  
o.a.airavata.gfac.impl.Factory  - SSH Session validation succeeded, key 
:ccguser_stampede.tacc.xsede.org_22_3d65bf6d-2c9f-4166-a51b-e76e0022bd3b
2017-07-26 09:21:06,843 [pool-11-thread-145] INFO  
o.a.airavata.gfac.impl.Factory  - Channel creation test succeeded, key 
:ccguser_stampede.tacc.xsede.org_22_3d65bf6d-2c9f-4166-a51b-e76e0022bd3b
2017-07-26 09:21:06,843 [pool-11-thread-145] INFO  
o.a.airavata.gfac.impl.Factory  - Reuse SSH session for 
:ccguser_stampede.tacc.xsede.org_22_3d65bf6d-2c9f-4166-a51b-e76e0022bd3b
2017-07-26 09:21:06,879 [pool-11-thread-145] INFO  
o.a.a.g.impl.HPCRemoteCluster  - Executing command rm 
/scratch/00421/ccguser/seagrid-workdirs/PROCESS_ba393353-611c-45fb-a27f-0e8c00c8535e/archive.tar
2017-07-26 09:21:07,818 [pool-11-thread-145] INFO  
o.a.a.g.impl.task.ArchiveTask  - Executing command cd 
/var/www/portals/gateway-user-data/seagrid/nastia/polya_15_nt_ful_AMBER/Clone_of_polya_15_nt_unprot_am_prod_021500987115
 && mkdir ARCHIVE && tar -xvf archive.tar -C ARCHIVE && rm archive.tar && chmod 
755 -R ARCHIVE/*
2017-07-26 09:23:33,645 [pool-11-thread-145] INFO  
o.a.a.g.c.context.TaskContext  - expId: 
Clone_of_polya_15_nt_unprot_am_prod_02_f762fa5f-b039-4352-812b-11cde23a7732, 
processId: PROCESS_ba393353-611c-45fb-a27f-0e8c00c8535e, taskId: 
TASK_ca02a6a5-4460-4bc4-ba73-4b0000a1be7e, type: DATA_STAGING : Task status 
changed EXECUTING -> COMPLETED
2017-07-26 09:23:33,656 [pool-11-thread-145] INFO  o.a.a.g.c.c.ProcessContext  
- expId: 
Clone_of_polya_15_nt_unprot_am_prod_02_f762fa5f-b039-4352-812b-11cde23a7732, 
processId: PROCESS_ba393353-611c-45fb-a27f-0e8c00c8535e :- Process status 
changed OUTPUT_DATA_STAGING -> COMPLETED
2017-07-26 09:23:33,667 [pool-11-thread-145] INFO  o.a.a.gfac.impl.GFacWorker  
- expId: 
Clone_of_polya_15_nt_unprot_am_prod_02_f762fa5f-b039-4352-812b-11cde23a7732, 
processId: PROCESS_ba393353-611c-45fb-a27f-0e8c00c8535e :- Sent ack for 
deliveryTag 168
{noformat}


File archive.tar with size 6057000960 transfered from stampede to 
gf4.ucs.indiana.edu in just under 7 minutes ({{2017-07-26 09:14:07,064}} - 
{{2017-07-26 09:21:05,861}}).

> File transfer slow from alamo.uthscsa.edu to uslims3.uthscsa.edu
> ----------------------------------------------------------------
>
>                 Key: AIRAVATA-2492
>                 URL: https://issues.apache.org/jira/browse/AIRAVATA-2492
>             Project: Airavata
>          Issue Type: Bug
>            Reporter: Marcus Christie
>            Assignee: Marcus Christie
>
> Experiment ids in ultrascan.scigap.org:
> US3-AIRA_f200b78e-5d86-499c-b2e6-c3b6fbb3bfa3
> US3-AIRA_b69482ba-b34f-4809-9274-bcda71b8dccc
> Hipchat discussion:
> {quote}
> [11:27 PM] Gary Gorbet: On Alamo, 20 minutes are more elapse from end of job 
> until Airavata reports a job status of COMPLETE. Can't *something* be done to 
> speed that up!
> ---- Wednesday July 26, 2017 ----
> [9:31 AM] Eroma Abeysinghe: @gary sorry for this. Currently all experiments 
> are completed, and two running in alamo. but we found two which took rather 
> unusually long time to transfers the files. we are investigating. will keep 
> you posted.
> [9:51 AM] Marcus Christie: @gary here's what we've been able to confirm: gfac 
> is getting the completed email from the scheduler and processing it 
> immediately. Airavata doesn't mark an experiment as being completed until the 
> output data staging completes. The output data staging for the jobs you gave 
> us to look at took a long time to complete.
> [9:52 AM] Marlon Pierce: You can test network speed in various ways (like 
> https://askubuntu.com/questions/7976/how-do-you-test-the-network-speed-betwen-two-boxes,
>  which I googled).
> [9:53 AM] Marlon Pierce: It won't fix the problem, but it may help network 
> admins diagnose the problem
> [9:53 AM] Marlon Pierce: I'm assuming (maybe incorrectly) that this is an 
> issue between Alamo and our servers
> [9:54 AM] Marcus Christie: This is partly because transfer times from alamo 
> to lims are happening very slowly, by my rough calculations at about 200kb/s. 
>  But also the analysis-results.tar file is much larger in these experiments. 
> One example we looked at it was about 200 MB and another was about 1.13 GB.  
> Looking through the logs back to April this is historically very large for an 
> analysis-results.tar file.
> [9:54 AM] Marlon Pierce: But we can confirm this by looking at other servers
> [9:54 AM] Marcus Christie: @marlon this is between alamo and lims. Do we have 
> a login on lims?
> [9:56 AM] Marlon Pierce: Oh...where is the LIMS server? At UTHSCSA?
> [9:57 AM] Marlon Pierce: I think it is.
> [9:58 AM] Marcus Christie: @marlon uslims3.uthscsa.edu
> [9:59 AM] Gary Gorbet: That other job completed. But now there is new, 
> critical, job that FINISHED 40 minutes ago. Airavata job status is EXECUTING.
> [9:59 AM] Marcus Christie: It would be nice if we could run a command line 
> scp transfer test from alamo to uslims3 just to see if it is fundamentally 
> limited or this is a slowness in GFac.
> [10:00 AM] Gary Gorbet: That is easy to do. Will do a test and report.
> [10:00 AM] Marcus Christie: Thanks @gary
> [10:01 AM] Gary Gorbet: Transferred a 1.6M file from uslims3 to alamo in 
> under a second.
> [10:02 AM] Marcus Christie: Actually I realized that GFac will do the 
> transfer from alamo to uslims3 via GFac. It's not a true third party transfer 
> but rather the data is streamed through the GFac server.  So @marlon was 
> right we need to test between alamo -> Gfac and GFac -> uslims3.
> [10:03 AM] Gary Gorbet: This would be alamo/uslims3 to gw153; right?
> [10:03 AM] Marcus Christie: Thanks @gary. So that's our theoretical upper 
> bound.
> [10:04 AM] Marcus Christie: @gary that's right
> [10:08 AM] Sudhakar Pamidighantam: I think If it is reasonably secure GFac 
> should do a true third party transfer to avoid this delay.
> [10:08 AM] Gary Gorbet: The LIMS work directory for the job has all the 
> output files. So, it seems to me that gw153 to uslims3 transfers completed. 
> So, why the EXECUTING gfac status?
> {quote}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to