[
https://issues.apache.org/jira/browse/AIRAVATA-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16101759#comment-16101759
]
Marcus Christie edited comment on AIRAVATA-2492 at 7/26/17 2:54 PM:
--------------------------------------------------------------------
Example of a fast file transfer in seagrid: 6BG archive.tar file transferred
in about 7 minutes.
Experiment ID:
Clone_of_polya_15_nt_unprot_am_prod_02_f762fa5f-b039-4352-812b-11cde23a7732
gfac logs on scigap03:
{noformat}
2017-07-26 09:13:29,653 [pool-11-thread-145] INFO
o.a.airavata.gfac.impl.Factory - Channel creation test succeeded, key
:ccguser_stamp
ede.tacc.xsede.org_22_3d65bf6d-2c9f-4166-a51b-e76e0022bd3b
2017-07-26 09:13:29,653 [pool-11-thread-145] INFO
o.a.airavata.gfac.impl.Factory - Reuse SSH session for
:ccguser_stampede.tacc.xsede.
org_22_3d65bf6d-2c9f-4166-a51b-e76e0022bd3b
2017-07-26 09:13:29,656 [pool-11-thread-145] INFO
o.a.airavata.gfac.impl.Factory - SSH Session validation succeeded, key
:pga_gf4.ucs.
indiana.edu_22_3d65bf6d-2c9f-4166-a51b-e76e0022bd3b
2017-07-26 09:13:29,676 [pool-11-thread-145] INFO
o.a.airavata.gfac.impl.Factory - Channel creation test succeeded, key
:pga_gf4.ucs.i
ndiana.edu_22_3d65bf6d-2c9f-4166-a51b-e76e0022bd3b
2017-07-26 09:13:29,676 [pool-11-thread-145] INFO
o.a.airavata.gfac.impl.Factory - Reuse SSH session for
:pga_gf4.ucs.indiana.edu_22_3
d65bf6d-2c9f-4166-a51b-e76e0022bd3b
2017-07-26 09:13:29,676 [pool-11-thread-145] INFO
o.a.airavata.gfac.impl.Factory - SSH Session validation succeeded, key
:ccguser_stam
pede.tacc.xsede.org_22_3d65bf6d-2c9f-4166-a51b-e76e0022bd3b
2017-07-26 09:13:30,086 [pool-11-thread-145] INFO
o.a.airavata.gfac.impl.Factory - Channel creation test succeeded, key
:ccguser_stamp
ede.tacc.xsede.org_22_3d65bf6d-2c9f-4166-a51b-e76e0022bd3b
2017-07-26 09:13:30,086 [pool-11-thread-145] INFO
o.a.airavata.gfac.impl.Factory - Reuse SSH session for
:ccguser_stampede.tacc.xsede.
org_22_3d65bf6d-2c9f-4166-a51b-e76e0022bd3b
2017-07-26 09:13:30,122 [pool-11-thread-145] INFO
o.a.a.g.impl.HPCRemoteCluster - Executing command cd
/scratch/00421/ccguser/seagrid-
workdirs/PROCESS_ba393353-611c-45fb-a27f-0e8c00c8535e && tar -cvf
/scratch/00421/ccguser/seagrid-workdirs/PROCESS_ba393353-611c-45fb-a27
f-0e8c00c8535e/archive.tar ./*
2017-07-26 09:13:30,680 [Thread-5] INFO o.a.a.g.m.e.EmailBasedMonitor -
[EJM]: 161 new email/s received
2017-07-26 09:14:06,669 [pool-11-thread-145] INFO
o.a.a.g.impl.HPCRemoteCluster - Transferring
from:/scratch/00421/ccguser/seagrid-wor
kdirs/PROCESS_ba393353-611c-45fb-a27f-0e8c00c8535e/archive.tar To:
/var/www/portals/gateway-user-data/seagrid/nastia/polya_15_nt_ful_AMB
ER/Clone_of_polya_15_nt_unprot_am_prod_021500987115/archive.tar
2017-07-26 09:14:06,741 [pool-11-thread-145] INFO o.a.a.gfac.impl.SSHUtils -
Initiating transfer from:/scratch/00421/ccguser/seagrid-w
orkdirs/PROCESS_ba393353-611c-45fb-a27f-0e8c00c8535e/archive.tar To:
/var/www/portals/gateway-user-data/seagrid/nastia/polya_15_nt_ful_A
MBER/Clone_of_polya_15_nt_unprot_am_prod_021500987115/archive.tar, Ignore Empty
file : true
2017-07-26 09:14:07,064 [pool-11-thread-145] INFO o.a.a.gfac.impl.SSHUtils -
Transferring
from:/scratch/00421/ccguser/seagrid-workdirs/PROCESS_ba393353-611c-45fb-a27f-0e8c00c8535e/archive.tar
To:
/var/www/portals/gateway-user-data/seagrid/nastia/polya_15_nt_ful_AMBER/Clone_of_polya_15_nt_unprot_am_prod_021500987115/archive.tar,
File Size : 6057000960, Ignore Empty file : true
...
2017-07-26 09:21:05,861 [pool-11-thread-145] INFO
o.a.airavata.gfac.impl.Factory - SSH Session validation succeeded, key
:ccguser_stampede.tacc.xsede.org_22_3d65bf6d-2c9f-4166-a51b-e76e0022bd3b
2017-07-26 09:21:06,843 [pool-11-thread-145] INFO
o.a.airavata.gfac.impl.Factory - Channel creation test succeeded, key
:ccguser_stampede.tacc.xsede.org_22_3d65bf6d-2c9f-4166-a51b-e76e0022bd3b
2017-07-26 09:21:06,843 [pool-11-thread-145] INFO
o.a.airavata.gfac.impl.Factory - Reuse SSH session for
:ccguser_stampede.tacc.xsede.org_22_3d65bf6d-2c9f-4166-a51b-e76e0022bd3b
2017-07-26 09:21:06,879 [pool-11-thread-145] INFO
o.a.a.g.impl.HPCRemoteCluster - Executing command rm
/scratch/00421/ccguser/seagrid-workdirs/PROCESS_ba393353-611c-45fb-a27f-0e8c00c8535e/archive.tar
2017-07-26 09:21:07,818 [pool-11-thread-145] INFO
o.a.a.g.impl.task.ArchiveTask - Executing command cd
/var/www/portals/gateway-user-data/seagrid/nastia/polya_15_nt_ful_AMBER/Clone_of_polya_15_nt_unprot_am_prod_021500987115
&& mkdir ARCHIVE && tar -xvf archive.tar -C ARCHIVE && rm archive.tar && chmod
755 -R ARCHIVE/*
2017-07-26 09:23:33,645 [pool-11-thread-145] INFO
o.a.a.g.c.context.TaskContext - expId:
Clone_of_polya_15_nt_unprot_am_prod_02_f762fa5f-b039-4352-812b-11cde23a7732,
processId: PROCESS_ba393353-611c-45fb-a27f-0e8c00c8535e, taskId:
TASK_ca02a6a5-4460-4bc4-ba73-4b0000a1be7e, type: DATA_STAGING : Task status
changed EXECUTING -> COMPLETED
2017-07-26 09:23:33,656 [pool-11-thread-145] INFO o.a.a.g.c.c.ProcessContext
- expId:
Clone_of_polya_15_nt_unprot_am_prod_02_f762fa5f-b039-4352-812b-11cde23a7732,
processId: PROCESS_ba393353-611c-45fb-a27f-0e8c00c8535e :- Process status
changed OUTPUT_DATA_STAGING -> COMPLETED
2017-07-26 09:23:33,667 [pool-11-thread-145] INFO o.a.a.gfac.impl.GFacWorker
- expId:
Clone_of_polya_15_nt_unprot_am_prod_02_f762fa5f-b039-4352-812b-11cde23a7732,
processId: PROCESS_ba393353-611c-45fb-a27f-0e8c00c8535e :- Sent ack for
deliveryTag 168
{noformat}
File archive.tar with size 6057000960 transfered from stampede to
gf4.ucs.indiana.edu in just under 7 minutes ({{2017-07-26 09:14:07,064}} -
{{2017-07-26 09:21:05,861}}).
This works out to about 14 MB/sec. So GFac can transfer at least this fast.
That puts the transfer from alamo to uslims3 at about 2 orders of magnitude
slower.
This points to there being a network issue between alamo->gw153->uslims3 as the
underlying problem.
was (Author: marcuschristie):
Example of a fast file transfer in seagrid: 6BG archive.tar file transferred
in about 7 minutes.
Experiment ID:
Clone_of_polya_15_nt_unprot_am_prod_02_f762fa5f-b039-4352-812b-11cde23a7732
gfac logs on scigap03:
{noformat}
2017-07-26 09:13:29,653 [pool-11-thread-145] INFO
o.a.airavata.gfac.impl.Factory - Channel creation test succeeded, key
:ccguser_stamp
ede.tacc.xsede.org_22_3d65bf6d-2c9f-4166-a51b-e76e0022bd3b
2017-07-26 09:13:29,653 [pool-11-thread-145] INFO
o.a.airavata.gfac.impl.Factory - Reuse SSH session for
:ccguser_stampede.tacc.xsede.
org_22_3d65bf6d-2c9f-4166-a51b-e76e0022bd3b
2017-07-26 09:13:29,656 [pool-11-thread-145] INFO
o.a.airavata.gfac.impl.Factory - SSH Session validation succeeded, key
:pga_gf4.ucs.
indiana.edu_22_3d65bf6d-2c9f-4166-a51b-e76e0022bd3b
2017-07-26 09:13:29,676 [pool-11-thread-145] INFO
o.a.airavata.gfac.impl.Factory - Channel creation test succeeded, key
:pga_gf4.ucs.i
ndiana.edu_22_3d65bf6d-2c9f-4166-a51b-e76e0022bd3b
2017-07-26 09:13:29,676 [pool-11-thread-145] INFO
o.a.airavata.gfac.impl.Factory - Reuse SSH session for
:pga_gf4.ucs.indiana.edu_22_3
d65bf6d-2c9f-4166-a51b-e76e0022bd3b
2017-07-26 09:13:29,676 [pool-11-thread-145] INFO
o.a.airavata.gfac.impl.Factory - SSH Session validation succeeded, key
:ccguser_stam
pede.tacc.xsede.org_22_3d65bf6d-2c9f-4166-a51b-e76e0022bd3b
2017-07-26 09:13:30,086 [pool-11-thread-145] INFO
o.a.airavata.gfac.impl.Factory - Channel creation test succeeded, key
:ccguser_stamp
ede.tacc.xsede.org_22_3d65bf6d-2c9f-4166-a51b-e76e0022bd3b
2017-07-26 09:13:30,086 [pool-11-thread-145] INFO
o.a.airavata.gfac.impl.Factory - Reuse SSH session for
:ccguser_stampede.tacc.xsede.
org_22_3d65bf6d-2c9f-4166-a51b-e76e0022bd3b
2017-07-26 09:13:30,122 [pool-11-thread-145] INFO
o.a.a.g.impl.HPCRemoteCluster - Executing command cd
/scratch/00421/ccguser/seagrid-
workdirs/PROCESS_ba393353-611c-45fb-a27f-0e8c00c8535e && tar -cvf
/scratch/00421/ccguser/seagrid-workdirs/PROCESS_ba393353-611c-45fb-a27
f-0e8c00c8535e/archive.tar ./*
2017-07-26 09:13:30,680 [Thread-5] INFO o.a.a.g.m.e.EmailBasedMonitor -
[EJM]: 161 new email/s received
2017-07-26 09:14:06,669 [pool-11-thread-145] INFO
o.a.a.g.impl.HPCRemoteCluster - Transferring
from:/scratch/00421/ccguser/seagrid-wor
kdirs/PROCESS_ba393353-611c-45fb-a27f-0e8c00c8535e/archive.tar To:
/var/www/portals/gateway-user-data/seagrid/nastia/polya_15_nt_ful_AMB
ER/Clone_of_polya_15_nt_unprot_am_prod_021500987115/archive.tar
2017-07-26 09:14:06,741 [pool-11-thread-145] INFO o.a.a.gfac.impl.SSHUtils -
Initiating transfer from:/scratch/00421/ccguser/seagrid-w
orkdirs/PROCESS_ba393353-611c-45fb-a27f-0e8c00c8535e/archive.tar To:
/var/www/portals/gateway-user-data/seagrid/nastia/polya_15_nt_ful_A
MBER/Clone_of_polya_15_nt_unprot_am_prod_021500987115/archive.tar, Ignore Empty
file : true
2017-07-26 09:14:07,064 [pool-11-thread-145] INFO o.a.a.gfac.impl.SSHUtils -
Transferring
from:/scratch/00421/ccguser/seagrid-workdirs/PROCESS_ba393353-611c-45fb-a27f-0e8c00c8535e/archive.tar
To:
/var/www/portals/gateway-user-data/seagrid/nastia/polya_15_nt_ful_AMBER/Clone_of_polya_15_nt_unprot_am_prod_021500987115/archive.tar,
File Size : 6057000960, Ignore Empty file : true
...
2017-07-26 09:21:05,861 [pool-11-thread-145] INFO
o.a.airavata.gfac.impl.Factory - SSH Session validation succeeded, key
:ccguser_stampede.tacc.xsede.org_22_3d65bf6d-2c9f-4166-a51b-e76e0022bd3b
2017-07-26 09:21:06,843 [pool-11-thread-145] INFO
o.a.airavata.gfac.impl.Factory - Channel creation test succeeded, key
:ccguser_stampede.tacc.xsede.org_22_3d65bf6d-2c9f-4166-a51b-e76e0022bd3b
2017-07-26 09:21:06,843 [pool-11-thread-145] INFO
o.a.airavata.gfac.impl.Factory - Reuse SSH session for
:ccguser_stampede.tacc.xsede.org_22_3d65bf6d-2c9f-4166-a51b-e76e0022bd3b
2017-07-26 09:21:06,879 [pool-11-thread-145] INFO
o.a.a.g.impl.HPCRemoteCluster - Executing command rm
/scratch/00421/ccguser/seagrid-workdirs/PROCESS_ba393353-611c-45fb-a27f-0e8c00c8535e/archive.tar
2017-07-26 09:21:07,818 [pool-11-thread-145] INFO
o.a.a.g.impl.task.ArchiveTask - Executing command cd
/var/www/portals/gateway-user-data/seagrid/nastia/polya_15_nt_ful_AMBER/Clone_of_polya_15_nt_unprot_am_prod_021500987115
&& mkdir ARCHIVE && tar -xvf archive.tar -C ARCHIVE && rm archive.tar && chmod
755 -R ARCHIVE/*
2017-07-26 09:23:33,645 [pool-11-thread-145] INFO
o.a.a.g.c.context.TaskContext - expId:
Clone_of_polya_15_nt_unprot_am_prod_02_f762fa5f-b039-4352-812b-11cde23a7732,
processId: PROCESS_ba393353-611c-45fb-a27f-0e8c00c8535e, taskId:
TASK_ca02a6a5-4460-4bc4-ba73-4b0000a1be7e, type: DATA_STAGING : Task status
changed EXECUTING -> COMPLETED
2017-07-26 09:23:33,656 [pool-11-thread-145] INFO o.a.a.g.c.c.ProcessContext
- expId:
Clone_of_polya_15_nt_unprot_am_prod_02_f762fa5f-b039-4352-812b-11cde23a7732,
processId: PROCESS_ba393353-611c-45fb-a27f-0e8c00c8535e :- Process status
changed OUTPUT_DATA_STAGING -> COMPLETED
2017-07-26 09:23:33,667 [pool-11-thread-145] INFO o.a.a.gfac.impl.GFacWorker
- expId:
Clone_of_polya_15_nt_unprot_am_prod_02_f762fa5f-b039-4352-812b-11cde23a7732,
processId: PROCESS_ba393353-611c-45fb-a27f-0e8c00c8535e :- Sent ack for
deliveryTag 168
{noformat}
File archive.tar with size 6057000960 transfered from stampede to
gf4.ucs.indiana.edu in just under 7 minutes ({{2017-07-26 09:14:07,064}} -
{{2017-07-26 09:21:05,861}}).
> File transfer slow from alamo.uthscsa.edu to uslims3.uthscsa.edu
> ----------------------------------------------------------------
>
> Key: AIRAVATA-2492
> URL: https://issues.apache.org/jira/browse/AIRAVATA-2492
> Project: Airavata
> Issue Type: Bug
> Reporter: Marcus Christie
> Assignee: Marcus Christie
>
> Experiment ids in ultrascan.scigap.org:
> US3-AIRA_f200b78e-5d86-499c-b2e6-c3b6fbb3bfa3
> US3-AIRA_b69482ba-b34f-4809-9274-bcda71b8dccc
> Hipchat discussion:
> {quote}
> [11:27 PM] Gary Gorbet: On Alamo, 20 minutes are more elapse from end of job
> until Airavata reports a job status of COMPLETE. Can't *something* be done to
> speed that up!
> ---- Wednesday July 26, 2017 ----
> [9:31 AM] Eroma Abeysinghe: @gary sorry for this. Currently all experiments
> are completed, and two running in alamo. but we found two which took rather
> unusually long time to transfers the files. we are investigating. will keep
> you posted.
> [9:51 AM] Marcus Christie: @gary here's what we've been able to confirm: gfac
> is getting the completed email from the scheduler and processing it
> immediately. Airavata doesn't mark an experiment as being completed until the
> output data staging completes. The output data staging for the jobs you gave
> us to look at took a long time to complete.
> [9:52 AM] Marlon Pierce: You can test network speed in various ways (like
> https://askubuntu.com/questions/7976/how-do-you-test-the-network-speed-betwen-two-boxes,
> which I googled).
> [9:53 AM] Marlon Pierce: It won't fix the problem, but it may help network
> admins diagnose the problem
> [9:53 AM] Marlon Pierce: I'm assuming (maybe incorrectly) that this is an
> issue between Alamo and our servers
> [9:54 AM] Marcus Christie: This is partly because transfer times from alamo
> to lims are happening very slowly, by my rough calculations at about 200kb/s.
> But also the analysis-results.tar file is much larger in these experiments.
> One example we looked at it was about 200 MB and another was about 1.13 GB.
> Looking through the logs back to April this is historically very large for an
> analysis-results.tar file.
> [9:54 AM] Marlon Pierce: But we can confirm this by looking at other servers
> [9:54 AM] Marcus Christie: @marlon this is between alamo and lims. Do we have
> a login on lims?
> [9:56 AM] Marlon Pierce: Oh...where is the LIMS server? At UTHSCSA?
> [9:57 AM] Marlon Pierce: I think it is.
> [9:58 AM] Marcus Christie: @marlon uslims3.uthscsa.edu
> [9:59 AM] Gary Gorbet: That other job completed. But now there is new,
> critical, job that FINISHED 40 minutes ago. Airavata job status is EXECUTING.
> [9:59 AM] Marcus Christie: It would be nice if we could run a command line
> scp transfer test from alamo to uslims3 just to see if it is fundamentally
> limited or this is a slowness in GFac.
> [10:00 AM] Gary Gorbet: That is easy to do. Will do a test and report.
> [10:00 AM] Marcus Christie: Thanks @gary
> [10:01 AM] Gary Gorbet: Transferred a 1.6M file from uslims3 to alamo in
> under a second.
> [10:02 AM] Marcus Christie: Actually I realized that GFac will do the
> transfer from alamo to uslims3 via GFac. It's not a true third party transfer
> but rather the data is streamed through the GFac server. So @marlon was
> right we need to test between alamo -> Gfac and GFac -> uslims3.
> [10:03 AM] Gary Gorbet: This would be alamo/uslims3 to gw153; right?
> [10:03 AM] Marcus Christie: Thanks @gary. So that's our theoretical upper
> bound.
> [10:04 AM] Marcus Christie: @gary that's right
> [10:08 AM] Sudhakar Pamidighantam: I think If it is reasonably secure GFac
> should do a true third party transfer to avoid this delay.
> [10:08 AM] Gary Gorbet: The LIMS work directory for the job has all the
> output files. So, it seems to me that gw153 to uslims3 transfers completed.
> So, why the EXECUTING gfac status?
> {quote}
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)