Eroma created AIRAVATA-2945:
-------------------------------
Summary: When output file transfers fail, provide a manual file transfer
process to move the files to the gateway
Key: AIRAVATA-2945
URL: https://issues.apache.org/jira/browse/AIRAVATA-2945
Project: Airavata
Issue Type: Task
Components: helix implementation
Affects Versions: 0.18
Environment: https://staging.ultrascan.scigap.org
Reporter: Eroma
Assignee: Dimuthu Upeksha
Fix For: 0.18
File transfers can fail when the storage server or the HPC cluster is
unresponsive or unavailable [1], or when the destination is not writable [2].
When such a failure occurs during output staging, the experiment fails even
though the job may have completed successfully and its output files are still
available on the remote cluster. So that the SUs already consumed are not
wasted, Airavata should have a manual process for transferring these files to
the gateway. An automated process is possible, but deciding when to run it, and
for which experiments, could lead to unnecessary issues. Hence a process with
human intervention seems to be the more practical and error-free solution here.
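As a rough illustration of what such a human-driven process could look like, the sketch below builds the two scp hops suggested by the traces in this issue (cluster to a local staging directory, then local to the storage server) and only prints them unless the operator explicitly turns off the dry run. This is not existing Airavata code: the names ManualTransfer, build_commands and transfer, the staging directory, the storage host and the destination path are all assumptions made for the example. Only the cluster host and the remote file path are taken from trace [1].

```python
import os
import posixpath
import shlex
import subprocess
from dataclasses import dataclass


@dataclass(frozen=True)
class ManualTransfer:
    """One output file left on the cluster after output staging failed (hypothetical type)."""
    cluster_host: str   # HPC login host where the job ran
    remote_path: str    # output file in the process working directory
    storage_host: str   # gateway storage server (assumed to be reachable over SSH)
    storage_path: str   # destination path on the storage server


def build_commands(t, staging_dir):
    """Return the two scp commands: cluster -> local staging, local staging -> storage."""
    local = posixpath.join(staging_dir, posixpath.basename(t.remote_path))
    return [
        ["scp", f"{t.cluster_host}:{t.remote_path}", local],
        ["scp", local, f"{t.storage_host}:{t.storage_path}"],
    ]


def transfer(t, staging_dir="/tmp/manual-staging", dry_run=True):
    """Print each command; execute it only when the operator disables dry_run."""
    commands = build_commands(t, staging_dir)
    if not dry_run:
        os.makedirs(staging_dir, exist_ok=True)
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops at the first failed hop instead of continuing.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Cluster host and remote path are copied from trace [1];
    # the storage host and destination path are placeholders.
    failed = ManualTransfer(
        cluster_host="comet.sdsc.edu",
        remote_path="/oasis/scratch/comet/us3/temp_project/airavata-workingdirs/"
                    "PROCESS_0ef47841-d2f9-4120-aab4-7c711d216f99/output/analysis-results.tar",
        storage_host="storage.example.org",
        storage_path="/data/gateway/manual/analysis-results.tar",
    )
    transfer(failed, dry_run=True)
```

Defaulting to a dry run keeps the decision with the operator: the commands can be reviewed for the specific experiment before anything is copied, which matches the preference above for human intervention over automation.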
[1]
org.apache.airavata.helix.impl.task.TaskOnFailException: Error Code : 62b6a5d4-3567-43b4-9d49-7bff20e3414d, Task TASK_63e9db6f-def0-4cd6-b973-e90badd4fb01 failed due to Error while checking the file /oasis/scratch/comet/us3/temp_project/airavata-workingdirs/PROCESS_0ef47841-d2f9-4120-aab4-7c711d216f99/output/analysis-results.tar existence, java.net.UnknownHostException: comet.sdsc.edu
    at org.apache.airavata.helix.impl.task.AiravataTask.onFail(AiravataTask.java:130)
    at org.apache.airavata.helix.impl.task.staging.OutputDataStagingTask.onRun(OutputDataStagingTask.java:187)
    at org.apache.airavata.helix.impl.task.AiravataTask.onRun(AiravataTask.java:349)
    at org.apache.airavata.helix.core.AbstractTask.run(AbstractTask.java:92)
    at org.apache.helix.task.TaskRunner.run(TaskRunner.java:71)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.airavata.agents.api.AgentException: java.net.UnknownHostException: comet.sdsc.edu
    at org.apache.airavata.helix.adaptor.SSHJAgentAdaptor.doesFileExist(SSHJAgentAdaptor.java:201)
    at org.apache.airavata.helix.impl.task.staging.DataStagingTask.transferFileToStorage(DataStagingTask.java:141)
    at org.apache.airavata.helix.impl.task.staging.OutputDataStagingTask.onRun(OutputDataStagingTask.java:172)
    ... 10 more
Caused by: java.net.UnknownHostException: comet.sdsc.edu
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:184)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at net.schmizz.sshj.SocketClient.connect(SocketClient.java:126)
    at net.schmizz.sshj.SocketClient.connect(SocketClient.java:117)
    at org.apache.airavata.helix.adaptor.PoolingSSHJClient.createNewSSHClient(PoolingSSHJClient.java:248)
    at org.apache.airavata.helix.adaptor.PoolingSSHJClient.leaseSSHClient(PoolingSSHJClient.java:104)
    at org.apache.airavata.helix.adaptor.PoolingSSHJClient.newSFTPClientWrapper(PoolingSSHJClient.java:291)
    at org.apache.airavata.helix.adaptor.SSHJAgentAdaptor.doesFileExist(SSHJAgentAdaptor.java:198)
    ... 12 more
[2]
org.apache.airavata.helix.impl.task.TaskOnFailException: Error Code : 5e402d08-30e2-428b-afce-1ca84ef61036, Task TASK_0dbeea65-22be-4ccf-a242-8f0fc10c2b1b failed due to Failed uploading the output file to /srv/www/htdocs/uslims3/uslims3_data/9f851eae-c20a-fde4-8d47-c1322a6b910c/analysis-results.tar from local path /tmp/PROCESS_6c944df6-f1d1-485b-9343-37373cd24f4a/temp_inputs/analysis-results.tar, net.schmizz.sshj.xfer.scp.SCPRemoteException: Remote SCP command had error: scp: /srv/www/htdocs/uslims3/uslims3_data/9f851eae-c20a-fde4-8d47-c1322a6b910c/analysis-results.tar: Read-only file system
    at org.apache.airavata.helix.impl.task.AiravataTask.onFail(AiravataTask.java:130)
    at org.apache.airavata.helix.impl.task.staging.OutputDataStagingTask.onRun(OutputDataStagingTask.java:187)
    at org.apache.airavata.helix.impl.task.AiravataTask.onRun(AiravataTask.java:349)
    at org.apache.airavata.helix.core.AbstractTask.run(AbstractTask.java:92)
    at org.apache.helix.task.TaskRunner.run(TaskRunner.java:71)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.airavata.agents.api.AgentException: net.schmizz.sshj.xfer.scp.SCPRemoteException: Remote SCP command had error: scp: /srv/www/htdocs/uslims3/uslims3_data/9f851eae-c20a-fde4-8d47-c1322a6b910c/analysis-results.tar: Read-only file system
    at org.apache.airavata.helix.adaptor.SSHJAgentAdaptor.copyFileTo(SSHJAgentAdaptor.java:173)
    at org.apache.airavata.helix.adaptor.SSHJStorageAdaptor.uploadFile(SSHJStorageAdaptor.java:61)
    at org.apache.airavata.helix.impl.task.staging.DataStagingTask.transferFileToStorage(DataStagingTask.java:175)
    at org.apache.airavata.helix.impl.task.staging.OutputDataStagingTask.onRun(OutputDataStagingTask.java:172)
    ... 10 more
Caused by: net.schmizz.sshj.xfer.scp.SCPRemoteException: Remote SCP command had error: scp: /srv/www/htdocs/uslims3/uslims3_data/9f851eae-c20a-fde4-8d47-c1322a6b910c/analysis-results.tar: Read-only file system
    at net.schmizz.sshj.xfer.scp.SCPEngine.check(SCPEngine.java:73)
    at net.schmizz.sshj.xfer.scp.SCPEngine.sendMessage(SCPEngine.java:133)
    at net.schmizz.sshj.xfer.scp.SCPUploadClient.sendFile(SCPUploadClient.java:97)
    at net.schmizz.sshj.xfer.scp.SCPUploadClient.process(SCPUploadClient.java:78)
    at net.schmizz.sshj.xfer.scp.SCPUploadClient.startCopy(SCPUploadClient.java:70)
    at net.schmizz.sshj.xfer.scp.SCPUploadClient.copy(SCPUploadClient.java:50)
    at net.schmizz.sshj.xfer.scp.SCPUploadClient.copy(SCPUploadClient.java:43)
    at net.schmizz.sshj.xfer.scp.SCPFileTransfer.upload(SCPFileTransfer.java:55)
    at org.apache.airavata.helix.adaptor.wrapper.SCPFileTransferWrapper.upload(SCPFileTransferWrapper.java:44)
    at org.apache.airavata.helix.adaptor.SSHJAgentAdaptor.copyFileTo(SSHJAgentAdaptor.java:171)
    ... 13 more