Hi,

If I simplify my code, I basically do this:

```
hadoop dfs -rm -skipTrash $file
hadoop dfs -copyFromLocal $local $file
```
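One workaround I've been sketching is to wrap the second command in a blind retry loop. This is just my own sketch: the `retry` helper name and the attempt/delay numbers are arbitrary choices, not anything Hadoop provides.

```shell
# Hypothetical retry wrapper (my own helper, not part of Hadoop):
# run a command up to $1 times, sleeping $2 seconds between attempts.
retry() {
    local attempts=$1; shift
    local delay=$1; shift
    local i
    for (( i = 1; i <= attempts; i++ )); do
        "$@" && return 0          # success: stop retrying
        echo "attempt $i/$attempts failed, sleeping ${delay}s" >&2
        sleep "$delay"
    done
    return 1                      # all attempts failed
}

# Intended usage, with $local and $file set as in the commands above:
#   hadoop dfs -rm -skipTrash "$file"
#   retry 5 2 hadoop dfs -copyFromLocal "$local" "$file"
```

One caveat I can see with this: a failed `-copyFromLocal` may leave a partial destination file behind, so each retry might first have to `-rm` it again, which brings back the original race.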
(The removal is needed because the previous input/output of a job may still exist, and `-copyFromLocal` does not support overwriting, so I have to delete the old file first.)

During the second command I often get NotReplicatedYetException errors (see below for the complete stack trace). Is there a way to make these commands block until the file removal (and, by extension, the addition) has fully replicated? I couldn't find anything in the help for `hadoop dfs -copyFromLocal`, `-rm`, `-rmr`, etc. I found https://issues.apache.org/jira/browse/HADOOP-1595, which introduces a flag for synchronous operation, but it only affects setrep.

Alternatively (uglier, but still acceptable): is there a command or script I can run to check "whatever the latest operation on this file was, has it replicated yet?" I've been looking at `hadoop fsck`, which seems to do what I want, at least for files that should exist; for removed files I'm not so sure, and it's hard to manually test all the possible edge cases and race conditions.

Currently I'm running my script without `-skipTrash`, on the assumption that moving to trash is much faster than a real delete, making the race condition less likely. But I'm not too fond of this approach: it could still break (`-copyFromLocal` could fail while the old file hasn't been fully moved to trash yet), and since I don't have much disk space I'd rather reclaim it as soon as I can; I'm sure I won't need the old files again anyway.
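For the "has it replicated yet" check, the best I've come up with so far is polling `hadoop fsck` and grepping for its "is HEALTHY" summary line. Again a sketch of my own (the function name and retry count are my choices), and as noted above I don't know whether this says anything useful about a file that was just removed:

```shell
# Hypothetical polling check (my sketch): wait until fsck reports the
# given HDFS path healthy, or give up after $2 tries (default 10).
wait_until_healthy() {
    local path=$1
    local tries=${2:-10}
    local i
    for (( i = 0; i < tries; i++ )); do
        # fsck prints "The filesystem under path '...' is HEALTHY"
        # once all blocks under the path meet the replication target.
        if hadoop fsck "$path" | grep -q 'is HEALTHY'; then
            return 0
        fi
        sleep 1
    done
    return 1
}
```

I'd call this between the `-copyFromLocal` and the job submission, but it only covers the "file should exist" direction, not the removal.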
I guess yet another trick would be to move the file to a temporary junk file first and kick off a `hadoop dfs -rm -skipTrash` on that junk file, but that could introduce two race conditions of its own (both the `-copyFromLocal` of the original file and the `-rm` of the junk file could break if the move hasn't fully replicated yet).

Thanks for any input,
Dieter

```
11/06/20 14:15:40 INFO hdfs.DFSClient: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not replicated yet:/user/dplaetin/wip-dirty-CF0-CHNK100-EB1-FW1-FW_K0-FW_NA1-FW_NB0-M2-MP0-NFmerged_ner_mofis-NUMBEST10-PATRN_INCL-PR_A0-PR_Fpunctuation_v1-S_L0-S_P0-SR_Fstopwords_ranks.nl_v1-SQ_K10-SQ_R1-TFIDF1-input
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1257)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
	at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)

	at org.apache.hadoop.ipc.Client.call(Client.java:740)
	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
	at $Proxy0.addBlock(Unknown Source)
	at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
	at $Proxy0.addBlock(Unknown Source)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2937)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2819)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2102)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2288)
```
