[
https://issues.apache.org/jira/browse/HDFS-12688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16211729#comment-16211729
]
Jason Lowe commented on HDFS-12688:
-----------------------------------
Have you checked the HDFS audit logs? They should give you clues about what is
happening here and who is re-creating the directory. I suspect the job is
executing asynchronously, so when you run the script multiple times you are
actually running multiple copies of the job at the same time. If a previous
job is still running, it will re-create the output directory when its tasks
need to write output.
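The audit-log check suggested above can be sketched as follows. This is a minimal sketch, assuming the common NameNode audit log location shown in the example call (it varies by distribution) and the default audit line format, where each entry carries ugi=, cmd=, and src= fields:

```shell
# Sketch: trace who deleted and who re-created a path, from the NameNode
# audit log. Log location and field layout are distribution-specific
# assumptions.
audit_hits() {
    # $1 = path to the audit log file, $2 = HDFS path to trace
    # Show every delete/re-create of the path, with user and timestamp.
    grep "src=$2" "$1" | grep -E 'cmd=(delete|mkdirs|create|rename)'
}

# Example (run on the NameNode host; the log path is an assumption):
# audit_hits /var/log/hadoop-hdfs/hdfs-audit.log /user/shriya/shell_test/wordcountOutput
```

If an unexpected user or a second submission of the same job shows up re-creating the directory between your -rm and the failure, that confirms the overlap.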
> HDFS File Not Removed Despite Successful "Moved to .Trash" Message
> ------------------------------------------------------------------
>
> Key: HDFS-12688
> URL: https://issues.apache.org/jira/browse/HDFS-12688
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs
> Affects Versions: 2.6.0
> Reporter: Shriya Gupta
> Priority: Critical
>
> Wrote a simple script to delete and re-create an output directory and ran it
> multiple times. However, some executions of the script randomly threw a
> FileAlreadyExistsException while the others succeeded, despite a successful
> hdfs dfs -rm command. The script is below; I have reproduced the issue in two
> different environments --
> hdfs dfs -ls /user/shriya/shell_test/
> echo "starting hdfs remove **************"
> hdfs dfs -rm -r -f /user/shriya/shell_test/wordcountOutput
> echo "hdfs remove completed!"
> hdfs dfs -ls /user/shriya/shell_test/
> echo "starting mapReduce*******************************"
> mapred job -libjars \
>     /data/home/shriya/shell_test/hadoop-mapreduce-client-jobclient-2.7.1.jar \
>     -submit /data/home/shriya/shell_test/wordcountJob.xml
> The message confirming successful move --
> 17/10/19 14:49:12 INFO fs.TrashPolicyDefault: Moved:
> 'hdfs://nameservice1/user/shriya/shell_test/wordcountOutput' to trash at:
> hdfs://nameservice1/user/shriya/.Trash/Current/user/shriya/shell_test/wordcountOutput1508438952728
> The contents of a subsequent -ls after the -rm also showed that the file
> still existed.
> The error I got when my MapReduce job tried to create the file --
> 17/10/19 14:50:00 WARN security.UserGroupInformation: PriviledgedActionException as:<REDACTED> (auth:KERBEROS) cause:org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://nameservice1/user/shriya/shell_test/wordcountOutput already exists
> Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://nameservice1/user/shriya/shell_test/wordcountOutput already exists
> at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:131)
> at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:272)
> at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:143)
> at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1307)
> at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1304)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
> at org.apache.hadoop.mapreduce.Job.submit(Job.java:1304)
> at org.apache.hadoop.mapreduce.tools.CLI.run(CLI.java:315)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> at org.apache.hadoop.mapred.JobClient.main(JobClient.java:1277)
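If the overlap diagnosis in the comment above is right, one mitigation is to make the script wait for the submitted job to finish before exiting, so back-to-back runs cannot race. This is a minimal sketch: it polls `mapred job -list` (which lists jobs that are still running) until the job id disappears. The job id must be parsed from the `-submit` output; its format here and the polling interval are assumptions.

```shell
# Sketch: block until a submitted MapReduce job is no longer running.
wait_for_job() {
    # $1 = job id (hypothetical placeholder format, e.g. job_1508438952728_0001)
    # Loop while the id still appears among the running jobs.
    while mapred job -list 2>/dev/null | grep -q "$1"; do
        sleep 10
    done
}

# Example (hypothetical job id):
# wait_for_job job_1508438952728_0001
```

Alternatively, a driver that calls Job.waitForCompletion (as the stock wordcount example does when run via hadoop jar) blocks until the job finishes, which avoids overlapping runs without any polling.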
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)