[
https://issues.apache.org/jira/browse/OOZIE-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16550775#comment-16550775
]
Mate Juhasz commented on OOZIE-3273:
------------------------------------
[~shubham.chhabra] if I understand correctly, the behavior is misleading
because the failed fs move action is marked as succeeded.
[~andras.piros], [~gezapeti] could you please take a look at this issue?
Is this a bug, or does it work as designed?
> FsAction should fail on retry if destination path exists
> --------------------------------------------------------
>
> Key: OOZIE-3273
> URL: https://issues.apache.org/jira/browse/OOZIE-3273
> Project: Oozie
> Issue Type: Bug
> Affects Versions: 4.2.0
> Reporter: Shubham
> Priority: Major
>
> This FsAction fails with error code FS008 if the source files already exist
> in the target folder.
> The expected behavior is that Oozie retries the action after 1 minute and
> marks it as failed, because the error is still there.
> However, Oozie marks the action as successful on retry, even though we did
> not clean up the target folder.
> Logs:
> {code}
> 2018-05-15 00:08:05,187 INFO ActionStartXCommand:520 - SERVER[mn2.sf.priv]
> USER[hive] GROUP[-] TOKEN[] APP[sample-wf]
> JOB[0000061-180514024838863-oozie-oozi-W]
> ACTION[0000061-180514024838863-oozie-oozi-W@loading] Start action
> [0000061-180514024838863-oozie-oozi-W@loading] with user-retry state :
> userRetryCount [0], userRetryMax [2], userRetryInterval [1]
> 2018-05-15 00:08:05,201 WARN ActionStartXCommand:523 - SERVER[mn2.sf.priv]
> USER[hive] GROUP[-] TOKEN[] APP[sample-wf]
> JOB[0000061-180514024838863-oozie-oozi-W]
> ACTION[0000061-180514024838863-oozie-oozi-W@loading] Error starting action
> [load-staging]. ErrorType [ERROR], ErrorCode [FS008], Message [FS008: move,
> could not move
> [hdfs://nn:8020/user/hive/audit/data/ingestion/USER_ACCOUNT_AF_A/1522284431816-2018-03-28_1747_11.816-PT2M10.096S-TEST.0-19462_24325-67401946-8fcf-4940-91ec-063016a5da48.avro]
> to [hdfs://nn:8020/user/hive/audit/data/staging/USER_ACCOUNT_AF_A]]
> org.apache.oozie.action.ActionExecutorException: FS008: move, could not move
> [hdfs://nn:8020/user/hive/audit/data/ingestion/SAMPLE_WF/1522284431816-2018-03-28_1747_11.816-PT2M10.096S-TEST.0-19462_24325-67401946-8fcf-4940-91ec-063016a5da48.avro]
> to [hdfs://nn:8020/user/hive/audit/data/staging/USER_ACCOUNT_AF_A]
> at
> org.apache.oozie.action.hadoop.FsActionExecutor.move(FsActionExecutor.java:509)
> at
> org.apache.oozie.action.hadoop.FsActionExecutor.doOperations(FsActionExecutor.java:199)
> at
> org.apache.oozie.action.hadoop.FsActionExecutor.start(FsActionExecutor.java:609)
> at
> org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:234)
> at
> org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:65)
> at org.apache.oozie.command.XCommand.call(XCommand.java:287)
> at
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:331)
> at
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:260)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
> org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:178)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> 2018-05-15 00:08:05,202 WARN ActionStartXCommand:523 - SERVER[mn2.sf.priv]
> USER[hive] GROUP[-] TOKEN[] APP[sample-wf]
> JOB[0000061-180514024838863-oozie-oozi-W]
> ACTION[0000061-180514024838863-oozie-oozi-W@loading] Setting Action Status to
> [DONE]
> 2018-05-15 00:08:05,202 INFO ActionStartXCommand:520 - SERVER[mn2.sf.priv]
> USER[hive] GROUP[-] TOKEN[] APP[sample-wf]
> JOB[0000061-180514024838863-oozie-oozi-W]
> ACTION[0000061-180514024838863-oozie-oozi-W@loading] Preparing retry this
> action [0000061-180514024838863-oozie-oozi-W@loading], errorCode [FS008],
> userRetryCount [0], userRetryMax [2], userRetryInterval [1]
> 2018-05-15 00:09:05,254 INFO ActionStartXCommand:520 - SERVER[mn2.sf.priv]
> USER[hive] GROUP[-] TOKEN[] APP[sample-wf]
> JOB[0000061-180514024838863-oozie-oozi-W]
> ACTION[0000061-180514024838863-oozie-oozi-W@loading] Start action
> [0000061-180514024838863-oozie-oozi-W@loading] with user-retry state :
> userRetryCount [1], userRetryMax [2], userRetryInterval [1]
> 2018-05-15 00:09:05,276 INFO ActionStartXCommand:520 - SERVER[mn2.sf.priv]
> USER[hive] GROUP[-] TOKEN[] APP[sample-wf]
> JOB[0000061-180514024838863-oozie-oozi-W]
> ACTION[0000061-180514024838863-oozie-oozi-W@loading]
> [***0000061-180514024838863-oozie-oozi-W@loading***]Action status=DONE
> 2018-05-15 00:09:05,277 INFO ActionStartXCommand:520 - SERVER[mn2.sf.priv]
> USER[hive] GROUP[-] TOKEN[] APP[sample-wf]
> JOB[0000061-180514024838863-oozie-oozi-W]
> ACTION[0000061-180514024838863-oozie-oozi-W@loading]
> [***0000061-180514024838863-oozie-oozi-W@loading***]Action updated in DB!
> 2018-05-15 00:09:05,314 INFO ActionStartXCommand:520 - SERVER[mn2.sf.priv]
> USER[hive] GROUP[-] TOKEN[] APP[sample-wf]
> JOB[0000061-180514024838863-oozie-oozi-W]
> ACTION[0000061-180514024838863-oozie-oozi-W@end] Start action
> [0000061-180514024838863-oozie-oozi-W@end] with user-retry state :
> userRetryCount [0], userRetryMax [0], userRetryInterval [10]
> 2018-05-15 00:09:05,314 INFO ActionStartXCommand:520 - SERVER[mn2.sf.priv]
> USER[hive] GROUP[-] TOKEN[] APP[sample-wf]
> JOB[0000061-180514024838863-oozie-oozi-W]
> ACTION[0000061-180514024838863-oozie-oozi-W@end]
> [***0000061-180514024838863-oozie-oozi-W@end***]Action status=DONE
> 2018-05-15 00:09:05,314 INFO ActionStartXCommand:520 - SERVER[mn2.sf.priv]
> USER[hive] GROUP[-] TOKEN[] APP[sample-wf]
> JOB[0000061-180514024838863-oozie-oozi-W]
> ACTION[0000061-180514024838863-oozie-oozi-W@end]
> [***0000061-180514024838863-oozie-oozi-W@end***]Action updated in DB!
> {code}
> Observations:
> - On the first attempt `recovery` is false, so a failed rename makes the
> condition `!fs.rename(p, target) && !recovery` true and FS008 is thrown.
> - On the second attempt `recovery` is true, so the code still enters the loop
> `for (Path p : pathArr)`, but a failed rename no longer throws because
> `!recovery` is false. The action is therefore reported as successful.
> {code:java}
> public void move(Context context, XConfiguration fsConf, Path nameNodePath,
>         Path source, Path target, boolean recovery)
>         throws ActionExecutorException {
>     try {
>         source = resolveToFullPath(nameNodePath, source, true);
>         validateSameNN(source, target);
>         FileSystem fs = getFileSystemFor(source, context, fsConf);
>         Path[] pathArr = FileUtil.stat2Paths(fs.globStatus(source));
>         if ((pathArr == null || pathArr.length == 0)) {
>             if (!recovery) {
>                 throw new ActionExecutorException(ActionExecutorException.ErrorType.ERROR, "FS006",
>                         "move, source path [{0}] does not exist", source);
>             } else {
>                 return;
>             }
>         }
>         if (pathArr.length > 1 && (!fs.exists(target) || fs.isFile(target))) {
>             if (!recovery) {
>                 throw new ActionExecutorException(ActionExecutorException.ErrorType.ERROR, "FS012",
>                         "move, could not rename multiple sources to the same target name");
>             } else {
>                 return;
>             }
>         }
>         checkGlobMax(pathArr);
>         for (Path p : pathArr) {
>             if (!fs.rename(p, target) && !recovery) {
>                 throw new ActionExecutorException(ActionExecutorException.ErrorType.ERROR, "FS008",
>                         "move, could not move [{0}] to [{1}]", p, target);
>             }
>         }
>     } catch (Exception ex) {
>         throw convertException(ex);
>     }
> }
> {code}
> I think it should fail on the second attempt as well.
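The effect of the `recovery` flag on the rename loop quoted above can be
reproduced without a cluster. The following is a minimal sketch, not Oozie
code: the class `MoveRecoverySketch`, the exception `MoveFailedException`, and
the `BiPredicate` standing in for `fs.rename` are all hypothetical names
introduced for illustration. `moveAsQuoted` copies the condition from the
issue; `moveStrict` is only one possible direction (always fail when a rename
returns false) and is an assumption, not the project's actual fix.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;

/** Simplified stand-in for the rename loop quoted in the issue; not Oozie code. */
class MoveRecoverySketch {

    /** Hypothetical exception standing in for ActionExecutorException. */
    static class MoveFailedException extends Exception {
        MoveFailedException(String message) {
            super(message);
        }
    }

    /** Same condition as the quoted loop: a failed rename throws only when recovery is false. */
    static void moveAsQuoted(List<String> sources, String target,
            BiPredicate<String, String> rename, boolean recovery) throws MoveFailedException {
        for (String p : sources) {
            if (!rename.test(p, target) && !recovery) {
                throw new MoveFailedException("FS008: move, could not move [" + p + "] to [" + target + "]");
            }
        }
    }

    /** Assumed stricter variant: a failed rename throws regardless of recovery. */
    static void moveStrict(List<String> sources, String target,
            BiPredicate<String, String> rename) throws MoveFailedException {
        for (String p : sources) {
            if (!rename.test(p, target)) {
                throw new MoveFailedException("FS008: move, could not move [" + p + "] to [" + target + "]");
            }
        }
    }

    /** Runs the quoted logic against a rename that always returns false; true if it threw. */
    static boolean quotedMoveFails(boolean recovery) {
        BiPredicate<String, String> renameAlwaysFails = (src, dst) -> false;
        try {
            moveAsQuoted(Arrays.asList("/ingestion/a.avro"), "/staging", renameAlwaysFails, recovery);
            return false;
        } catch (MoveFailedException e) {
            return true;
        }
    }

    /** Runs the stricter variant against the same always-failing rename; true if it threw. */
    static boolean strictMoveFails() {
        BiPredicate<String, String> renameAlwaysFails = (src, dst) -> false;
        try {
            moveStrict(Arrays.asList("/ingestion/a.avro"), "/staging", renameAlwaysFails);
            return false;
        } catch (MoveFailedException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("quoted logic, first attempt fails: " + quotedMoveFails(false));
        System.out.println("quoted logic, retry fails: " + quotedMoveFails(true));
        System.out.println("strict variant, retry fails: " + strictMoveFails());
    }
}
```

With a rename that keeps returning false, the quoted logic fails on the first
attempt but not on the retry, which matches the log above. Whether a stricter
check is safe for genuine recovery cases (for example, a partially completed
glob move) would need to be confirmed against the real FsActionExecutor.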