[ https://issues.apache.org/jira/browse/HADOOP-17258?focusedWorklogId=497565&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-497565 ]
ASF GitHub Bot logged work on HADOOP-17258: ------------------------------------------- Author: ASF GitHub Bot Created on: 08/Oct/20 20:25 Start Date: 08/Oct/20 20:25 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on pull request #2371: URL: https://github.com/apache/hadoop/pull/2371#issuecomment-705803820 :confetti_ball: **+1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |:----:|----------:|--------:|:--------:|:-------:| | +0 :ok: | reexec | 0m 31s | | Docker mode activated. | |||| _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | | 0m 0s | [test4tests](test4tests) | The patch appears to include 1 new or modified test files. | |||| _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 31m 51s | | trunk passed | | +1 :green_heart: | compile | 0m 40s | | trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 | | +1 :green_heart: | compile | 0m 37s | | trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 | | +1 :green_heart: | checkstyle | 0m 30s | | trunk passed | | +1 :green_heart: | mvnsite | 0m 42s | | trunk passed | | +1 :green_heart: | shadedclient | 15m 21s | | branch has no errors when building and testing our client artifacts. | | +1 :green_heart: | javadoc | 0m 24s | | trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 | | +1 :green_heart: | javadoc | 0m 31s | | trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 | | +0 :ok: | spotbugs | 1m 5s | | Used deprecated FindBugs config; considering switching to SpotBugs. | | +1 :green_heart: | findbugs | 1m 2s | | trunk passed | |||| _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 0m 33s | | the patch passed | | +1 :green_heart: | compile | 0m 34s | | the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 | | +1 :green_heart: | javac | 0m 34s | | the patch passed | | +1 :green_heart: | compile | 0m 29s | | the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 | | +1 :green_heart: | javac | 0m 29s | | the patch passed | | +1 :green_heart: | checkstyle | 0m 20s | | the patch passed | | +1 :green_heart: | mvnsite | 0m 31s | | the patch passed | | +1 :green_heart: | whitespace | 0m 0s | | The patch has no whitespace issues. | | +1 :green_heart: | shadedclient | 14m 11s | | patch has no errors when building and testing our client artifacts. | | +1 :green_heart: | javadoc | 0m 19s | | the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 | | +1 :green_heart: | javadoc | 0m 26s | | the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 | | +1 :green_heart: | findbugs | 1m 6s | | the patch passed | |||| _ Other Tests _ | | +1 :green_heart: | unit | 1m 27s | | hadoop-aws in the patch passed. | | +1 :green_heart: | asflicense | 0m 33s | | The patch does not generate ASF License warnings. | | | | 74m 48s | | | | Subsystem | Report/Notes | |----------:|:-------------| | Docker | ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2371/1/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/2371 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux db666719b049 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / deb35a32baf | | Default Java | Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2371/1/testReport/ | | Max. process+thread count | 421 (vs. ulimit of 5500) | | modules | C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws | | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2371/1/console | | versions | git=2.17.1 maven=3.6.0 findbugs=4.0.6 | | Powered by | Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org | This message was automatically generated. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking ------------------- Worklog Id: (was: 497565) Time Spent: 2h 50m (was: 2h 40m) > MagicS3GuardCommitter fails with `pendingset` already exists > ------------------------------------------------------------ > > Key: HADOOP-17258 > URL: https://issues.apache.org/jira/browse/HADOOP-17258 > Project: Hadoop Common > Issue Type: Bug > Components: fs/s3 > Affects Versions: 3.2.0 > Reporter: Dongjoon Hyun > Priority: Major > Labels: pull-request-available > Time Spent: 2h 50m > Remaining Estimate: 0h > > In `trunk/branch-3.3/branch-3.2`, `MagicS3GuardCommitter.innerCommitTask` has > `false` at `pendingSet.save`. > {code} > try { > pendingSet.save(getDestFS(), taskOutcomePath, false); > } catch (IOException e) { > LOG.warn("Failed to save task commit data to {} ", > taskOutcomePath, e); > abortPendingUploads(context, pendingSet.getCommits(), true); > throw e; > } > {code} > And, it can cause a job failure like the following. > {code} > WARN TaskSetManager: Lost task 1562.1 in stage 1.0 (TID 1788, 100.92.11.63, > executor 26): org.apache.spark.SparkException: Task failed while writing rows. > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:257) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:170) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:169) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) > at org.apache.spark.scheduler.Task.run(Task.scala:123) > at > org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414) > at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown > Source) > at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown > Source) > at java.base/java.lang.Thread.run(Unknown Source) > Caused by: org.apache.hadoop.fs.FileAlreadyExistsException: > s3a://xxx/__magic/app-attempt-0000/task_20200911063607_0001_m_001562.pendingset > already exists > at org.apache.hadoop.fs.s3a.S3AFileSystem.create(S3AFileSystem.java:761) > at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1118) > at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1098) > at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:987) > at > org.apache.hadoop.util.JsonSerialization.save(JsonSerialization.java:269) > at > org.apache.hadoop.fs.s3a.commit.files.PendingSet.save(PendingSet.java:170) > at > org.apache.hadoop.fs.s3a.commit.magic.MagicS3GuardCommitter.innerCommitTask(MagicS3GuardCommitter.java:220) > at > org.apache.hadoop.fs.s3a.commit.magic.MagicS3GuardCommitter.commitTask(MagicS3GuardCommitter.java:165) > at > org.apache.spark.mapred.SparkHadoopMapRedUtil$.performCommit$1(SparkHadoopMapRedUtil.scala:50) > at > org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:77) > at > org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.commitTask(HadoopMapReduceCommitProtocol.scala:244) > at > org.apache.spark.sql.execution.datasources.FileFormatDataWriter.commit(FileFormatDataWriter.scala:78) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:247) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:242) > {code} > {code} > 20/09/11 07:44:38 ERROR TaskSetManager: Task 957.1 in stage 1.0 (TID 1412) > can not write to output file: > org.apache.hadoop.fs.FileAlreadyExistsException: > s3a://xxx/t/__magic/app-attempt-0000/task_20200911073922_0001_m_000957.pendingset > already exists; not retrying > {code} > The above happens in EKS with S3 environment and the job failure happens when > some executor containers are killed by K8s -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org