[
https://issues.apache.org/jira/browse/BEAM-3272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Thomas Weise updated BEAM-3272:
-------------------------------
Priority: Minor (was: Critical)
> ParDoTranslatorTest: Error creating local cluster while creating checkpoint file
> --------------------------------------------------------------------------------
>
> Key: BEAM-3272
> URL: https://issues.apache.org/jira/browse/BEAM-3272
> Project: Beam
> Issue Type: Bug
> Components: runner-apex
> Reporter: Eugene Kirpichov
> Priority: Minor
> Labels: flake, sickbay
> Fix For: 2.11.0
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> Failed build:
> https://builds.apache.org/job/beam_PostCommit_Java_MavenInstall/org.apache.beam$beam-runners-apex/5330/console
> Key output:
> {code}
> 2017-11-29T01:21:26.956 [ERROR] testAssertionFailure(org.apache.beam.runners.apex.translation.ParDoTranslatorTest) Time elapsed: 2.007 s <<< ERROR!
> java.lang.RuntimeException: Error creating local cluster
> at org.apache.apex.engine.EmbeddedAppLauncherImpl.getController(EmbeddedAppLauncherImpl.java:122)
> at org.apache.apex.engine.EmbeddedAppLauncherImpl.launchApp(EmbeddedAppLauncherImpl.java:71)
> at org.apache.apex.engine.EmbeddedAppLauncherImpl.launchApp(EmbeddedAppLauncherImpl.java:46)
> at org.apache.beam.runners.apex.ApexRunner.run(ApexRunner.java:197)
> at org.apache.beam.runners.apex.TestApexRunner.run(TestApexRunner.java:57)
> at org.apache.beam.runners.apex.TestApexRunner.run(TestApexRunner.java:31)
> at org.apache.beam.sdk.Pipeline.run(Pipeline.java:304)
> at org.apache.beam.sdk.Pipeline.run(Pipeline.java:290)
> at org.apache.beam.runners.apex.translation.ParDoTranslatorTest.runExpectingAssertionFailure(ParDoTranslatorTest.java:156)
> {code}
> ...
> {code}
> Caused by: ExitCodeException exitCode=1: chmod: cannot access ‘/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Java_MavenInstall/src/runners/apex/target/com.datatorrent.stram.StramLocalCluster/checkpoints/2/_tmp’: No such file or directory
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:582)
> at org.apache.hadoop.util.Shell.run(Shell.java:479)
> at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:773)
> at org.apache.hadoop.util.Shell.execCommand(Shell.java:866)
> at org.apache.hadoop.util.Shell.execCommand(Shell.java:849)
> at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:733)
> at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:225)
> at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:209)
> at org.apache.hadoop.fs.RawLocalFileSystem.createOutputStreamWithMode(RawLocalFileSystem.java:307)
> at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:296)
> at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:328)
> at org.apache.hadoop.fs.FileSystem.primitiveCreate(FileSystem.java:1017)
> at org.apache.hadoop.fs.DelegateToFileSystem.createInternal(DelegateToFileSystem.java:99)
> at org.apache.hadoop.fs.ChecksumFs$ChecksumFSOutputSummer.<init>(ChecksumFs.java:352)
> at org.apache.hadoop.fs.ChecksumFs.createInternal(ChecksumFs.java:399)
> at org.apache.hadoop.fs.AbstractFileSystem.create(AbstractFileSystem.java:584)
> at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:686)
> at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:682)
> at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
> at org.apache.hadoop.fs.FileContext.create(FileContext.java:688)
> at com.datatorrent.common.util.AsyncFSStorageAgent.copyToHDFS(AsyncFSStorageAgent.java:119)
> ... 50 more
> {code}
> From inspecting the code at these stack frames, it seems to be copying an
> operator's checkpoint "to HDFS" (which in this case is the local disk), but
> it fails while creating the target file of the copy: creation creates the
> file (successfully) and then chmods it writable (unsuccessfully). Barring
> something subtle (e.g. chmod not being allowed immediately after creating a
> FileOutputStream), it looks like the whole directory may have been deleted
> out from under the process. I don't know why that would happen, though, or
> how to debug it.
> Either way, the path being accessed is funky:
> /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Java_MavenInstall/src/runners/apex/target/...
> I think it'd be better if this test used a "@Rule TemporaryFolder" to
> store Apex checkpoints. I don't know whether the Apex runner allows that, but
> I can see how it could reduce interference between tests and potentially
> resolve this issue.
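
To make the isolation idea above concrete, here is a minimal sketch of a per-test checkpoint directory using only the JDK. The class name IsolatedCheckpointDir is hypothetical and is not part of Beam or Apex. How (or whether) the Apex runner can be pointed at such a directory is not shown, since the issue does not establish that. In a JUnit 4 test, a "@Rule TemporaryFolder" would play the same role.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

/**
 * Hypothetical helper (not a Beam or Apex class): gives each test its own
 * uniquely named checkpoint directory, so that one test cleaning up cannot
 * delete a directory that another test is still writing into.
 */
public class IsolatedCheckpointDir implements AutoCloseable {
    private final Path root;

    public IsolatedCheckpointDir(String testName) throws IOException {
        // createTempDirectory appends a random suffix, so two tests (or two
        // runs of the same test) never share a directory.
        this.root = Files.createTempDirectory("apex-checkpoints-" + testName + "-");
    }

    /** Directory that only this test instance should write checkpoints to. */
    public Path path() {
        return root;
    }

    /** Recursively deletes only this instance's directory. */
    @Override
    public void close() throws IOException {
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse order visits children before their parent directories.
            walk.sorted(Comparator.reverseOrder()).forEach(p -> p.toFile().delete());
        }
    }
}
```

A test would wrap its pipeline run in try-with-resources (try (IsolatedCheckpointDir dir = new IsolatedCheckpointDir("testAssertionFailure")) { ... }) and write checkpoints under dir.path(), assuming the runner exposes a way to configure that location.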