[ https://issues.apache.org/jira/browse/BEAM-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920279#comment-16920279 ]
Hannah Jiang edited comment on BEAM-7993 at 9/1/19 4:29 AM:
------------------------------------------------------------
Thanks Mark for trying it out.
I analyzed the last 60 Python Portable Precommit tests from Jenkins to see if I
could find any patterns. There were no obvious patterns; however, all tests
submitted to Jenkins agent12 failed. A failure is also more likely when multiple
Python Portable Precommit tests run in parallel on the same agent, which means
we may be running multiple sdist builds in parallel in that case. (This depends
on the test start times: sdist runs in the first part of a test, so if two
tests start close together, their sdist steps may run in parallel.) However,
the failure still happens even when only one Python Portable Precommit test
runs on an agent, so this is not enough to conclude that running sdist in
parallel causes the issue. It is still worth checking, though.
I attached a list of these 60 tests in case it helps.
[^Python_Portable_Precommit.pdf]
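
For anyone who wants to repeat this kind of check on the run list, here is a
rough sketch of the two comparisons described above: failure rate per agent,
and runs on the same agent that start close together. The record layout, the
agent names, the sample rows, and the 5-minute window are all assumptions made
for illustration; none of them come from the attached PDF.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical records: (agent, start time, passed). Placeholder values only,
# not taken from the attached list of 60 runs.
RUNS = [
    ("agent-a", datetime(2019, 8, 30, 10, 0), True),
    ("agent-a", datetime(2019, 8, 30, 10, 2), False),
    ("agent-b", datetime(2019, 8, 30, 10, 0), False),
    ("agent-b", datetime(2019, 8, 30, 12, 0), True),
]


def failure_rate_by_agent(runs):
    """Return {agent: fraction of runs on that agent that failed}."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for agent, _start, passed in runs:
        totals[agent] += 1
        if not passed:
            failures[agent] += 1
    return {agent: failures[agent] / totals[agent] for agent in totals}


def close_starts(runs, window=timedelta(minutes=5)):
    """Return pairs of consecutive runs on one agent that start within `window`.

    The window is an assumed stand-in for "sdist may still be running".
    """
    by_agent = defaultdict(list)
    for run in runs:
        by_agent[run[0]].append(run)
    pairs = []
    for agent_runs in by_agent.values():
        agent_runs.sort(key=lambda r: r[1])  # order by start time
        for earlier, later in zip(agent_runs, agent_runs[1:]):
            if later[1] - earlier[1] <= window:
                pairs.append((earlier, later))
    return pairs


if __name__ == "__main__":
    print(failure_rate_by_agent(RUNS))
    for earlier, later in close_starts(RUNS):
        print("possible parallel sdist:", earlier, later)
```

Comparing the failure rate of runs that appear in `close_starts` against the
rest would show whether overlapping starts correlate with failures, though it
would not prove that sdist is the cause.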
> portable python precommit is flaky
> ----------------------------------
>
> Key: BEAM-7993
> URL: https://issues.apache.org/jira/browse/BEAM-7993
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core, test-failures, testing
> Affects Versions: 2.15.0
> Reporter: Udi Meiri
> Assignee: Mark Liu
> Priority: Major
> Labels: currently-failing
> Fix For: 2.16.0
>
> Attachments: Python_Portable_Precommit.pdf
>
> Time Spent: 4h 20m
> Remaining Estimate: 0h
>
> I'm not sure what the root cause is here.
> Example log where
> :sdks:python:test-suites:portable:py35:portableWordCountBatch failed:
> {code}
> 11:51:22 [CHAIN MapPartition (MapPartition at [1]read/Read/Split) -> FlatMap (FlatMap at ExtractOutput[0]) (2/2)] ERROR org.apache.flink.runtime.operators.BatchTask - Error in task code: CHAIN MapPartition (MapPartition at [1]read/Read/Split) -> FlatMap (FlatMap at ExtractOutput[0]) (2/2)
> 11:51:22 [CHAIN MapPartition (MapPartition at [1]read/Read/Split) -> FlatMap (FlatMap at ExtractOutput[0]) (1/2)] ERROR org.apache.flink.runtime.operators.BatchTask - Error in task code: CHAIN MapPartition (MapPartition at [1]read/Read/Split) -> FlatMap (FlatMap at ExtractOutput[0]) (1/2)
> 11:51:22 [CHAIN MapPartition (MapPartition at [2]write/Write/WriteImpl/DoOnce/{FlatMap(<lambda at core.py:2457>), Map(decode)}) -> FlatMap (FlatMap at ExtractOutput[0]) (2/2)] ERROR org.apache.flink.runtime.operators.BatchTask - Error in task code: CHAIN MapPartition (MapPartition at [2]write/Write/WriteImpl/DoOnce/{FlatMap(<lambda at core.py:2457>), Map(decode)}) -> FlatMap (FlatMap at ExtractOutput[0]) (2/2)
> 11:51:22 [CHAIN MapPartition (MapPartition at [2]write/Write/WriteImpl/DoOnce/{FlatMap(<lambda at core.py:2457>), Map(decode)}) -> FlatMap (FlatMap at ExtractOutput[0]) (1/2)] ERROR org.apache.flink.runtime.operators.BatchTask - Error in task code: CHAIN MapPartition (MapPartition at [2]write/Write/WriteImpl/DoOnce/{FlatMap(<lambda at core.py:2457>), Map(decode)}) -> FlatMap (FlatMap at ExtractOutput[0]) (1/2)
> 11:51:22 java.lang.Exception: The user defined 'open()' method caused an exception: java.io.IOException: Received exit code 1 for command 'docker inspect -f {{.State.Running}} 642c312c335d3881b885873c66917b536e79cff07503fdceaddee5fbeb10bfd1'. stderr: Error: No such object: 642c312c335d3881b885873c66917b536e79cff07503fdceaddee5fbeb10bfd1
> 11:51:22 at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:498)
> 11:51:22 at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:368)
> 11:51:22 at org.apache.flink.runtime.taskmanager.Task.run(Task.java:712)
> 11:51:22 at java.lang.Thread.run(Thread.java:748)
> 11:51:22 Caused by: org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.UncheckedExecutionException: java.io.IOException: Received exit code 1 for command 'docker inspect -f {{.State.Running}} 642c312c335d3881b885873c66917b536e79cff07503fdceaddee5fbeb10bfd1'. stderr: Error: No such object: 642c312c335d3881b885873c66917b536e79cff07503fdceaddee5fbeb10bfd1
> 11:51:22 at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4966)
> 11:51:22 at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$SimpleStageBundleFactory.<init>(DefaultJobBundleFactory.java:211)
> 11:51:22 at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$SimpleStageBundleFactory.<init>(DefaultJobBundleFactory.java:202)
> 11:51:22 at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory.forStage(DefaultJobBundleFactory.java:185)
> 11:51:22 at org.apache.beam.runners.flink.translation.functions.FlinkDefaultExecutableStageContext.getStageBundleFactory(FlinkDefaultExecutableStageContext.java:49)
> 11:51:22 at org.apache.beam.runners.flink.translation.functions.ReferenceCountingFlinkExecutableStageContextFactory$WrappedContext.getStageBundleFactory(ReferenceCountingFlinkExecutableStageContextFactory.java:203)
> 11:51:22 at org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.open(FlinkExecutableStageFunction.java:129)
> 11:51:22 at org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36)
> 11:51:22 at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:494)
> 11:51:22 ... 3 more
> {code}
> https://builds.apache.org/job/beam_PreCommit_Portable_Python_Commit/5512/consoleFull
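
The log above shows `docker inspect -f {{.State.Running}} <container id>`
exiting with code 1 and "No such object" on stderr, which means the container
was already gone when it was inspected, not merely stopped. When reproducing
this locally, a small helper like the one below can tell those cases apart.
This is only a sketch: the function names and the state labels ("running",
"stopped", "missing", "error") are made up for illustration and are not part
of Beam or Docker.

```python
import subprocess


def classify_inspect(returncode, stdout, stderr):
    """Map the result of `docker inspect -f {{.State.Running}}` to a label.

    The labels are hypothetical names chosen for this sketch.
    """
    if returncode == 0:
        # With this format string, docker prints "true" or "false".
        return "running" if stdout.strip() == "true" else "stopped"
    if "No such object" in stderr:
        # Same message as in the failing precommit log.
        return "missing"
    return "error"


def container_state(container_id, run=subprocess.run):
    """Run docker inspect for one container and classify the outcome.

    `run` is injectable so the logic can be exercised without Docker.
    """
    result = run(
        ["docker", "inspect", "-f", "{{.State.Running}}", container_id],
        capture_output=True,
        text=True,
    )
    return classify_inspect(result.returncode, result.stdout, result.stderr)
```

Polling `container_state` for the SDK harness container while the test runs
would show whether the container exits or is removed before the Flink task
calls open().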
--
This message was sent by Atlassian Jira
(v8.3.2#803003)