[ https://issues.apache.org/jira/browse/BEAM-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920279#comment-16920279 ]

Hannah Jiang edited comment on BEAM-7993 at 9/1/19 4:29 AM:
------------------------------------------------------------

Thanks, Mark, for trying it out.

I analyzed the last 60 Python Portable Precommit test runs on Jenkins to look
for patterns. There was no obvious pattern; however, every run submitted to
Jenkins agent12 failed. Failures were also more likely when multiple Python
Portable Precommit runs executed in parallel on the same agent, which means
sdist may have been running in parallel in those cases. (This depends on the
start times: sdist runs in the first part of the job, so if two runs start
close together, their sdist steps may overlap.) However, the failure still
happens even when only one Python Portable Precommit run is on an agent, so
parallel sdist alone is not enough to explain this issue. It is still worth
checking, though.
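The pattern check described above (failure rate per agent, plus whether two runs on the same agent started close enough for their sdist steps to overlap) could be scripted roughly as follows. This is a minimal sketch, not the analysis behind the attached PDF: the `Run` record, the five-minute `SDIST_WINDOW`, and the sample data are all hypothetical assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from itertools import combinations

# Hypothetical assumption: sdist occupies the first 5 minutes of every run.
SDIST_WINDOW = timedelta(minutes=5)


@dataclass(frozen=True)
class Run:
    """One Precommit run (hypothetical record format)."""
    build: int
    agent: str
    start: datetime
    failed: bool


def failure_rate_by_agent(runs):
    """Return {agent: (failed_count, total_count)}."""
    stats = defaultdict(lambda: [0, 0])
    for run in runs:
        stats[run.agent][0] += int(run.failed)
        stats[run.agent][1] += 1
    return {agent: tuple(counts) for agent, counts in stats.items()}


def overlapping_sdist(runs, window=SDIST_WINDOW):
    """Return build-number pairs on the same agent whose sdist windows overlap.

    Windows [s1, s1 + w) and [s2, s2 + w) overlap iff |s1 - s2| < w.
    """
    by_agent = defaultdict(list)
    for run in runs:
        by_agent[run.agent].append(run)
    pairs = set()
    for agent_runs in by_agent.values():
        for a, b in combinations(agent_runs, 2):
            if abs(a.start - b.start) < window:
                pairs.add(tuple(sorted((a.build, b.build))))
    return pairs


if __name__ == "__main__":
    # Illustrative sample data only; not taken from the Jenkins history.
    t0 = datetime(2019, 8, 30, 11, 0)
    runs = [
        Run(1, "agent-a", t0, True),
        Run(2, "agent-a", t0 + timedelta(minutes=2), True),
        Run(3, "agent-b", t0, False),
        Run(4, "agent-b", t0 + timedelta(minutes=30), True),
    ]
    print(failure_rate_by_agent(runs))  # {'agent-a': (2, 2), 'agent-b': (1, 2)}
    print(overlapping_sdist(runs))      # {(1, 2)}
```

Treating each sdist step as a fixed window from the run's start time reduces the overlap check to comparing start times; real sdist durations vary, so the window length would need to be measured from the Jenkins logs.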

I attached a list of these 60 runs in case it helps in any way.
[^Python_Portable_Precommit.pdf] 


was (Author: hannahjiang):
Thanks, Mark, for trying it out.

I listed the last 60 Python Portable Precommit test runs from Jenkins to look
for patterns. There was no obvious pattern; however, every run submitted to
Jenkins agent12 failed. Failures were also more likely when multiple Python
Portable Precommit runs executed in parallel on the same agent, which means
sdist was running in parallel in those cases. However, the failure still
happens even when only one Python Portable Precommit run is on an agent, so
parallel sdist alone is not enough to explain this issue. It is still worth
checking, though.

I attached a list of these 60 runs in case it helps in any way.
[^Python_Portable_Precommit.pdf] 

> portable python precommit is flaky
> ----------------------------------
>
>                 Key: BEAM-7993
>                 URL: https://issues.apache.org/jira/browse/BEAM-7993
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core, test-failures, testing
>    Affects Versions: 2.15.0
>            Reporter: Udi Meiri
>            Assignee: Mark Liu
>            Priority: Major
>              Labels: currently-failing
>             Fix For: 2.16.0
>
>         Attachments: Python_Portable_Precommit.pdf
>
>          Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> I'm not sure what the root cause is here.
> Example log where :sdks:python:test-suites:portable:py35:portableWordCountBatch failed:
> {code}
> 11:51:22 [CHAIN MapPartition (MapPartition at [1]read/Read/Split) -> FlatMap (FlatMap at ExtractOutput[0]) (2/2)] ERROR org.apache.flink.runtime.operators.BatchTask - Error in task code:  CHAIN MapPartition (MapPartition at [1]read/Read/Split) -> FlatMap (FlatMap at ExtractOutput[0]) (2/2)
> 11:51:22 [CHAIN MapPartition (MapPartition at [1]read/Read/Split) -> FlatMap (FlatMap at ExtractOutput[0]) (1/2)] ERROR org.apache.flink.runtime.operators.BatchTask - Error in task code:  CHAIN MapPartition (MapPartition at [1]read/Read/Split) -> FlatMap (FlatMap at ExtractOutput[0]) (1/2)
> 11:51:22 [CHAIN MapPartition (MapPartition at [2]write/Write/WriteImpl/DoOnce/{FlatMap(<lambda at core.py:2457>), Map(decode)}) -> FlatMap (FlatMap at ExtractOutput[0]) (2/2)] ERROR org.apache.flink.runtime.operators.BatchTask - Error in task code:  CHAIN MapPartition (MapPartition at [2]write/Write/WriteImpl/DoOnce/{FlatMap(<lambda at core.py:2457>), Map(decode)}) -> FlatMap (FlatMap at ExtractOutput[0]) (2/2)
> 11:51:22 [CHAIN MapPartition (MapPartition at [2]write/Write/WriteImpl/DoOnce/{FlatMap(<lambda at core.py:2457>), Map(decode)}) -> FlatMap (FlatMap at ExtractOutput[0]) (1/2)] ERROR org.apache.flink.runtime.operators.BatchTask - Error in task code:  CHAIN MapPartition (MapPartition at [2]write/Write/WriteImpl/DoOnce/{FlatMap(<lambda at core.py:2457>), Map(decode)}) -> FlatMap (FlatMap at ExtractOutput[0]) (1/2)
> 11:51:22 java.lang.Exception: The user defined 'open()' method caused an exception: java.io.IOException: Received exit code 1 for command 'docker inspect -f {{.State.Running}} 642c312c335d3881b885873c66917b536e79cff07503fdceaddee5fbeb10bfd1'. stderr: Error: No such object: 642c312c335d3881b885873c66917b536e79cff07503fdceaddee5fbeb10bfd1
> 11:51:22      at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:498)
> 11:51:22      at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:368)
> 11:51:22      at org.apache.flink.runtime.taskmanager.Task.run(Task.java:712)
> 11:51:22      at java.lang.Thread.run(Thread.java:748)
> 11:51:22 Caused by: org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.UncheckedExecutionException: java.io.IOException: Received exit code 1 for command 'docker inspect -f {{.State.Running}} 642c312c335d3881b885873c66917b536e79cff07503fdceaddee5fbeb10bfd1'. stderr: Error: No such object: 642c312c335d3881b885873c66917b536e79cff07503fdceaddee5fbeb10bfd1
> 11:51:22      at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4966)
> 11:51:22      at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$SimpleStageBundleFactory.<init>(DefaultJobBundleFactory.java:211)
> 11:51:22      at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$SimpleStageBundleFactory.<init>(DefaultJobBundleFactory.java:202)
> 11:51:22      at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory.forStage(DefaultJobBundleFactory.java:185)
> 11:51:22      at org.apache.beam.runners.flink.translation.functions.FlinkDefaultExecutableStageContext.getStageBundleFactory(FlinkDefaultExecutableStageContext.java:49)
> 11:51:22      at org.apache.beam.runners.flink.translation.functions.ReferenceCountingFlinkExecutableStageContextFactory$WrappedContext.getStageBundleFactory(ReferenceCountingFlinkExecutableStageContextFactory.java:203)
> 11:51:22      at org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.open(FlinkExecutableStageFunction.java:129)
> 11:51:22      at org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36)
> 11:51:22      at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:494)
> 11:51:22      ... 3 more
> {code}
> https://builds.apache.org/job/beam_PreCommit_Portable_Python_Commit/5512/consoleFull



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
