[ https://issues.apache.org/jira/browse/BEAM-5873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16689019#comment-16689019 ]
Valentyn Tymofieiev commented on BEAM-5873:
-------------------------------------------
Thanks, [~markflyhigh]!
I tried to reproduce the error by starting a VM and then starting a Python
container on the running VM a few hundred times, but observed no failures.
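A repeated-start attempt like this can be sketched as a small retry harness. This is a minimal Python 3 sketch, not the script that was actually used; the helper name count_failures and the Docker command shown in the trailing comment are assumptions for illustration.

```python
import subprocess
import sys


def count_failures(cmd, attempts):
    """Run `cmd` `attempts` times and return how many runs exited non-zero."""
    failures = 0
    for _ in range(attempts):
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            failures += 1
    return failures


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; it always succeeds.
    demo_cmd = [sys.executable, "-c", "import sys; sys.exit(0)"]
    print(count_failures(demo_cmd, 5))

    # Hypothetical real usage (IMAGE is a placeholder, not taken from the issue):
    # count_failures(
    #     ["docker", "run", "--rm", "--entrypoint", "pip", IMAGE, "--version"],
    #     300,
    # )
```

A count of zero after a few hundred attempts, as observed here, suggests the failure does not come from simply starting the container on a fresh VM.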
I was able to reproduce the error by running postcommits about 7 times, until
I saw one Batch job that was not making progress after 18 min. Luckily, the job
runs for 1 hr before Dataflow gives up on it and stops the VM.
I then made a snapshot of the VM image in Cloud UI, and created a VM image
using the snapshot.
After that, I was able to see a crashed Docker container in a stopped state. I
then created a new container image from the stopped container, started a
container from the created image, and reproduced the pip failure:
{noformat}
valentyn@valentyn-repro-beam-5873-instance ~ $ docker ps -a
...
b47989c515cf dataflow.gcr.io/v1beta3/python@sha256:65f1cbe78e35d9f72368ba36597762a7b07fa31781055f6a291cf39a64d19e0b "/opt/google/dataf..." 28 minutes ago Exited (1) 27 minutes ago
...
valentyn@valentyn-repro-beam-5873-instance ~ $ docker commit b47989c515cf valentyn/broken_python_image
valentyn@valentyn-repro-beam-5873-instance ~ $ docker run -it --entrypoint=/bin/bash valentyn/broken_python_image
root@abfc4eb95010:/# pip
Traceback (most recent call last):
  File "/usr/local/bin/pip", line 7, in <module>
    from pip._internal import main
ImportError: No module named pip._internal
{noformat}
We can now investigate the VM snapshot and the Python container image cached in
the snapshot to understand what is happening here.
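One possible first check inside the committed container is to ask the interpreter directly which of the relevant modules it can import and where they are loaded from. The following is a minimal sketch under assumptions: the helper name module_status is hypothetical, and it presumes the script is run with the same Python interpreter that /usr/local/bin/pip uses. It relies only on importlib.import_module, which is available in both Python 2.7 and Python 3.

```python
import importlib


def module_status(name):
    """Return (True, file_path) if `name` imports, else (False, error_text)."""
    try:
        module = importlib.import_module(name)
    except ImportError as exc:
        return (False, str(exc))
    return (True, getattr(module, "__file__", None))


if __name__ == "__main__":
    # "pip._internal" is the module the failing launcher script tries to import.
    for name in ("pip", "pip._internal"):
        ok, detail = module_status(name)
        print("%s: %s (%s)" % (name, "importable" if ok else "NOT importable", detail))
```

If pip imports but pip._internal does not, the reported path of the pip package shows which installation the launcher script is picking up, which may help narrow down where the mismatch comes from.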
> Python test failure: "ImportError: No module named pip._internal"
> -----------------------------------------------------------------
>
> Key: BEAM-5873
> URL: https://issues.apache.org/jira/browse/BEAM-5873
> Project: Beam
> Issue Type: Bug
> Components: test-failures
> Reporter: Henning Rohde
> Assignee: Valentyn Tymofieiev
> Priority: Major
>
> https://scans.gradle.com/s/r55ln7mdibu2w/console-log?task=:beam-sdks-python:postCommitITTests#L163
> Logs:
> https://pantheon.corp.google.com/logs/viewer?resource=dataflow_step%2Fjob_id%2F2018-10-26_06_46_26-13501822612780835073&logName=projects%2Fapache-beam-testing%2Flogs%2Fdataflow.googleapis.com%252Fworker-startup&interval=NO_LIMIT&project=apache-beam-testing&minLogLevel=0&expandAll=false&timestamp=2018-10-26T20:01:54.773000000Z&customFacets=&limitCustomFacetWidth=true&scrollTimestamp=2018-10-26T13:49:18.405228000Z
> Executing: /usr/local/bin/pip install /var/opt/google/dataflow/dataflow_python_sdk.tar[gcp]
> Debug: delayed tasks complete
> Debug: download complete
> Traceback (most recent call last):
>   File "/usr/local/bin/pip", line 7, in <module>
>     from pip._internal import main
> ImportError: No module named pip._internal
> /usr/local/bin/pip failed with exit status 1
> Maybe a flake?