[ 
https://issues.apache.org/jira/browse/BEAM-5873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16689019#comment-16689019
 ] 

Valentyn Tymofieiev commented on BEAM-5873:
-------------------------------------------

Thanks, [~markflyhigh]!

I tried starting a VM and then starting a Python container on the running VM a 
few hundred times to reproduce the error, but observed no failures.

I was able to reproduce the error by running postcommits about 7 times, until 
I saw one Batch job that was not making progress after 18 min. Luckily, the job 
runs for 1 hr before Dataflow gives up on it and stops the VM.

I then made a snapshot of the VM in the Cloud UI and created a VM image 
from the snapshot.

After that, I was able to see the crashed Docker container in a stopped state. I 
then created a new container image from the stopped container, started a 
container from the created image, and reproduced the pip failure:

 
{noformat}
valentyn@valentyn-repro-beam-5873-instance ~ $ docker ps -a
...
b47989c515cf dataflow.gcr.io/v1beta3/python@sha256:65f1cbe78e35d9f72368ba36597762a7b07fa31781055f6a291cf39a64d19e0b "/opt/google/dataf..." 28 minutes ago Exited (1) 27 minutes ago
...
valentyn@valentyn-repro-beam-5873-instance ~ $ docker commit b47989c515cf valentyn/broken_python_image
valentyn@valentyn-repro-beam-5873-instance ~ $ docker run -it --entrypoint=/bin/bash valentyn/broken_python_image
root@abfc4eb95010:/# pip
Traceback (most recent call last):
  File "/usr/local/bin/pip", line 7, in <module>
    from pip._internal import main
ImportError: No module named pip._internal
{noformat}
We can now investigate the VM snapshot and the Python container image cached in 
the snapshot to understand what is happening here.
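
As a possible starting point for that investigation, here is a minimal sketch 
that can be run with the container's Python interpreter. It reports whether 
`pip` and `pip._internal` are importable and, if so, where they are loaded from. 
The helper name `probe_module` is hypothetical and is not part of Beam or 
Dataflow; it only relies on the standard `importlib.import_module`, which is 
available in both Python 2.7 and Python 3.

```python
import importlib


def probe_module(name):
    """Try to import `name` and return a tuple (ok, detail).

    On success, detail is the path the module was loaded from.
    On failure, detail is the ImportError message.
    """
    try:
        module = importlib.import_module(name)
    except ImportError as exc:
        return False, str(exc)
    return True, getattr(module, "__file__", "<no __file__>")


if __name__ == "__main__":
    # 'pip' shows which installation is picked up from sys.path;
    # 'pip._internal' is the submodule /usr/local/bin/pip failed to import.
    for name in ("pip", "pip._internal"):
        ok, detail = probe_module(name)
        print("%-14s %s  %s" % (name, "OK" if ok else "MISSING", detail))
```

If `pip` imports but `pip._internal` does not, the reported path for `pip` 
should show which installation the `/usr/local/bin/pip` script is picking up.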

 

> Python test failure: "ImportError: No module named pip._internal"
> -----------------------------------------------------------------
>
>                 Key: BEAM-5873
>                 URL: https://issues.apache.org/jira/browse/BEAM-5873
>             Project: Beam
>          Issue Type: Bug
>          Components: test-failures
>            Reporter: Henning Rohde
>            Assignee: Valentyn Tymofieiev
>            Priority: Major
>
> https://scans.gradle.com/s/r55ln7mdibu2w/console-log?task=:beam-sdks-python:postCommitITTests#L163
> Logs: 
> https://pantheon.corp.google.com/logs/viewer?resource=dataflow_step%2Fjob_id%2F2018-10-26_06_46_26-13501822612780835073&logName=projects%2Fapache-beam-testing%2Flogs%2Fdataflow.googleapis.com%252Fworker-startup&interval=NO_LIMIT&project=apache-beam-testing&minLogLevel=0&expandAll=false&timestamp=2018-10-26T20:01:54.773000000Z&customFacets=&limitCustomFacetWidth=true&scrollTimestamp=2018-10-26T13:49:18.405228000Z
> Executing: /usr/local/bin/pip install /var/opt/google/dataflow/dataflow_python_sdk.tar[gcp]
> Debug: delayed tasks complete
> Debug: download complete
> Traceback (most recent call last):
>   File "/usr/local/bin/pip", line 7, in <module>
>     from pip._internal import main
> ImportError: No module named pip._internal
> /usr/local/bin/pip failed with exit status 1
> Maybe a flake?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
