[ 
https://issues.apache.org/jira/browse/BEAM-13716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anonymous updated BEAM-13716:
-----------------------------
    Status: Triage Needed  (was: Resolved)

> Clear before creating a new virtual environment in setupVirtualenv
> ------------------------------------------------------------------
>
>                 Key: BEAM-13716
>                 URL: https://issues.apache.org/jira/browse/BEAM-13716
>             Project: Beam
>          Issue Type: Bug
>          Components: build-system, testing
>            Reporter: Heejong Lee
>            Assignee: Heejong Lee
>            Priority: P1
>             Fix For: 2.36.0
>
>          Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> h2. *Summary*
> An existing virtualenv directory should be cleared before creating a new one.
> h2. *Problem Description*
> A virtualenv directory name for Python tasks is generated from the hash of 
> the project path, so any tasks that have the same project path share the same 
> virtualenv directory. The problem is that when the {{setupVirtualenv}} task 
> initializes a new virtualenv directory, it doesn't overwrite existing data. 
> This can cause a subtle bug that is very hard to debug. See the following 
> example:
> {noformat}
> ❯ ./gradlew :sdks:python:wordCount -PpythonVersion=3.8
> Configuration on demand is an incubating feature.
> > Task :sdks:python:setupVirtualenv
> > Task :sdks:python:sdist
> > Task :sdks:python:installGcpTest
> Successfully installed apache-beam-2.37.0.dev0 atomicwrites-1.4.0 
> attrs-21.4.0 azure-core-1.21.1 azure-storage-blob-12.9.0 boto3-1.20.41 
> botocore-1.23.41 cachetools-4.2.4 certifi-2021.10.8 cffi-1.15.0 
> charset-normalizer-2.0.10 cloudpickle-2.0.0 crcmod-1.7 cryptography-36.0.1 
> deprecation-2.1.0 dill-0.3.1.1 docker-5.0.3 docopt-0.6.2 execnet-1.9.0 
> fastavro-1.4.9 fasteners-0.17.2 freezegun-1.1.0 google-api-core-1.31.5 
> google-apitools-0.5.31 google-auth-1.35.0 google-cloud-bigquery-2.32.0 
> google-cloud-bigquery-storage-2.11.0 google-cloud-bigtable-1.7.0 
> google-cloud-core-1.7.2 google-cloud-datastore-1.15.3 google-cloud-dlp-3.5.0 
> google-cloud-language-1.3.0 google-cloud-pubsub-2.9.0 
> google-cloud-pubsublite-1.3.0 google-cloud-recommendations-ai-0.2.0 
> google-cloud-spanner-1.19.1 google-cloud-videointelligence-1.16.1 
> google-cloud-vision-1.0.0 google-crc32c-1.3.0 google-resumable-media-2.1.0 
> googleapis-common-protos-1.54.0 greenlet-1.1.2 grpc-google-iam-v1-0.12.3 
> grpcio-gcp-0.2.2 grpcio-status-1.43.0 hdfs-2.6.0 httplib2-0.19.1 idna-3.3 
> isodate-0.6.1 jmespath-0.10.0 libcst-0.4.0 mock-2.0.0 more-itertools-8.12.0 
> msrest-0.6.21 mypy-extensions-0.4.3 numpy-1.21.5 oauth2client-4.1.3 
> oauthlib-3.1.1 orjson-3.6.5 overrides-6.1.0 pandas-1.3.5 parameterized-0.7.5 
> pbr-5.8.0 pluggy-0.13.1 proto-plus-1.19.8 psycopg2-binary-2.9.3 pyarrow-6.0.1 
> pyasn1-0.4.8 pyasn1-modules-0.2.8 pycparser-2.21 pydot-1.4.2 
> pyhamcrest-1.10.1 pymongo-3.12.3 pyparsing-2.4.7 pytest-4.6.11 
> pytest-forked-1.4.0 pytest-timeout-1.4.2 pytest-xdist-1.34.0 
> python-dateutil-2.8.2 pytz-2021.3 pyyaml-6.0 requests-2.27.1 
> requests-mock-1.9.3 requests-oauthlib-1.3.0 rsa-4.8 s3transfer-0.5.0 
> sqlalchemy-1.4.31 tenacity-5.1.5 testcontainers-3.4.2 
> typing-extensions-3.10.0.2 typing-inspect-0.7.1 typing-utils-0.1.0 
> urllib3-1.26.8 wcwidth-0.2.5 websocket-client-1.2.3 wrapt-1.13.3
> > Task :sdks:python:wordCount
> INFO:apache_beam.internal.gcp.auth:Setting socket default timeout to 60 
> seconds.
> INFO:apache_beam.internal.gcp.auth:socket default timeout is 60.0 seconds.
> INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
> INFO:oauth2client.client:Refreshing access_token
> WARNING:root:Make sure that locally built Python SDK docker image has Python 
> 3.8 interpreter.
> INFO:root:Default Python SDK image for environment is 
> apache/beam_python3.8_sdk:2.37.0.dev
> INFO:apache_beam.runners.portability.fn_api_runner.translations:====================
>  <function annotate_downstream_side_inputs at 0x122f479d0> 
> ====================
> INFO:apache_beam.runners.portability.fn_api_runner.translations:====================
>  <function fix_side_input_pcoll_coders at 0x122f47af0> ====================
> INFO:apache_beam.runners.portability.fn_api_runner.translations:====================
>  <function pack_combiners at 0x122f48040> ====================
> INFO:apache_beam.runners.portability.fn_api_runner.translations:====================
>  <function lift_combiners at 0x122f480d0> ====================
> INFO:apache_beam.runners.portability.fn_api_runner.translations:====================
>  <function expand_sdf at 0x122f48280> ====================
> INFO:apache_beam.runners.portability.fn_api_runner.translations:====================
>  <function expand_gbk at 0x122f48310> ====================
> INFO:apache_beam.runners.portability.fn_api_runner.translations:====================
>  <function sink_flattens at 0x122f48430> ====================
> INFO:apache_beam.runners.portability.fn_api_runner.translations:====================
>  <function greedily_fuse at 0x122f484c0> ====================
> INFO:apache_beam.runners.portability.fn_api_runner.translations:====================
>  <function read_to_impulse at 0x122f48550> ====================
> INFO:apache_beam.runners.portability.fn_api_runner.translations:====================
>  <function impulse_to_input at 0x122f485e0> ====================
> INFO:apache_beam.runners.portability.fn_api_runner.translations:====================
>  <function sort_stages at 0x122f48820> ====================
> INFO:apache_beam.runners.portability.fn_api_runner.translations:====================
>  <function setup_timer_mapping at 0x122f48790> ====================
> INFO:apache_beam.runners.portability.fn_api_runner.translations:====================
>  <function populate_data_channel_coders at 0x122f488b0> ====================
> INFO:apache_beam.runners.worker.statecache:Creating state cache with size 100
> INFO:apache_beam.runners.portability.fn_api_runner.worker_handlers:Created 
> Worker handler 
> <apache_beam.runners.portability.fn_api_runner.worker_handlers.EmbeddedWorkerHandler
>  object at 0x122fdeca0> for environment ref_Environment_default_environment_1 
> (beam:env:embedded_python:v1, b'')
> INFO:apache_beam.runners.portability.fn_api_runner.fn_runner:Running 
> (((((ref_AppliedPTransform_Write-Write-WriteImpl-DoOnce-Impulse_19)+(ref_AppliedPTransform_Write-Write-WriteImpl-DoOnce-FlatMap-lambda-at-core-py-3228-_20))+(ref_AppliedPTransform_Write-Write-WriteImpl-DoOnce-Map-decode-_22))+(ref_AppliedPTransform_Write-Write-WriteImpl-InitializeWrite_23))+(ref_PCollection_PCollection_11/Write))+(ref_PCollection_PCollection_12/Write)
> INFO:apache_beam.runners.portability.fn_api_runner.fn_runner:Running 
> ((((ref_AppliedPTransform_Read-Read-Impulse_4)+(ref_AppliedPTransform_Read-Read-Map-lambda-at-iobase-py-898-_5))+(Read/Read/SDFBoundedSourceReader/ParDo(SDFBoundedSourceDoFn)/PairWithRestriction))+(Read/Read/SDFBoundedSourceReader/ParDo(SDFBoundedSourceDoFn)/SplitAndSizeRestriction))+(ref_PCollection_PCollection_2_split/Write)
> INFO:apache_beam.runners.portability.fn_api_runner.fn_runner:Running 
> (((((ref_PCollection_PCollection_2_split/Read)+(Read/Read/SDFBoundedSourceReader/ParDo(SDFBoundedSourceDoFn)/Process))+(ref_AppliedPTransform_Split_8))+(ref_AppliedPTransform_PairWIthOne_9))+(GroupAndSum/Precombine))+(GroupAndSum/Group/Write)
> INFO:apache_beam.runners.portability.fn_api_runner.fn_runner:Running 
> (((((((GroupAndSum/Group/Read)+(GroupAndSum/Merge))+(GroupAndSum/ExtractOutputs))+(ref_AppliedPTransform_Format_14))+(ref_AppliedPTransform_Write-Write-WriteImpl-WindowInto-WindowIntoFn-_24))+(ref_AppliedPTransform_Write-Write-WriteImpl-WriteBundles_25))+(ref_AppliedPTransform_Write-Write-WriteImpl-Pair_26))+(Write/Write/WriteImpl/GroupByKey/Write)
> INFO:apache_beam.runners.portability.fn_api_runner.fn_runner:Running 
> ((Write/Write/WriteImpl/GroupByKey/Read)+(ref_AppliedPTransform_Write-Write-WriteImpl-Extract_28))+(ref_PCollection_PCollection_17/Write)
> INFO:apache_beam.runners.portability.fn_api_runner.fn_runner:Running 
> ((ref_PCollection_PCollection_11/Read)+(ref_AppliedPTransform_Write-Write-WriteImpl-PreFinalize_29))+(ref_PCollection_PCollection_18/Write)
> WARNING:apache_beam.io.filebasedsink:Deleting 1 existing files in target path 
> matching: -*-of-%(num_shards)05d
> INFO:apache_beam.runners.portability.fn_api_runner.fn_runner:Running 
> (ref_PCollection_PCollection_11/Read)+(ref_AppliedPTransform_Write-Write-WriteImpl-FinalizeWrite_30)
> INFO:apache_beam.io.filebasedsink:Starting finalize_write threads with 
> num_shards: 1 (skipped: 0), batches: 1, num_threads: 1
> INFO:apache_beam.io.filebasedsink:Renamed 1 shards in 0.02 seconds.
> Deprecated Gradle features were used in this build, making it incompatible 
> with Gradle 8.0.
> You can use '--warning-mode all' to show the individual deprecation warnings 
> and determine if they come from your own scripts or plugins.
> See 
> https://docs.gradle.org/7.3.2/userguide/command_line_interface.html#sec:command_line_warnings
> BUILD SUCCESSFUL in 1m 14s
> 14 actionable tasks: 4 executed, 10 up-to-date
> ❯ ./gradlew :sdks:python:wordCount -PpythonVersion=3.6
> Configuration on demand is an incubating feature.
> > Task :sdks:python:setupVirtualenv
> > Task :sdks:python:installGcpTest
> > Task :sdks:python:wordCount
> INFO:apache_beam.internal.gcp.auth:Setting socket default timeout to 60 
> seconds.
> INFO:apache_beam.internal.gcp.auth:socket default timeout is 60.0 seconds.
> INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
> INFO:oauth2client.client:Refreshing access_token
> WARNING:root:Make sure that locally built Python SDK docker image has Python 
> 3.8 interpreter.
> INFO:root:Default Python SDK image for environment is 
> apache/beam_python3.8_sdk:2.37.0.dev
> INFO:apache_beam.runners.portability.fn_api_runner.translations:====================
>  <function annotate_downstream_side_inputs at 0x124afa9d0> 
> ====================
> INFO:apache_beam.runners.portability.fn_api_runner.translations:====================
>  <function fix_side_input_pcoll_coders at 0x124afaaf0> ====================
> INFO:apache_beam.runners.portability.fn_api_runner.translations:====================
>  <function pack_combiners at 0x124afb040> ====================
> INFO:apache_beam.runners.portability.fn_api_runner.translations:====================
>  <function lift_combiners at 0x124afb0d0> ====================
> INFO:apache_beam.runners.portability.fn_api_runner.translations:====================
>  <function expand_sdf at 0x124afb280> ====================
> INFO:apache_beam.runners.portability.fn_api_runner.translations:====================
>  <function expand_gbk at 0x124afb310> ====================
> INFO:apache_beam.runners.portability.fn_api_runner.translations:====================
>  <function sink_flattens at 0x124afb430> ====================
> INFO:apache_beam.runners.portability.fn_api_runner.translations:====================
>  <function greedily_fuse at 0x124afb4c0> ====================
> INFO:apache_beam.runners.portability.fn_api_runner.translations:====================
>  <function read_to_impulse at 0x124afb550> ====================
> INFO:apache_beam.runners.portability.fn_api_runner.translations:====================
>  <function impulse_to_input at 0x124afb5e0> ====================
> INFO:apache_beam.runners.portability.fn_api_runner.translations:====================
>  <function sort_stages at 0x124afb820> ====================
> INFO:apache_beam.runners.portability.fn_api_runner.translations:====================
>  <function setup_timer_mapping at 0x124afb790> ====================
> INFO:apache_beam.runners.portability.fn_api_runner.translations:====================
>  <function populate_data_channel_coders at 0x124afb8b0> ====================
> INFO:apache_beam.runners.worker.statecache:Creating state cache with size 100
> INFO:apache_beam.runners.portability.fn_api_runner.worker_handlers:Created 
> Worker handler 
> <apache_beam.runners.portability.fn_api_runner.worker_handlers.EmbeddedWorkerHandler
>  object at 0x124bd6f70> for environment ref_Environment_default_environment_1 
> (beam:env:embedded_python:v1, b'')
> INFO:apache_beam.runners.portability.fn_api_runner.fn_runner:Running 
> (((((ref_AppliedPTransform_Write-Write-WriteImpl-DoOnce-Impulse_19)+(ref_AppliedPTransform_Write-Write-WriteImpl-DoOnce-FlatMap-lambda-at-core-py-3228-_20))+(ref_AppliedPTransform_Write-Write-WriteImpl-DoOnce-Map-decode-_22))+(ref_AppliedPTransform_Write-Write-WriteImpl-InitializeWrite_23))+(ref_PCollection_PCollection_11/Write))+(ref_PCollection_PCollection_12/Write)
> INFO:apache_beam.runners.portability.fn_api_runner.fn_runner:Running 
> ((((ref_AppliedPTransform_Read-Read-Impulse_4)+(ref_AppliedPTransform_Read-Read-Map-lambda-at-iobase-py-898-_5))+(Read/Read/SDFBoundedSourceReader/ParDo(SDFBoundedSourceDoFn)/PairWithRestriction))+(Read/Read/SDFBoundedSourceReader/ParDo(SDFBoundedSourceDoFn)/SplitAndSizeRestriction))+(ref_PCollection_PCollection_2_split/Write)
> INFO:apache_beam.runners.portability.fn_api_runner.fn_runner:Running 
> (((((ref_PCollection_PCollection_2_split/Read)+(Read/Read/SDFBoundedSourceReader/ParDo(SDFBoundedSourceDoFn)/Process))+(ref_AppliedPTransform_Split_8))+(ref_AppliedPTransform_PairWIthOne_9))+(GroupAndSum/Precombine))+(GroupAndSum/Group/Write)
> INFO:apache_beam.runners.portability.fn_api_runner.fn_runner:Running 
> (((((((GroupAndSum/Group/Read)+(GroupAndSum/Merge))+(GroupAndSum/ExtractOutputs))+(ref_AppliedPTransform_Format_14))+(ref_AppliedPTransform_Write-Write-WriteImpl-WindowInto-WindowIntoFn-_24))+(ref_AppliedPTransform_Write-Write-WriteImpl-WriteBundles_25))+(ref_AppliedPTransform_Write-Write-WriteImpl-Pair_26))+(Write/Write/WriteImpl/GroupByKey/Write)
> INFO:apache_beam.runners.portability.fn_api_runner.fn_runner:Running 
> ((Write/Write/WriteImpl/GroupByKey/Read)+(ref_AppliedPTransform_Write-Write-WriteImpl-Extract_28))+(ref_PCollection_PCollection_17/Write)
> INFO:apache_beam.runners.portability.fn_api_runner.fn_runner:Running 
> ((ref_PCollection_PCollection_11/Read)+(ref_AppliedPTransform_Write-Write-WriteImpl-PreFinalize_29))+(ref_PCollection_PCollection_18/Write)
> WARNING:apache_beam.io.filebasedsink:Deleting 1 existing files in target path 
> matching: -*-of-%(num_shards)05d
> INFO:apache_beam.runners.portability.fn_api_runner.fn_runner:Running 
> (ref_PCollection_PCollection_11/Read)+(ref_AppliedPTransform_Write-Write-WriteImpl-FinalizeWrite_30)
> INFO:apache_beam.io.filebasedsink:Starting finalize_write threads with 
> num_shards: 1 (skipped: 0), batches: 1, num_threads: 1
> INFO:apache_beam.io.filebasedsink:Renamed 1 shards in 0.02 seconds.
> Deprecated Gradle features were used in this build, making it incompatible 
> with Gradle 8.0.
> You can use '--warning-mode all' to show the individual deprecation warnings 
> and determine if they come from your own scripts or plugins.
> See 
> https://docs.gradle.org/7.3.2/userguide/command_line_interface.html#sec:command_line_warnings
> BUILD SUCCESSFUL in 1m 8s
> 14 actionable tasks: 3 executed, 11 up-to-date
> {noformat}
> Note that the second Gradle command specified Python 3.6, but the test 
> actually ran with Python 3.8: its log still refers to the Python 3.8 
> interpreter and the {{apache/beam_python3.8_sdk}} image. The first Python 
> version used right after the {{clean}} task pins the virtualenv Python 
> version. Any later task based on the same project path will use that first 
> Python version, as shown above.
> h2. *Affected Tests*
> We have Python test suites that run against multiple Python versions. 
> Luckily, most of them have the Python version as part of their project paths, 
> e.g. {{{}:sdks:python:test-suites:dataflow:{*}py38{*}:setupVirtualenv{}}}. 
> For automated Jenkins tests, we also use tasks created for each Python 
> version. The only exception is the [cross-language 
> tests|https://github.com/apache/beam/blob/v2.35.0/.test-infra/jenkins/job_PostCommit_CrossLanguageValidatesRunner_Dataflow.groovy#L43]
>  which use a for-each loop to run the tests once for each target Python 
> version. In summary:
>  * Jenkins Python tests are not affected. In other words, we have good 
> coverage across multiple Python versions.
>  * Cross-language VR tests are affected. This means we have been missing test 
> coverage for the second Python version, namely Python 3.8.
>  * Any tests executed directly from the command line are error-prone, since 
> the {{-PpythonVersion}} flag only takes effect for the first task run after 
> the {{clean}} task.
> h2. *Solution*
> The {{venv}} module supports a {{--clear}} option, which deletes the contents 
> of an existing virtualenv directory before initializing a new one.
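
As a rough illustration of the proposed fix (not the actual change made to Beam's Gradle build), the sketch below uses the Python standard-library {{venv}} API, where {{clear=True}} corresponds to the {{--clear}} command-line option. The helper name {{recreate_virtualenv}}, the directory name {{shared-env}}, and the stale marker file are hypothetical and exist only for this example.

```python
import tempfile
import venv
from pathlib import Path


def recreate_virtualenv(env_dir: Path) -> None:
    """Create a virtual environment at env_dir, wiping any previous contents.

    Hypothetical helper for illustration; it is not part of Beam.
    """
    # clear=True is the API counterpart of `python -m venv --clear`: the
    # contents of an existing environment directory are deleted first.
    # with_pip=False keeps the sketch fast; a real build would likely want pip.
    venv.EnvBuilder(clear=True, with_pip=False).create(env_dir)


with tempfile.TemporaryDirectory() as tmp:
    # Simulate a shared virtualenv directory left behind by an earlier task.
    env_dir = Path(tmp) / "shared-env"
    env_dir.mkdir()
    stale = env_dir / "left-over-from-previous-run.txt"
    stale.write_text("stale")

    recreate_virtualenv(env_dir)

    # The stale file should be gone and a fresh pyvenv.cfg should exist.
    stale_removed = not stale.exists()
    config_written = (env_dir / "pyvenv.cfg").exists()
    print(stale_removed, config_written)
```

The equivalent command-line form is {{python -m venv --clear <dir>}}. Because the directory is emptied before the environment is rebuilt, nothing from an earlier run against the same path is carried over into the new environment.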



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
