tvalentyn commented on code in PR #28385:
URL: https://github.com/apache/beam/pull/28385#discussion_r1321907970


##########
.github/workflows/build_wheels.yml:
##########
@@ -117,15 +112,15 @@ jobs:
           echo "RELEASE_VERSION=$RELEASE_VERSION" >> $GITHUB_OUTPUT
       - name: Build source
         working-directory: ./sdks/python
-        run: python setup.py sdist --formats=zip
+        run: pip install -U build && python -m build --sdist
       - name: Add checksums
         working-directory: ./sdks/python/dist
         run: |
-          file=$(ls | grep .zip | head -n 1)
+          file=$(ls | grep .tar.gz | head -n 1)

Review Comment:
   we should check which extension the release scripts expect when twine uploads artifacts to PyPI. We might have to change this to use `.tar.gz`.
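   For reference, an anchored match would tolerate either extension while that is being checked. This is only a sketch; `pick_sdist` is a made-up helper, not part of the workflow:

```shell
# Hypothetical helper: pick the first sdist artifact in a directory,
# matching either .zip or .tar.gz by anchored extension so unrelated
# files (e.g. notes.txt) never slip in.
pick_sdist() {
  ls "$1" | grep -E '\.(zip|tar\.gz)$' | head -n 1
}
```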



##########
.github/workflows/typescript_tests.yml:
##########
@@ -89,10 +89,8 @@ jobs:
       - name: Setup Beam Python
         working-directory: ./sdks/python
         run: |
-          pip install pip setuptools --upgrade
-          pip install -r build-requirements.txt
           pip install 'pandas>=1.0,<1.5'
-          python setup.py develop
+          python install -e .

Review Comment:
   should this be `pip install`?



##########
buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy:
##########
@@ -3032,30 +3032,40 @@ class BeamModulePlugin implements Plugin<Project> {
         }
         return argList.join(' ')
       }
-
       project.ext.toxTask = { name, tox_env, posargs='' ->
         project.tasks.register(name) {
           dependsOn setupVirtualenv
           dependsOn ':sdks:python:sdist'
-
-          doLast {
-            // Python source directory is also tox execution workspace, We want
-            // to isolate them per tox suite to avoid conflict when running
-            // multiple tox suites in parallel.
-            project.copy { from project.pythonSdkDeps; into copiedSrcRoot }
-
-            def copiedPyRoot = "${copiedSrcRoot}/sdks/python"
-            def distTarBall = "${pythonRootDir}/build/apache-beam.tar.gz"
-            project.exec {
-              executable 'sh'
-              args '-c', ". ${project.ext.envdir}/bin/activate && cd ${copiedPyRoot} && scripts/run_tox.sh $tox_env $distTarBall '$posargs'"
+          if (project.hasProperty('useWheelDistribution')) {

Review Comment:
   why do we need both modes?



##########
sdks/python/apache_beam/coders/slow_coders_test.py:
##########
@@ -25,14 +25,17 @@
 from apache_beam.coders.coders_test_common import *
 
 
+@unittest.skip(
+    'Add a test for non-compiled implementation.'
+    'https://github.com/apache/beam/issues/28307')

Review Comment:
   FYI, I talked with robertwb; we should be able to assume that Beam will be installed either from wheels or with Cython at runtime, and remove the non-compiled path and its respective tests. We can decouple this change into a separate PR.



##########
sdks/python/scripts/run_tox.sh:
##########
@@ -25,11 +25,11 @@
 ###########################################################################
 # Usage check.
 if [[ $# < 1 ]]; then
-  printf "Usage: \n$> ./scripts/run_tox.sh <tox_environment> [<sdk_location> [<posargs> ...]]"
-  printf "\n\ttox_environment: [required] Tox environment to run the test in.\n"
-  printf "\n\tsdk_location: [optional] SDK tarball artifact location.\n"
-  printf "\n\tposarg: [optional] Any additional arguments will be passed as posargs to tox.\n"
-  exit 1
+ printf "Usage: \n$> ./scripts/run_tox.sh <tox_environment> [<sdk_location> [<posargs> ...]]"

Review Comment:
   nit: let's use consistent formatting and keep two spaces



##########
sdks/python/scripts/run_tox.sh:
##########
@@ -53,18 +53,32 @@ if [[ "$JENKINS_HOME" != "" ]]; then
   export PY_COLORS=1
 fi
 
-if [[ ! -z $2 ]]; then
+# Determine if the second argument is SDK_LOCATION or posargs
+if [[ -f "$1" ]]; then  # Check if the argument corresponds to a file
   SDK_LOCATION="$1"
-  shift;
-  tox -c tox.ini run --recreate -e "$TOX_ENVIRONMENT" --installpkg "$SDK_LOCATION" -- "$@"
-else
-  tox -c tox.ini run --recreate -e "$TOX_ENVIRONMENT"
+  shift
+fi
+
+# If SDK_LOCATION is identified and there are still arguments left, those are posargs.
+# If SDK_LOCATION is not identified, all remaining arguments are posargs.
+
+if [[ ! -z "$SDK_LOCATION" ]]; then
+  if [[ $# -gt 0 ]]; then  # There are posargs
+    tox -rvv -c tox.ini run --recreate -e "$TOX_ENVIRONMENT" --installpkg "$SDK_LOCATION" -- "$@"
+  else
+    tox -rvv -c tox.ini run --recreate -e "$TOX_ENVIRONMENT" --installpkg "$SDK_LOCATION"
+  fi
+else  # No SDK_LOCATION; all arguments are posargs
+  tox -c tox.ini run --recreate -e "$TOX_ENVIRONMENT" -- "$@"

Review Comment:
   What are the cases when we run tox without SDK location?
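   To restate the dispatch the script now performs (toy sketch; `classify_args` is made up, not in the script): a first argument naming an existing file becomes SDK_LOCATION, otherwise every argument is forwarded to tox as posargs.

```shell
# Toy restatement of the argument classification in run_tox.sh:
# an existing file as $1 is the SDK location; everything else is posargs.
classify_args() {
  if [ -f "$1" ]; then
    sdk="$1"; shift
  else
    sdk=""
  fi
  echo "sdk=$sdk posargs=$*"
}
```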



##########
sdks/python/test-suites/tox/common.gradle:
##########
@@ -42,5 +42,5 @@ project.tasks.register("preCommitPy${pythonVersionSuffix}") {
           dependsOn = ["testPy38Cython"]
       } else {
          dependsOn = ["testPy${pythonVersionSuffix}Cloud", "testPy${pythonVersionSuffix}Cython"]
-      }
+        }

Review Comment:
   unnecessary change



##########
sdks/python/apache_beam/runners/dataflow/internal/clients/cloudbuild/__init__.py:
##########
@@ -29,5 +29,3 @@
  from apache_beam.runners.dataflow.internal.clients.cloudbuild.cloudbuild_v1_messages import *
 except ImportError:
   pass
-

Review Comment:
   these are autogenerated clients; why do we need these changes?



##########
.github/workflows/beam_PreCommit_Python_Dataframes.yml:
##########
@@ -105,6 +105,7 @@ jobs:
           arguments: |
             -Pposargs=apache_beam/dataframe/ \
             -PpythonVersion=${{ matrix.python_version }} \
+            -PuseWheelDistribution

Review Comment:
   for my education, why were these flags necessary?



##########
sdks/python/apache_beam/runners/common.py:
##########
@@ -762,6 +762,7 @@ def __init__(self,
     # Try to prepare all the arguments that can just be filled in
     # without any additional work. in the process function.
     # Also cache all the placeholders needed in the process function.
+    input_args = list(input_args)

Review Comment:
   why was this necessary?



##########
sdks/python/apache_beam/examples/kafkataxi/README.md:
##########
@@ -157,9 +157,9 @@ Install Beam and dependencies and build a Beam distribution.
 
 ```sh
 cd beam/sdks/python
-pip install -r build-requirements.txt
 pip install -e '.[gcp]'
-python setup.py sdist
+pip install -q build

Review Comment:
   is having `build` a prerequisite for the editable installation above?



##########
sdks/python/apache_beam/ml/gcp/naturallanguageml_test.py:
##########
@@ -60,12 +60,15 @@ def test_document_source(self):
     self.assertFalse('content' in dict_)
     self.assertTrue('gcs_content_uri' in dict_)
 
+  @unittest.skip(
+      'TypeError: Expected bytes, got MagicMock.'
+      'Please look at https://github.com/apache/beam/issues/28307.')

Review Comment:
   I'd like to understand the failure mode here.



##########
sdks/python/apache_beam/io/gcp/bigquery_test.py:
##########
@@ -103,6 +102,51 @@ def _load_or_default(filename):
     return {}
 
 
+_DESTINATION_ELEMENT_PAIRS = [

Review Comment:
   why was this necessary?



##########
sdks/python/setup.py:
##########
@@ -155,12 +156,18 @@ def cythonize(*args, **kwargs):
 # We must generate protos after setup_requires are installed.
 def generate_protos_first():
   try:
-    # pylint: disable=wrong-import-position
-    import gen_protos
-    gen_protos.generate_proto_files()
-
-  except ImportError:
-    warnings.warn("Could not import gen_protos, skipping proto generation.")
+    # Pyproject toml build happens in isolated environemnts. In those envs,
+    # gen_protos is unable to get imported. so we run a subprocess call.
+    cwd = os.path.abspath(os.path.dirname(__file__))
+    p = subprocess.call([
+      sys.executable,
+      os.path.join(cwd, 'gen_protos.py'),
+      '--no-force'
+    ])
+    if p != 0:
+      raise RuntimeError()
+  except RuntimeError:
+    logging.warning('Could not import gen_protos, skipping proto generation.')

Review Comment:
   is this why you assume installation is done from wheels in other places?



##########
sdks/python/apache_beam/runners/dataflow/internal/clients/dataflow/message_matchers_test.py:
##########
@@ -20,16 +20,16 @@
 
 import hamcrest as hc
 
-import apache_beam.runners.dataflow.internal.clients.dataflow as dataflow
 from apache_beam.internal.gcp.json_value import to_json_value
+from apache_beam.runners.dataflow.internal.clients import dataflow
 from apache_beam.runners.dataflow.internal.clients.dataflow import message_matchers
 
 # Protect against environments where apitools library is not available.
 # pylint: disable=wrong-import-order, wrong-import-position
 try:
   from apitools.base.py import base_api
 except ImportError:
-  base_api = None
+  raise unittest.SkipTest('GCP dependencies are not installed')

Review Comment:
   why was this necessary? It's better to split any cleanup into a separate PR.



##########
sdks/python/gen_protos.py:
##########
@@ -511,24 +472,23 @@ def generate_proto_files(force=False):
   if not os.path.exists(PYTHON_OUTPUT_PATH):
     os.mkdir(PYTHON_OUTPUT_PATH)
 
-  grpcio_install_loc = ensure_grpcio_exists()
   protoc_gen_mypy = _find_protoc_gen_mypy()
-  with PythonPath(grpcio_install_loc):
+  try:

Review Comment:
   Can you provide context on the grpcio changes?



##########
sdks/python/apache_beam/runners/portability/stager.py:
##########
@@ -775,10 +775,12 @@ def _build_setup_package(setup_file,  # type: str
       if build_setup_args is None:
         build_setup_args = [
             Stager._get_python_executable(),
-            os.path.basename(setup_file),
-            'sdist',
-            '--dist-dir',
-            temp_dir
+            '-m',
+            'build',

Review Comment:
   this may be a breaking change: `build` might not be installed in the user's environment. We can fall back to the prior behavior in that case and perhaps recommend `build` in a warning.
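   A rough sketch of that fallback (the name `choose_build_setup_args` is made up for illustration and is not the stager's actual API):

```python
import importlib.util
import logging
import sys

def choose_build_setup_args(setup_file, temp_dir):
    # Prefer a PEP 517 build via `python -m build` when the package exists...
    if importlib.util.find_spec('build') is not None:
        return [sys.executable, '-m', 'build', '--sdist', '--outdir', temp_dir]
    # ...otherwise warn and keep the legacy `setup.py sdist` invocation.
    logging.warning(
        'The "build" package is not installed; falling back to '
        '"setup.py sdist". Consider running `pip install build`.')
    return [sys.executable, setup_file, 'sdist', '--dist-dir', temp_dir]
```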



##########
sdks/python/gen_protos.py:
##########
@@ -539,39 +499,51 @@ def generate_proto_files(force=False):
           'Protoc returned non-zero status (see logs for details): '
           '%s' % ret_code)
 
-  # copy resource files
-  for path in MODEL_RESOURCES:
-    shutil.copy2(os.path.join(PROJECT_ROOT, path), PYTHON_OUTPUT_PATH)
-
-  proto_packages = set()
-  # see: https://github.com/protocolbuffers/protobuf/issues/1491
-  # force relative import paths for proto files
-  compiled_import_re = re.compile('^from (.*) import (.*)$')
-  for file_path in find_by_ext(PYTHON_OUTPUT_PATH,
-                               ('_pb2.py', '_pb2_grpc.py', '_pb2.pyi')):
-    proto_packages.add(os.path.dirname(file_path))
-    lines = []
-    with open(file_path, encoding='utf-8') as f:
-      for line in f:
-        match_obj = compiled_import_re.match(line)
-        if match_obj and \
-                match_obj.group(1).startswith('org.apache.beam.model'):
-          new_import = build_relative_import(
-              PYTHON_OUTPUT_PATH, match_obj.group(1), file_path)
-          line = 'from %s import %s\n' % (new_import, match_obj.group(2))
-
-        lines.append(line)
-
-    with open(file_path, 'w') as f:
-      f.writelines(lines)
-
-  generate_init_files_lite(PYTHON_OUTPUT_PATH)
-  with PythonPath(grpcio_install_loc):
+    # copy resource files
+    for path in MODEL_RESOURCES:
+      shutil.copy2(os.path.join(PROJECT_ROOT, path), PYTHON_OUTPUT_PATH)
+
+    proto_packages = set()
+    # see: https://github.com/protocolbuffers/protobuf/issues/1491
+    # force relative import paths for proto files
+    compiled_import_re = re.compile('^from (.*) import (.*)$')
+    for file_path in find_by_ext(PYTHON_OUTPUT_PATH,
+                                ('_pb2.py', '_pb2_grpc.py', '_pb2.pyi')):
+      proto_packages.add(os.path.dirname(file_path))
+      lines = []
+      with open(file_path, encoding='utf-8') as f:
+        for line in f:
+          match_obj = compiled_import_re.match(line)
+          if match_obj and \
+                  match_obj.group(1).startswith('org.apache.beam.model'):
+            new_import = build_relative_import(
+                PYTHON_OUTPUT_PATH, match_obj.group(1), file_path)
+            line = 'from %s import %s\n' % (new_import, match_obj.group(2))
+
+          lines.append(line)
+
+      with open(file_path, 'w') as f:
+        f.writelines(lines)
+
+    generate_init_files_lite(PYTHON_OUTPUT_PATH)
     for proto_package in proto_packages:
       generate_urn_files(proto_package, PYTHON_OUTPUT_PATH)
 
-    generate_init_files_full(PYTHON_OUTPUT_PATH)
+      generate_init_files_full(PYTHON_OUTPUT_PATH)
+  except ImportError:
+    # this means the required _pb2 files are already generated.
+    pass
 
 
 if __name__ == '__main__':
-  generate_proto_files(force=True)
+  import argparse
+  parser = argparse.ArgumentParser()
+  parser.add_argument(
+      '--no-force',

Review Comment:
   ```
   parser.add_argument('--no-force', dest='force', action='store_false')
   parser.set_defaults(force=True)
   ```
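   As a quick standalone sanity check of that pattern:

```python
import argparse

# `--no-force` flips `force` off; the default preserves the previous
# generate_proto_files(force=True) behavior when no flag is passed.
parser = argparse.ArgumentParser()
parser.add_argument('--no-force', dest='force', action='store_false')
parser.set_defaults(force=True)

default = parser.parse_args([])
no_force = parser.parse_args(['--no-force'])
```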



##########
sdks/python/setup.py:
##########
@@ -155,12 +156,18 @@ def cythonize(*args, **kwargs):
 # We must generate protos after setup_requires are installed.
 def generate_protos_first():
   try:
-    # pylint: disable=wrong-import-position
-    import gen_protos
-    gen_protos.generate_proto_files()
-
-  except ImportError:
-    warnings.warn("Could not import gen_protos, skipping proto generation.")
+    # Pyproject toml build happens in isolated environemnts. In those envs,
+    # gen_protos is unable to get imported. so we run a subprocess call.
+    cwd = os.path.abspath(os.path.dirname(__file__))
+    p = subprocess.call([
+      sys.executable,
+      os.path.join(cwd, 'gen_protos.py'),
+      '--no-force'

Review Comment:
   why `--no-force`?



##########
sdks/python/setup.py:
##########
@@ -213,22 +242,9 @@ def get_portability_package_data():
               *get_portability_package_data()
           ]
       },
-      ext_modules=cythonize([
-          'apache_beam/**/*.pyx',
-          'apache_beam/coders/coder_impl.py',
-          'apache_beam/metrics/cells.py',
-          'apache_beam/metrics/execution.py',
-          'apache_beam/runners/common.py',
-          'apache_beam/runners/worker/logger.py',
-          'apache_beam/runners/worker/opcounters.py',
-          'apache_beam/runners/worker/operations.py',
-          'apache_beam/transforms/cy_combiners.py',
-          'apache_beam/transforms/stats.py',
-          'apache_beam/utils/counters.py',
-          'apache_beam/utils/windowed_value.py',
-      ],
-                            language_level=3),
+      ext_modules=extensions,
       install_requires=[
+          'build>=0.9.0,<0.11.0',

Review Comment:
   interesting. Is it common to add `build` as a dependency? Also, why not 1.x?



##########
sdks/python/scripts/run_tox.sh:
##########
@@ -53,18 +53,32 @@ if [[ "$JENKINS_HOME" != "" ]]; then
   export PY_COLORS=1
 fi
 
-if [[ ! -z $2 ]]; then
+# Determine if the second argument is SDK_LOCATION or posargs
+if [[ -f "$1" ]]; then  # Check if the argument corresponds to a file
   SDK_LOCATION="$1"
-  shift;
-  tox -c tox.ini run --recreate -e "$TOX_ENVIRONMENT" --installpkg "$SDK_LOCATION" -- "$@"
-else
-  tox -c tox.ini run --recreate -e "$TOX_ENVIRONMENT"
+  shift
+fi
+
+# If SDK_LOCATION is identified and there are still arguments left, those are posargs.
+# If SDK_LOCATION is not identified, all remaining arguments are posargs.
+
+if [[ ! -z "$SDK_LOCATION" ]]; then
+  if [[ $# -gt 0 ]]; then  # There are posargs
+    tox -rvv -c tox.ini run --recreate -e "$TOX_ENVIRONMENT" --installpkg "$SDK_LOCATION" -- "$@"

Review Comment:
   do we need `-vv` here and below?



##########
sdks/python/setup.py:
##########
@@ -187,7 +194,29 @@ def get_portability_package_data():
   # In order to find the tree of proto packages, the directory
   # structure must exist before the call to setuptools.find_packages()
   # executes below.
+  # while True: print(os.listdir())
   generate_protos_first()
+
+  # generate cythonize extensions only if we are building a wheel or
+  # building an extension or running in editable mode.

Review Comment:
   is `-e .` an `editable_wheel` mode or a different one?



##########
sdks/python/scripts/run_tox.sh:
##########
@@ -53,18 +53,32 @@ if [[ "$JENKINS_HOME" != "" ]]; then
   export PY_COLORS=1
 fi
 
-if [[ ! -z $2 ]]; then
+# Determine if the second argument is SDK_LOCATION or posargs
+if [[ -f "$1" ]]; then  # Check if the argument corresponds to a file
   SDK_LOCATION="$1"
-  shift;
-  tox -c tox.ini run --recreate -e "$TOX_ENVIRONMENT" --installpkg "$SDK_LOCATION" -- "$@"
-else
-  tox -c tox.ini run --recreate -e "$TOX_ENVIRONMENT"
+  shift
+fi
+
+# If SDK_LOCATION is identified and there are still arguments left, those are posargs.
+# If SDK_LOCATION is not identified, all remaining arguments are posargs.
+
+if [[ ! -z "$SDK_LOCATION" ]]; then
+  if [[ $# -gt 0 ]]; then  # There are posargs
+    tox -rvv -c tox.ini run --recreate -e "$TOX_ENVIRONMENT" --installpkg "$SDK_LOCATION" -- "$@"
+  else
+    tox -rvv -c tox.ini run --recreate -e "$TOX_ENVIRONMENT" --installpkg "$SDK_LOCATION"
+  fi
+else  # No SDK_LOCATION; all arguments are posargs
+  tox -c tox.ini run --recreate -e "$TOX_ENVIRONMENT" -- "$@"
 fi
 
 exit_code=$?
 # Retry once for the specific exit code 245.
 if [[ $exit_code == 245 ]]; then
-  tox -c tox.ini run --recreate -e "$TOX_ENVIRONMENT"
+  tox -c tox.ini run --recreate -e "$TOX_ENVIRONMENT" --installpkg "$SDK_LOCATION"
   exit_code=$?
 fi
 exit $exit_code
+This code first captures TOX_ENVIRONMENT and then shifts the arguments. It 
then checks the number of remaining arguments to decide how to proceed. If 
there are no remaining arguments, it assumes only the environment was provided. 
If there's one remaining argument, it assumes that's the SDK location. If there 
are two or more arguments, it takes the next one as the SDK location and passes 
all the remaining ones as positional arguments to the tox command.

Review Comment:
   misplaced content: this paragraph looks like it was accidentally pasted into the script.



##########
sdks/python/scripts/run_pytest.sh:
##########
@@ -42,10 +42,10 @@ echo "posargs: $posargs"
 
 # Run with pytest-xdist and without.
 pytest -o junit_suite_name=${envname} \
-  --junitxml=pytest_${envname}.xml -m 'not no_xdist' -n 6 ${pytest_args} --pyargs ${posargs}
+  --junitxml=pytest_${envname}.xml -m 'not no_xdist' -n 6 --import-mode=importlib ${pytest_args} --pyargs ${posargs}

Review Comment:
   do we need to do anything for test files with the same name? (we have quite a few `util_test.py` files)
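   For context, the mechanism can be illustrated with plain importlib (a standalone sketch, not pytest internals): spec-based loading keys modules by whatever name we choose, so two files both called `util_test.py` in different directories can coexist.

```python
import importlib.util
import os
import tempfile

def load_from(path, unique_name):
    # Load a module from an explicit path under a caller-chosen name,
    # so same-named files do not collide in sys.modules semantics.
    spec = importlib.util.spec_from_file_location(unique_name, path)
    mod = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(mod)
    return mod

with tempfile.TemporaryDirectory() as d:
    a = os.path.join(d, 'a'); b = os.path.join(d, 'b')
    os.mkdir(a); os.mkdir(b)
    with open(os.path.join(a, 'util_test.py'), 'w') as f:
        f.write('WHO = "a"\n')
    with open(os.path.join(b, 'util_test.py'), 'w') as f:
        f.write('WHO = "b"\n')
    mod_a = load_from(os.path.join(a, 'util_test.py'), 'pkg_a.util_test')
    mod_b = load_from(os.path.join(b, 'util_test.py'), 'pkg_b.util_test')
```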



##########
sdks/python/container/Dockerfile:
##########
@@ -45,7 +45,7 @@ RUN  \
        && \
     rm -rf /var/lib/apt/lists/* && \
 
-    pip install --upgrade setuptools && \
+    pip install --upgrade setuptools && pip install --upgrade wheel && \

Review Comment:
   nit: you can combine both commands
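   e.g. (sketch of the combined line in the Dockerfile's RUN chain):

```
    pip install --upgrade setuptools wheel && \
```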



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
