tvalentyn commented on code in PR #28385:
URL: https://github.com/apache/beam/pull/28385#discussion_r1342869531
##########
sdks/python/setup.py:
##########
@@ -155,12 +156,18 @@ def cythonize(*args, **kwargs):
# We must generate protos after setup_requires are installed.
def generate_protos_first():
try:
- # pylint: disable=wrong-import-position
- import gen_protos
- gen_protos.generate_proto_files()
-
- except ImportError:
- warnings.warn("Could not import gen_protos, skipping proto generation.")
+      # Pyproject.toml builds happen in isolated environments. In those envs,
+      # gen_protos cannot be imported, so we run a subprocess call instead.
+ cwd = os.path.abspath(os.path.dirname(__file__))
+ p = subprocess.call([
+ sys.executable,
+ os.path.join(cwd, 'gen_protos.py'),
+ '--no-force'
+ ])
+ if p != 0:
+ raise RuntimeError()
+ except RuntimeError:
+ logging.warning('Could not import gen_protos, skipping proto generation.')
Review Comment:
Please correct this warning message. Also, will the logs from the subprocess be printed?
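For reference, `subprocess.call` lets the child inherit this process's stdout/stderr unless they are explicitly redirected, so the generator's logs should be printed. A minimal sketch of what a corrected message could look like (my sketch, not the PR's code):

```python
import logging
import os
import subprocess
import sys

def generate_protos_first():
  cwd = os.path.abspath(os.path.dirname(__file__))
  # The child inherits this process's stdout/stderr by default, so the
  # generator's own logs are printed unless they are redirected.
  returncode = subprocess.call(
      [sys.executable, os.path.join(cwd, 'gen_protos.py'), '--no-force'])
  if returncode != 0:
    # Name the actual failure (the subprocess), not a failed import.
    logging.warning(
        'gen_protos.py exited with code %d; skipping proto generation.',
        returncode)
```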
##########
sdks/python/pyproject.toml:
##########
@@ -0,0 +1,36 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# since we rely on setuptools, and according to
+# https://peps.python.org/pep-0518/#build-system-table
+# these are the minimum requirements for the build system to execute.
+[build-system]
+requires = [
+ "setuptools",
+ "wheel>=0.36.0",
+ "grpcio-tools==1.53.0",
+ "mypy-protobuf==3.5.0",
+ # Avoid https://github.com/pypa/virtualenv/issues/2006
+ "distlib==0.3.7",
+ # Numpy headers
+ "numpy>=1.14.3,<1.26",
Review Comment:
add a comment noting that this should be updated together with setup.py
##########
sdks/python/setup.py:
##########
@@ -250,7 +264,7 @@ def get_portability_package_data():
'js2py>=0.74,<1',
# numpy can have breaking changes in minor versions.
# Use a strict upper bound.
- 'numpy>=1.14.3,<1.25.0', # Update build-requirements.txt as well.
+ 'numpy>=1.14.3,<1.25.0', # Update pyproject.toml as well.
Review Comment:
nit: let's downgrade the TOML bound for consistency. We can update numpy in a
separate PR so we don't add more issues here.
##########
sdks/python/apache_beam/io/gcp/shared_test_variables.py:
##########
@@ -0,0 +1,63 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
Review Comment:
This file looks a bit awkward here. Also, looking closer, the stub data in
`bigquery_file_loads_test` is now split across two files, which looks
awkward too. A file like this would be better placed in a /tests subdir in this
folder, and could include all the necessary shared BQ stub data & schema in
one place.
Alternatively, we can reverse the course of my previous suggestion and
redefine `_ELEMENTS` in `sdks/python/apache_beam/io/gcp/bigquery_test.py`.
Since we originally had `_ELEMENTS = [elm[1] for elm in
_DESTINATION_ELEMENT_PAIRS]` to avoid extra copy-pasting, it can be redefined
there as shown below.
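The same suggested list, formatted as a code block for readability (content unchanged from the inline suggestion):

```python
_ELEMENTS = [
    {'name': 'beam', 'language': 'py'},
    {'name': 'beam', 'language': 'java'},
    {'name': 'beam', 'language': 'go'},
    {'name': 'flink', 'language': 'java'},
    {'name': 'flink', 'language': 'scala'},
    {'name': 'spark', 'language': 'scala'},
    {'name': 'spark', 'language': 'py'},
    {'name': 'spark', 'language': 'scala'},
    {'name': 'beam', 'foundation': 'apache'},
    {'name': 'flink', 'foundation': 'apache'},
    {'name': 'spark', 'foundation': 'apache'},
]
```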
##########
sdks/python/container/Dockerfile:
##########
@@ -45,7 +45,7 @@ RUN \
&& \
rm -rf /var/lib/apt/lists/* && \
- pip install --upgrade setuptools && \
+ pip install --upgrade setuptools wheel && \
Review Comment:
nit: while we're at it, we might as well upgrade pip (all of those are already
in the base image).
##########
sdks/python/gen_protos.py:
##########
@@ -539,39 +499,44 @@ def generate_proto_files(force=False):
'Protoc returned non-zero status (see logs for details): '
'%s' % ret_code)
- # copy resource files
- for path in MODEL_RESOURCES:
- shutil.copy2(os.path.join(PROJECT_ROOT, path), PYTHON_OUTPUT_PATH)
-
- proto_packages = set()
- # see: https://github.com/protocolbuffers/protobuf/issues/1491
- # force relative import paths for proto files
- compiled_import_re = re.compile('^from (.*) import (.*)$')
- for file_path in find_by_ext(PYTHON_OUTPUT_PATH,
- ('_pb2.py', '_pb2_grpc.py', '_pb2.pyi')):
- proto_packages.add(os.path.dirname(file_path))
- lines = []
- with open(file_path, encoding='utf-8') as f:
- for line in f:
- match_obj = compiled_import_re.match(line)
- if match_obj and \
- match_obj.group(1).startswith('org.apache.beam.model'):
- new_import = build_relative_import(
- PYTHON_OUTPUT_PATH, match_obj.group(1), file_path)
- line = 'from %s import %s\n' % (new_import, match_obj.group(2))
-
- lines.append(line)
-
- with open(file_path, 'w') as f:
- f.writelines(lines)
-
- generate_init_files_lite(PYTHON_OUTPUT_PATH)
- with PythonPath(grpcio_install_loc):
+ # copy resource files
+ for path in MODEL_RESOURCES:
+ shutil.copy2(os.path.join(PROJECT_ROOT, path), PYTHON_OUTPUT_PATH)
+
+ proto_packages = set()
+ # see: https://github.com/protocolbuffers/protobuf/issues/1491
+ # force relative import paths for proto files
+ compiled_import_re = re.compile('^from (.*) import (.*)$')
+ for file_path in find_by_ext(PYTHON_OUTPUT_PATH,
+ ('_pb2.py', '_pb2_grpc.py', '_pb2.pyi')):
+ proto_packages.add(os.path.dirname(file_path))
+ lines = []
+ with open(file_path, encoding='utf-8') as f:
+ for line in f:
+ match_obj = compiled_import_re.match(line)
+ if match_obj and \
+ match_obj.group(1).startswith('org.apache.beam.model'):
+ new_import = build_relative_import(
+ PYTHON_OUTPUT_PATH, match_obj.group(1), file_path)
+ line = 'from %s import %s\n' % (new_import, match_obj.group(2))
+
+ lines.append(line)
+
+ with open(file_path, 'w') as f:
+ f.writelines(lines)
+
+ generate_init_files_lite(PYTHON_OUTPUT_PATH)
for proto_package in proto_packages:
generate_urn_files(proto_package, PYTHON_OUTPUT_PATH)
- generate_init_files_full(PYTHON_OUTPUT_PATH)
+ generate_init_files_full(PYTHON_OUTPUT_PATH)
+ except ImportError:
Review Comment:
Interesting. Why `ImportError`? What call causes the import error? It sounds
like a missing dependency could also cause this error, and it's not obvious why
the error would mean the _pb2 files are already generated.
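If the intent is "skip generation when the _pb2 files already exist", an explicit check would make that readable. A hypothetical sketch (the helper below is not in the PR; `PYTHON_OUTPUT_PATH` is the constant this file already defines):

```python
import glob
import os

def pb2_files_already_generated(output_path):
  # Hypothetical helper: consider generation done when any compiled
  # *_pb2.py module already exists under the output tree, instead of
  # inferring that state from an ImportError.
  pattern = os.path.join(output_path, '**', '*_pb2.py')
  return bool(glob.glob(pattern, recursive=True))
```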
##########
sdks/python/scripts/run_tox.sh:
##########
@@ -53,18 +53,27 @@ if [[ "$JENKINS_HOME" != "" ]]; then
export PY_COLORS=1
fi
-if [[ ! -z $2 ]]; then
+# Determine if the second argument is SDK_LOCATION or posargs
+if [[ -f "$1" ]]; then # Check if the argument corresponds to a file
   SDK_LOCATION="$1"
-  shift;
-  tox -c tox.ini run --recreate -e "$TOX_ENVIRONMENT" --installpkg "$SDK_LOCATION" -- "$@"
-else
-  tox -c tox.ini run --recreate -e "$TOX_ENVIRONMENT"
+  shift
+fi
+
+# If SDK_LOCATION is identified and there are still arguments left, those are posargs.
+if [[ ! -z "$SDK_LOCATION" ]]; then
+  if [[ $# -gt 0 ]]; then # There are posargs
+    tox -c tox.ini run --recreate -e "$TOX_ENVIRONMENT" --installpkg "$SDK_LOCATION" -- "$@"
+  else
+    tox -c tox.ini run --recreate -e "$TOX_ENVIRONMENT" --installpkg "$SDK_LOCATION"
+  fi
+else # No SDK_LOCATION; all arguments are posargs
+  tox -c tox.ini run --recreate -e "$TOX_ENVIRONMENT" -- "$@"
fi
exit_code=$?
# Retry once for the specific exit code 245.
if [[ $exit_code == 245 ]]; then
-  tox -c tox.ini run --recreate -e "$TOX_ENVIRONMENT"
+  tox -c tox.ini run --recreate -e "$TOX_ENVIRONMENT" --installpkg "$SDK_LOCATION"
Review Comment:
It sounds like we should retry the whole branching logic in lines 62-70?
##########
sdks/python/apache_beam/runners/portability/stager.py:
##########
@@ -771,13 +772,26 @@ def _build_setup_package(setup_file, # type: str
try:
os.chdir(os.path.dirname(setup_file))
if build_setup_args is None:
- build_setup_args = [
- Stager._get_python_executable(),
- os.path.basename(setup_file),
- 'sdist',
- '--dist-dir',
- temp_dir
- ]
+      # If build is installed in the user env, use it to build the
+      # sdist; otherwise fall back to the legacy setup.py sdist call.
+ if importlib.util.find_spec('build'):
Review Comment:
Are you aware of any cases where the new process might fail to build but the
old one would succeed? I suppose users would have a workaround of uninstalling
`build`.
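For context, this is roughly the shape of the branching under discussion; the exact arguments below are my assumption rather than a quote of the PR, though `python -m build --sdist --outdir` is the documented CLI of the `build` package:

```python
import importlib.util
import os
import sys

def _sdist_command(setup_file, temp_dir):
  # Prefer a PEP 517 sdist via the 'build' package when it is importable;
  # otherwise fall back to the legacy 'setup.py sdist' invocation.
  if importlib.util.find_spec('build'):
    return [
        sys.executable, '-m', 'build', '--sdist', '--outdir', temp_dir,
        os.path.dirname(os.path.abspath(setup_file)),
    ]
  return [
      sys.executable,
      os.path.basename(setup_file),
      'sdist',
      '--dist-dir',
      temp_dir,
  ]
```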
##########
sdks/python/tox.ini:
##########
@@ -17,7 +17,7 @@
[tox]
# new environments will be excluded by default unless explicitly added to envlist.
-envlist = py38,py39,py310,py311,py38-{cloud,cython,docs,lint,mypy,cloudcoverage,dask},py39-{cloud,cython},py310-{cloud,cython,dask},py311-{cloud,cython,dask},whitespacelint
Review Comment:
This actually removes a test suite that covers the `no cloud dependencies`
variant.
A while ago we had a plain suite, a gcp suite, and a cython suite. Later we
decided to drop one set of tests by making `cython` + `no cloud` one suite, and
cloud another suite.
The loss of coverage would impact the scenario where one runs
`pip install apache-beam` (without cloud/gcp extras). One could argue whether
we should even have this test suite or codepath, but that would be a separate
conversation. For now, we could perhaps rename the `cython` suite to
`noextras`.
##########
sdks/python/container/base_image_requirements_manual.txt:
##########
@@ -43,3 +43,4 @@ nose==1.3.7 # For Dataflow internal testing. TODO: remove this.
python-snappy;python_version<"3.11" # Optimizes execution of some Beam codepaths.
scipy
scikit-learn
+build>=1.0,<2 # tool to build sdist from setup.py in stager.
Review Comment:
(no action needed) This will help template image users.