Abacn commented on code in PR #25970:
URL: https://github.com/apache/beam/pull/25970#discussion_r1153827229
##########
buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy:
##########
@@ -2785,15 +2785,70 @@ class BeamModulePlugin implements Plugin<Project> {
// distribution tarball generated by :sdks:python:sdist.
project.configurations { distTarBall }
+ // Create a task to install Beam python SDK locally before running tests on
+ // Google cloud platform. The task depends on ':sdks:python:sdist' project
+ // for the tarball file.
+ // For compatible systems (Linux amd64), a 'sdistReuseWheel' system property
+ // can be provided to re-use the wheel built in this task (by configuring
+ // `--sdk_location` pipeline option in test tasks).
def installGcpTest = project.tasks.register('installGcpTest') {
dependsOn setupVirtualenv
dependsOn ':sdks:python:sdist'
+
+ // if host system wheel compatible with dataflow runner
+ def compatibleWithDataflow = ("amd64".equalsIgnoreCase(System.getProperty("os.arch")) &&
+ "linux".equalsIgnoreCase(System.getProperty("os.name")))
+ if (!compatibleWithDataflow && project.hasProperty('sdistReuseWheel')) {
+ throw new GradleException('-PsdistReuseWheel is set for the task but the ' +
+ 'host system is not compatible with Dataflow worker container image.')
+ }
+
+ // Set sdistFiles project ext at execution time as the path to the
+ // generated installable Python SDK package. If project property
+ // sdistReuseWheel is set, it points to the wheel, otherwise to the tarball.
+ project.ext.sdistFiles = project.files()
+
doLast {
- def distTarBall = "${pythonRootDir}/build/apache-beam.tar.gz"
+ def packageFile = "${pythonRootDir}/build/apache-beam.tar.gz"
+ if (compatibleWithDataflow) {
+ // build wheel in separate folder to avoid race conditions
+ project.copy {
+ from project.tarTree(project.resources.gzip(packageFile))
+ into project.buildDir
+ }
+ def srcDirs = project.files()
+ project.buildDir.eachDirMatch({it.startsWith('apache-beam')}, {srcDirs.from it})
+ def srcDir = srcDirs.singleFile
+ project.exec {
+ executable 'sh'
+ args '-c', ". ${project.ext.envdir}/bin/activate && pip install 'Cython<1' " +
+ "&& cd ${srcDir} && python setup.py -q sdist bdist_wheel --dist-dir ${project.buildDir}"
+ }
+ def collection = project.fileTree(project.buildDir){
+ include "**/*${project.ext.pythonVersion.replace('.', '')}*.whl"
+ exclude 'srcs/**'
+ }
+ def packageFilename = collection.singleFile.getName()
+ // rename to the suffix accepted by sdks/python/container/boot.go.
+ def renamed = packageFilename.replace(
Review Comment:
Yeah, the current approach is kind of a hack. The manylinux wheels are built
using cibuildwheel, which involves setting up a Docker container to build them.
Currently our GitHub Action is doing this:
https://github.com/apache/beam/blob/004c7f612a57112ceb56aa04de318244ccd61f0e/.github/workflows/build_wheels.yml#L272
If we had already migrated to GitHub Actions, this would be convenient.
For now, this will need a new Gradle task that builds the manylinux wheel (and
presumably others), instead of an in-place change to installGcpTest. Probably
create a Gradle task under ":sdks:python"; this may also help users who build a
custom SDK, so they can easily get wheels instead of a tarball that takes a long
time to install on Dataflow. Will do.
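
Roughly what I have in mind, as an untested sketch rather than the final change. The task name `buildManylinuxWheel` and the `wheelhouse` output directory are made up for illustration. It assumes cibuildwheel is installed in the virtualenv, Docker is available on the host, and `project.ext.envdir` / `project.ext.pythonVersion` are set the same way as for the existing Python tasks:

```groovy
// Hypothetical task in sdks/python/build.gradle; all names here are placeholders.
tasks.register('buildManylinuxWheel', Exec) {
  // Assumes the virtualenv setup task is registered under this name.
  dependsOn 'setupVirtualenv'

  // Restrict cibuildwheel to the interpreter under test,
  // e.g. pythonVersion '3.8' gives 'cp38-manylinux_x86_64'.
  def pyTag = "cp${project.ext.pythonVersion.replace('.', '')}"
  environment 'CIBW_BUILD', "${pyTag}-manylinux_x86_64"

  // cibuildwheel starts a manylinux Docker container and writes the
  // resulting wheel(s) to the given output directory.
  executable 'sh'
  args '-c', ". ${project.ext.envdir}/bin/activate && " +
      "cibuildwheel --platform linux " +
      "--output-dir ${project.buildDir}/wheelhouse ${project.projectDir}"
}
```

Test tasks could then pass the wheel from that directory via `--sdk_location`, and installGcpTest would stay as it was before this PR.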