[ https://issues.apache.org/jira/browse/BEAM-10844?focusedWorklogId=488677&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-488677 ]
ASF GitHub Bot logged work on BEAM-10844:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 23/Sep/20 00:20
Start Date: 23/Sep/20 00:20
Worklog Time Spent: 10m
Work Description: y1chi commented on a change in pull request #12727:
URL: https://github.com/apache/beam/pull/12727#discussion_r493104158
##########
File path: sdks/python/apache_beam/runners/portability/stager.py
##########
@@ -136,6 +137,8 @@ def create_job_resources(options, # type: PipelineOptions
only for testing.
populate_requirements_cache: Callable for populating the requirements
cache. Used only for testing.
+ skip_boot_dependencies: Skip apache beam sdk, requirements, extra
+ packages, workflow tarball installs by sdk boot program.
Review comment:
sounds much better, thanks.
##########
File path: sdks/python/apache_beam/transforms/environments.py
##########
@@ -252,6 +254,13 @@ def from_runner_api_parameter(payload, capabilities, artifacts, context):
@classmethod
def from_options(cls, options):
# type: (PipelineOptions) -> DockerEnvironment
+ if options.view_as(SetupOptions).prebuild_sdk_container_engine:
+ prebuilt_container_image = SdkContainerBuilder.build_container_image(
+ options)
+ return cls.from_container_image(
+ container_image=prebuilt_container_image,
+ artifacts=python_sdk_dependencies(
+ options, skip_boot_dependencies=True))
Review comment:
I think we still need it so that python_sdk_dependencies can produce two
sets of artifacts (the complete set or the reduced set).
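To illustrate the point in the comment above, here is a minimal sketch of how a single flag could select between the complete and the reduced artifact set. It is not the code from the pull request: the function name `sdk_dependencies_sketch` and the artifact names in both lists are hypothetical placeholders, and the real `python_sdk_dependencies` takes pipeline options rather than a bare boolean.

```python
from typing import List

# Hypothetical artifact names, chosen only for illustration.
# Artifacts that the SDK boot program would normally install at worker startup.
BOOT_INSTALLED_ARTIFACTS = [
    "sdk_tarball",
    "requirements",
    "extra_packages",
    "workflow_tarball",
]
# Artifacts assumed to be needed at runtime even with a prebuilt container.
ALWAYS_STAGED_ARTIFACTS = ["runtime_only_artifact"]


def sdk_dependencies_sketch(skip_boot_dependencies: bool = False) -> List[str]:
    """Return the names of the artifacts to stage.

    With skip_boot_dependencies=False the complete set is returned.
    With skip_boot_dependencies=True only the reduced set is returned,
    on the assumption that the prebuilt image already contains the rest.
    """
    artifacts = list(ALWAYS_STAGED_ARTIFACTS)
    if not skip_boot_dependencies:
        artifacts.extend(BOOT_INSTALLED_ARTIFACTS)
    return artifacts


complete_set = sdk_dependencies_sketch()
reduced_set = sdk_dependencies_sketch(skip_boot_dependencies=True)
```

Keeping one function with a flag, rather than two separate functions, means the prebuilt-container path and the default path share the same logic for everything that is not skipped.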
##########
File path: sdks/python/container/boot.go
##########
@@ -203,15 +223,9 @@ func setupAcceptableWheelSpecs() error {
}
// installSetupPackages installs Beam SDK and user dependencies.
-func installSetupPackages(mds []*jobpb.ArtifactMetadata, workDir string) error {
+func installSetupPackages(files []string, workDir string) error {
Review comment:
done.
##########
File path: sdks/python/container/boot.go
##########
@@ -30,18 +32,21 @@ import (
"time"
"github.com/apache/beam/sdks/go/pkg/beam/artifact"
- jobpb "github.com/apache/beam/sdks/go/pkg/beam/model/jobmanagement_v1"
pipepb "github.com/apache/beam/sdks/go/pkg/beam/model/pipeline_v1"
"github.com/apache/beam/sdks/go/pkg/beam/provision"
"github.com/apache/beam/sdks/go/pkg/beam/util/execx"
"github.com/apache/beam/sdks/go/pkg/beam/util/grpcx"
+ "github.com/golang/protobuf/jsonpb"
"github.com/golang/protobuf/proto"
"github.com/nightlyone/lockfile"
)
var (
acceptableWhlSpecs []string
+ setupOnly = flag.Bool("setup_only", false, "Execute boot program in setup only mode (optional).")
Review comment:
done.
Issue Time Tracking
-------------------
Worklog Id: (was: 488677)
Time Spent: 13h 20m (was: 13h 10m)
> Add a way to prebuild python sdk container with dependencies
> ------------------------------------------------------------
>
> Key: BEAM-10844
> URL: https://issues.apache.org/jira/browse/BEAM-10844
> Project: Beam
> Issue Type: New Feature
> Components: runner-dataflow
> Reporter: Yichi Zhang
> Assignee: Yichi Zhang
> Priority: P2
> Time Spent: 13h 20m
> Remaining Estimate: 0h
>
> We should add a way to prebuild the Python SDK container on top of the latest
> public SDK image with all dependencies installed, so that the setup steps do
> not have to run again every time a new worker VM is launched on Dataflow.