deepix opened a new pull request #17167: URL: https://github.com/apache/beam/pull/17167
### Summary

A `filesToStage` argument passed when starting the expansion service lets us customize and control which files are staged. This speeds up pipeline execution in environments where we can pre-stage jars in the Java SDK harness.

This is based on guidance from @chamikaramj in a [mailing list thread](https://lists.apache.org/thread/knkf6yn52z1fxzbcgkt22dv48o8055bh).

### Testing

Added two unit tests. I also ran the expansion service separately, alongside a portable Python pipeline that uses Kafka I/O (and therefore the Java expansion service). The pipeline ran successfully.

```
$ java -cp ./runners/flink/1.13/job-server/build/libs/beam-runners-flink-1.13-job-server-2.35.0-SNAPSHOT.jar org.apache.beam.sdk.expansion.service.ExpansionService 8096 --filesToStage="foo.jar"
...
Mar 22, 2022 7:04:30 PM org.apache.beam.sdk.expansion.service.ExpansionService$TransformProvider getDependencies
INFO: Staging to files from the classpath: 1, [foo.jar]
```

One can also prevent staging entirely, which is useful when the relevant jars are already pre-staged in the Java SDK harness:

```
$ java -cp ./runners/flink/1.13/job-server/build/libs/beam-runners-flink-1.13-job-server-2.35.0-SNAPSHOT.jar org.apache.beam.sdk.expansion.service.ExpansionService 8096 --filesToStage=
...
Mar 22, 2022 7:03:59 PM org.apache.beam.sdk.expansion.service.ExpansionService$TransformProvider getDependencies
INFO: Staging to files from the classpath: 1, []
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
