deepix opened a new pull request #17167:
URL: https://github.com/apache/beam/pull/17167


   ### Summary
   
   Adding a `filesToStage` argument when starting the expansion service lets us control which files are staged. This speeds up pipeline execution in environments where the jars can be pre-staged in the Java SDK harness.
   
   This is based on guidance from @chamikaramj in a [mailing list thread](https://lists.apache.org/thread/knkf6yn52z1fxzbcgkt22dv48o8055bh).
   
   ### Testing
   
   Added two unit tests.
   
   Also ran the expansion service standalone, together with a portable Python pipeline that uses Kafka I/O (and therefore needs the Java expansion service). The pipeline ran successfully.
   
   ```
   $ java -cp ./runners/flink/1.13/job-server/build/libs/beam-runners-flink-1.13-job-server-2.35.0-SNAPSHOT.jar \
       org.apache.beam.sdk.expansion.service.ExpansionService 8096 --filesToStage="foo.jar"
   ...
   Mar 22, 2022 7:04:30 PM org.apache.beam.sdk.expansion.service.ExpansionService$TransformProvider getDependencies
   INFO: Staging to files from the classpath: 1, [foo.jar]
   ```
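
The staging behavior this flag controls can be sketched roughly as follows. This is an illustration only, not the Java code in `ExpansionService`: the helper `resolve_files_to_stage` and the classpath entries are hypothetical, the comma-separated list format is an assumption, and so is the fallback to the full classpath when the flag is omitted (inferred from the log message above).

```python
from typing import List, Optional


def resolve_files_to_stage(
    files_to_stage: Optional[str], classpath: List[str]
) -> List[str]:
    """Return the files to stage for a given --filesToStage value.

    Hypothetical helper that mirrors the behavior described in this PR;
    it is not the actual implementation.
    """
    if files_to_stage is None:
        # Flag not passed (assumed default): stage everything on the classpath.
        return list(classpath)
    # Flag passed: stage exactly the listed files (assumed comma-separated).
    # An empty value therefore stages nothing.
    return [f.strip() for f in files_to_stage.split(",") if f.strip()]


classpath = ["beam-job-server.jar", "kafka-io.jar"]  # made-up classpath entries
print(resolve_files_to_stage(None, classpath))       # flag omitted
print(resolve_files_to_stage("foo.jar", classpath))  # --filesToStage="foo.jar"
print(resolve_files_to_stage("", classpath))         # --filesToStage=
```

Distinguishing "flag omitted" from "flag set to an empty value" is the point of the sketch: only the latter disables staging.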
   
   Passing an empty value prevents staging altogether (useful when the relevant jars are already pre-staged in the Java SDK harness):
   
   ```
   $ java -cp ./runners/flink/1.13/job-server/build/libs/beam-runners-flink-1.13-job-server-2.35.0-SNAPSHOT.jar \
       org.apache.beam.sdk.expansion.service.ExpansionService 8096 --filesToStage=
   ...
   Mar 22, 2022 7:03:59 PM org.apache.beam.sdk.expansion.service.ExpansionService$TransformProvider getDependencies
   INFO: Staging to files from the classpath: 1, []
   ```

