[ 
https://issues.apache.org/jira/browse/BEAM-10776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17182202#comment-17182202
 ] 

Heejong Lee commented on BEAM-10776:
------------------------------------

IIRC, the dependencies other than 
'beam-sdks-java-io-expansion-service-2.24.0-SNAPSHOT-' are not used in the 
pipeline. They appear in the list of dependencies only because they are default 
(provided) jars on the classpath. Is that correct?

It looks hard to automatically filter out unused dependencies, since classes 
could be referenced via runtime reflection. Probably the safest bet would be 
to get a list of required dependencies from users.
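
To make the idea concrete, here is a rough sketch (in Python, only for 
illustration; the real logic would live in the Java expansion service). The 
function name, its parameters, and the example paths are all hypothetical and 
not part of Beam. It stages only the jars a user lists explicitly, and 
otherwise falls back to dropping anything located under the JDK home, on the 
assumption that those jars are already provided on the worker.

```python
import os


def select_jars_to_stage(classpath_entries, jdk_home, required_jars=None):
    """Pick which classpath entries to stage (hypothetical helper, not a Beam API).

    If the user supplies required_jars (file names), only matching entries are
    kept. Otherwise every entry located under jdk_home is dropped.
    """
    if required_jars is not None:
        wanted = set(required_jars)
        return [p for p in classpath_entries if os.path.basename(p) in wanted]

    # The trailing separator keeps "/jdk8" from also matching "/jdk8-extras".
    jdk_root = os.path.join(os.path.abspath(jdk_home), "")
    return [
        p for p in classpath_entries
        if not os.path.abspath(p).startswith(jdk_root)
    ]


# Example with made-up paths.
classpath = [
    "/usr/lib/jvm/java-8/jre/lib/ext/nashorn.jar",
    "/usr/lib/jvm/java-8/jre/lib/ext/jfxrt.jar",
    "/opt/beam/beam-sdks-java-io-expansion-service-2.24.0-SNAPSHOT.jar",
]
print(select_jars_to_stage(classpath, "/usr/lib/jvm/java-8"))
```

The fallback only removes jars that ship with the JDK. It does not try to 
detect unused application jars, which is the part that reflection makes 
unreliable.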

> Unwanted JDK jars staged when running cross-language pipelines
> --------------------------------------------------------------
>
>                 Key: BEAM-10776
>                 URL: https://issues.apache.org/jira/browse/BEAM-10776
>             Project: Beam
>          Issue Type: Bug
>          Components: cross-language
>            Reporter: Chamikara Madhusanka Jayalath
>            Priority: P2
>
> When running cross-language Kafka on Dataflow I see the following jars being 
> staged.
> INFO:apache_beam.runners.dataflow.internal.apiclient:Starting GCS upload to 
> gs://clouddfe-chamikara/temp/kafka-taxi-20200820-132559.1597955225.717180/nashorn-BJZNQ7N8Lsfq-WSM0IMsRCwFMC3RIxBOEjrlB1YwKOw.jar...
> INFO:apache_beam.runners.dataflow.internal.apiclient:Completed GCS upload to 
> gs://clouddfe-chamikara/temp/kafka-taxi-20200820-132559.1597955225.717180/nashorn-BJZNQ7N8Lsfq-WSM0IMsRCwFMC3RIxBOEjrlB1YwKOw.jar
>  in 40 seconds.
> INFO:apache_beam.runners.dataflow.internal.apiclient:Starting GCS upload to 
> gs://clouddfe-chamikara/temp/kafka-taxi-20200820-132559.1597955225.717180/cldrdata-aZ6XIS6LfPilqVFbS_bWm1wMWGm3jxtjh0vjlRuqp5M.jar...
> INFO:apache_beam.runners.dataflow.internal.apiclient:Completed GCS upload to 
> gs://clouddfe-chamikara/temp/kafka-taxi-20200820-132559.1597955225.717180/cldrdata-aZ6XIS6LfPilqVFbS_bWm1wMWGm3jxtjh0vjlRuqp5M.jar
>  in 177 seconds.
> INFO:apache_beam.runners.dataflow.internal.apiclient:Starting GCS upload to 
> gs://clouddfe-chamikara/temp/kafka-taxi-20200820-132559.1597955225.717180/jfxrt-B2UJQqvuEI-15FPV1mcdw80YRUIDMg1Kr82FxWK_DZ8.jar...
> INFO:apache_beam.runners.dataflow.internal.apiclient:Completed GCS upload to 
> gs://clouddfe-chamikara/temp/kafka-taxi-20200820-132559.1597955225.717180/jfxrt-B2UJQqvuEI-15FPV1mcdw80YRUIDMg1Kr82FxWK_DZ8.jar
>  in 285 seconds.
> INFO:apache_beam.runners.dataflow.internal.apiclient:Starting GCS upload to 
> gs://clouddfe-chamikara/temp/kafka-taxi-20200820-132559.1597955225.717180/dnsns-zNxWyUaaHIkUFJRt-aNZudjc3eroySNUeRkxdxidGbY.jar...
> INFO:apache_beam.runners.dataflow.internal.apiclient:Completed GCS upload to 
> gs://clouddfe-chamikara/temp/kafka-taxi-20200820-132559.1597955225.717180/dnsns-zNxWyUaaHIkUFJRt-aNZudjc3eroySNUeRkxdxidGbY.jar
>  in 0 seconds.
> INFO:apache_beam.runners.dataflow.internal.apiclient:Starting GCS upload to 
> gs://clouddfe-chamikara/temp/kafka-taxi-20200820-132559.1597955225.717180/localedata-Wt0bN9j6XmIH4BaRLouHZX6p6iIoQsbZ2AkomxZTOYM.jar...
> INFO:apache_beam.runners.dataflow.internal.apiclient:Completed GCS upload to 
> gs://clouddfe-chamikara/temp/kafka-taxi-20200820-132559.1597955225.717180/localedata-Wt0bN9j6XmIH4BaRLouHZX6p6iIoQsbZ2AkomxZTOYM.jar
>  in 16 seconds.
> INFO:apache_beam.runners.dataflow.internal.apiclient:Starting GCS upload to 
> gs://clouddfe-chamikara/temp/kafka-taxi-20200820-132559.1597955225.717180/jaccess-5wlKULhaKWM_gmKVtH_QBwVqH4awlxxRdNNfz0z0Imw.jar...
> INFO:apache_beam.runners.dataflow.internal.apiclient:Completed GCS upload to 
> gs://clouddfe-chamikara/temp/kafka-taxi-20200820-132559.1597955225.717180/jaccess-5wlKULhaKWM_gmKVtH_QBwVqH4awlxxRdNNfz0z0Imw.jar
>  in 0 seconds.
> INFO:apache_beam.runners.dataflow.internal.apiclient:Starting GCS upload to 
> gs://clouddfe-chamikara/temp/kafka-taxi-20200820-132559.1597955225.717180/MRJToolkit-jU5qhDBc0cNjn7g3yrGHYO78BRC09T-sE8Syqo9mRjg.jar...
> INFO:apache_beam.runners.dataflow.internal.apiclient:Completed GCS upload to 
> gs://clouddfe-chamikara/temp/kafka-taxi-20200820-132559.1597955225.717180/MRJToolkit-jU5qhDBc0cNjn7g3yrGHYO78BRC09T-sE8Syqo9mRjg.jar
>  in 0 seconds.
> INFO:apache_beam.runners.dataflow.internal.apiclient:Starting GCS upload to 
> gs://clouddfe-chamikara/temp/kafka-taxi-20200820-132559.1597955225.717180/beam-sdks-java-io-expansion-service-2.24.0-SNAPSHOT-A94br32q87Prj7b_mG4_kPEdz9NSJ-0NwgHWEwwU4Qc.jar...
>  
> Out of these we only need 
> 'beam-sdks-java-io-expansion-service-2.24.0-SNAPSHOT-A94br32q87Prj7b_mG4_kPEdz9NSJ-0NwgHWEwwU4Qc.jar'.
>  The rest seem to be staged because we include all jars from the classpath in the 
> expansion service response.
>  
> [https://github.com/apache/beam/blob/master/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionService.java#L407]
>  
> We should figure out a way to filter out these additional jars.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
