Polber commented on PR #30864:
URL: https://github.com/apache/beam/pull/30864#issuecomment-2040321400
@robertwb After further investigation, it appears this occurs whenever two
consecutive ExternalTransforms that use the same Gradle target are declared and
the first is given multiple inputs (a PCollectionTuple). I haven't been able to
reproduce it unless the input to the first ExternalTransform is a dict of
PCollections.
Python example:
```
import apache_beam as beam
from apache_beam.transforms import external
import logging
from apache_beam.utils import subprocess_server

logging.getLogger().setLevel(logging.INFO)

with beam.Pipeline('DirectRunner') as p:
  i1 = p | "i1" >> beam.Create([beam.Row(name='john', id=1)])
  i2 = p | "i2" >> beam.Create([beam.Row(name='jane', id=1)])
  result = {'i1': i1, 'i2': i2} | 'Sql1' >> external.ExternalTransform(
      'beam:external:java:sql:v1',
      external.ImplicitSchemaPayloadBuilder(
          {'query': 'SELECT * FROM i1 INNER JOIN i2 ON i1.id = i2.id'}
      ).payload(),
      external.JavaJarExpansionService(
          subprocess_server.JavaJarServer.path_to_beam_jar(
              gradle_target='sdks:java:extensions:sql:expansion-service:shadowJar',
              artifact_id=None
          )
      )) | 'LogForTesting' >> external.SchemaAwareExternalTransform(
          'beam:schematransform:org.apache.beam:yaml:log_for_testing:v1',
          external.JavaJarExpansionService(
              subprocess_server.JavaJarServer.path_to_beam_jar(
                  gradle_target='sdks:java:extensions:sql:expansion-service:shadowJar',
                  artifact_id=None
              )
          ), rearrange_based_on_discovery=True)
```
YAML example:
```
pipeline:
  transforms:
    - type: Create
      name: table1
      config:
        elements:
          - name: "john"
            id: 1
    - type: Create
      name: table2
      config:
        elements:
          - name: "jane"
            id: 1
    - type: Sql
      name: Join
      input:
        i1: table1
        i2: table2
      config:
        query: "SELECT * FROM i1 INNER JOIN i2 ON i1.id = i2.id"
    - type: LogForTestingJava
      input: Sql

# Force same gradle target variant of LogForTesting
providers:
  - type: 'beamJar'
    config:
      gradle_target: 'sdks:java:extensions:sql:expansion-service:shadowJar'
    transforms:
      LogForTestingJava:
        'beam:schematransform:org.apache.beam:yaml:log_for_testing:v1'
```