bvolpato opened a new issue, #29413:
URL: https://github.com/apache/beam/issues/29413

   ### What happened?
   
   After https://github.com/apache/beam/pull/27851, user code that depends on 
Avro versions newer than 1.8.2 has problems running on Dataflow.
   
   For example, in https://github.com/GoogleCloudPlatform/DataflowTemplates, 
where we moved to Avro 1.11.3, we saw incompatibility errors:
   
   > Caused by: java.io.InvalidClassException: 
org.apache.avro.specific.SpecificRecordBase; local class incompatible: stream 
classdesc serialVersionUID = -1463700717714793795, local class serialVersionUID 
= 189988654766568477
   
   and
   
   > Caused by: java.lang.NoSuchMethodError: 'boolean org.apache.avro.generic.GenericRecord.hasField(java.lang.String)'
   >     com.google.cloud.teleport.v2.transforms.FormatDatastreamRecordToJson.getMetadataIsDeleted(FormatDatastreamRecordToJson.java:258)
   >     com.google.cloud.teleport.v2.transforms.FormatDatastreamRecordToJson.apply(FormatDatastreamRecordToJson.java:123)
   >     com.google.cloud.teleport.v2.transforms.FormatDatastreamRecordToJson.apply(FormatDatastreamRecordToJson.java:51)
   >     org.apache.beam.sdk.extensions.avro.io.AvroSource$AvroBlock.readNextRecord(AvroSource.java:610)
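
Both errors are symptoms of a different Avro version being resolved at runtime than the one the user code was compiled against. A minimal diagnostic sketch that could be run on the worker classpath to confirm this is shown below. `ClasspathProbe` and its methods are hypothetical names used for illustration, not part of Beam or Avro; only standard JDK calls (`ObjectStreamClass.lookup`, `Class.getMethod`) are used.

```java
import java.io.ObjectStreamClass;

/** Hypothetical diagnostic helper; not part of Beam or Avro. */
public class ClasspathProbe {

  /** Returns the serialVersionUID of the locally loaded class, or null if it is not serializable. */
  static Long localSerialVersionUid(Class<?> cls) {
    ObjectStreamClass desc = ObjectStreamClass.lookup(cls);
    if (desc == null) {
      return null;
    }
    return desc.getSerialVersionUID();
  }

  /** Returns true if the locally loaded class exposes a public method with this signature. */
  static boolean hasPublicMethod(Class<?> cls, String name, Class<?>... paramTypes) {
    try {
      cls.getMethod(name, paramTypes);
      return true;
    } catch (NoSuchMethodException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    if (args.length == 0) {
      System.out.println("usage: ClasspathProbe <fully.qualified.ClassName>");
      return;
    }
    try {
      Class<?> cls = Class.forName(args[0]);
      System.out.println("serialVersionUID = " + localSerialVersionUid(cls));
      // Checks for the method named in the NoSuchMethodError above.
      System.out.println("hasField(String) present = "
          + hasPublicMethod(cls, "hasField", String.class));
    } catch (ClassNotFoundException e) {
      System.out.println(args[0] + " is not on the classpath");
    }
  }
}
```

Running it with `org.apache.avro.specific.SpecificRecordBase` and `org.apache.avro.generic.GenericRecord` as arguments should show whether the loaded classes match the `serialVersionUID` and method expected by the user code.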
   
   The root cause is that Avro classes are now shipped inside 
`/opt/apache/beam/jars/beam-sdks-java-harness.jar`, which was not the case 
before.
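
One way to verify which JAR a class is actually loaded from is to print its code source location. The sketch below assumes it is run with the same classpath as the failing worker. `ClassOrigin` is a hypothetical helper name, and the default class name in `main` is only an example.

```java
import java.security.CodeSource;

/** Hypothetical diagnostic helper; not part of Beam or Avro. */
public class ClassOrigin {

  /** Returns the location (typically a JAR URL) the class was loaded from. */
  static String originOf(Class<?> cls) {
    CodeSource source = cls.getProtectionDomain().getCodeSource();
    // JDK bootstrap classes usually have no code source.
    if (source == null || source.getLocation() == null) {
      return "bootstrap (no code source)";
    }
    return source.getLocation().toString();
  }

  public static void main(String[] args) {
    String name = args.length > 0 ? args[0] : "org.apache.avro.generic.GenericRecord";
    try {
      System.out.println(name + " loaded from " + originOf(Class.forName(name)));
    } catch (ClassNotFoundException e) {
      System.out.println(name + " is not on the classpath");
    }
  }
}
```

If the printed location points at the harness JAR rather than the user's pipeline JAR, the harness copy of Avro is shadowing the version the user code depends on.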
   
   
   I tried to relocate the Avro classes in 
https://github.com/apache/beam/pull/29407 but got some test failures.
   
   The next step is to mark Avro as a provided dependency of that JAR, since 
the harness apparently does not use it.
   
   
   ### Issue Priority
   
   Priority: 1 (data loss / total loss of function)
   
   ### Issue Components
   
   - [ ] Component: Python SDK
   - [X] Component: Java SDK
   - [ ] Component: Go SDK
   - [ ] Component: Typescript SDK
   - [ ] Component: IO connector
   - [ ] Component: Beam YAML
   - [ ] Component: Beam examples
   - [ ] Component: Beam playground
   - [ ] Component: Beam katas
   - [ ] Component: Website
   - [ ] Component: Spark Runner
   - [ ] Component: Flink Runner
   - [ ] Component: Samza Runner
   - [ ] Component: Twister2 Runner
   - [ ] Component: Hazelcast Jet Runner
   - [ ] Component: Google Cloud Dataflow Runner


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.