Abacn opened a new issue, #36735:
URL: https://github.com/apache/beam/issues/36735
### What happened?
- Writing an int to a target schema of int64 succeeds.
- Writing an int to a target schema of nullable(int64) using storage_write_api succeeds.
- Writing an int to a target schema of nullable(int64) using file_load with Avro fails with:
```
Caused by: org.apache.avro.UnresolvedUnionException: Not in union ["null","long"]: 123 (field=nullableLong)
```
A minimal reproduction (not using Beam):
```
import java.io.File;
import java.io.IOException;
import org.apache.avro.AvroRuntimeException;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumWriter;

public class AvroTest {
  private static final String SCHEMA_JSON = "{\n" +
      "  \"type\": \"record\",\n" +
      "  \"name\": \"UserEvent\",\n" +
      "  \"namespace\": \"com.example.avro\",\n" +
      "  \"fields\": [\n" +
      "    {\"name\": \"userId\", \"type\": \"string\"},\n" +
      "    {\"name\": \"nonNullLong\", \"type\": \"long\"},\n" +
      "    {\"name\": \"nullableLong\", \"type\": [\"null\", \"long\"], \"default\": null}\n" +
      "  ]\n" +
      "}";

  public static void main(String[] argv) throws AvroRuntimeException, IOException {
    Schema schema = new Schema.Parser().parse(SCHEMA_JSON);
    GenericRecord eventWithTimestamp = new GenericData.Record(schema);
    eventWithTimestamp.put("userId", "user-123");
    // An Integer value for the plain "long" field is written without error.
    eventWithTimestamp.put("nonNullLong", 123);
    // An Integer value for the ["null","long"] union is rejected when the record is written.
    eventWithTimestamp.put("nullableLong", 123); // fail
    File avroOutputFile = new File("user-events.avro");
    DatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<>(schema);
    try (DataFileWriter<GenericRecord> dataFileWriter = new DataFileWriter<>(datumWriter)) {
      dataFileWriter.create(schema, avroOutputFile);
      dataFileWriter.append(eventWithTimestamp);
    }
  }
}
```
This is a known Avro issue:
https://stackoverflow.com/questions/35963285/org-apache-avro-unresolvedunionexception-not-in-union-long-null
However, it led to a breaking change in Beam YAML 2.69.0, which switched
the batch BigQueryIO write from storage_write_api to Managed IO (backed by
file_load).
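
For illustration, one possible user-side workaround is to widen integer values to `Long` before they reach the Avro writer, so that the value matches the `long` branch of the `["null","long"]` union. The sketch below uses only the JDK; the class name `LongWidening` and the helper `widenToLong` are hypothetical, not part of Beam or Avro, and this is not necessarily how a fix in Beam would be implemented.

```java
public class LongWidening {
    /**
     * Hypothetical helper: widens Byte, Short, and Integer values to Long.
     * Every other value, including null, is returned unchanged.
     */
    public static Object widenToLong(Object value) {
        if (value instanceof Byte || value instanceof Short || value instanceof Integer) {
            // longValue() returns a primitive long, which is boxed to Long on return.
            return ((Number) value).longValue();
        }
        return value;
    }

    public static void main(String[] args) {
        Object widened = widenToLong(123);
        // Prints the runtime type and the value of the widened object.
        System.out.println(widened.getClass().getSimpleName() + " " + widened);
    }
}
```

In the reproduction above, this would be applied as `eventWithTimestamp.put("nullableLong", LongWidening.widenToLong(123));`, which is equivalent to passing the literal `123L`.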
### Issue Priority
Priority: 2 (default / most bugs should be filed as P2)
### Issue Components
- [ ] Component: Python SDK
- [ ] Component: Java SDK
- [ ] Component: Go SDK
- [ ] Component: Typescript SDK
- [ ] Component: IO connector
- [ ] Component: Beam YAML
- [ ] Component: Beam examples
- [ ] Component: Beam playground
- [ ] Component: Beam katas
- [ ] Component: Website
- [ ] Component: Infrastructure
- [ ] Component: Spark Runner
- [ ] Component: Flink Runner
- [ ] Component: Samza Runner
- [ ] Component: Twister2 Runner
- [ ] Component: Hazelcast Jet Runner
- [ ] Component: Google Cloud Dataflow Runner