James Clarke created AVRO-3101:
----------------------------------
Summary: Primitive number values are silently truncated in Java GenericDatumWriter
Key: AVRO-3101
URL: https://issues.apache.org/jira/browse/AVRO-3101
Project: Apache Avro
Issue Type: Bug
Components: java
Affects Versions: 1.10.2, 1.10.1, 1.10.0
Reporter: James Clarke
Primitive Java numeric values are silently truncated in GenericDatumWriter.
Previously (1.9.2), setting a double value on a Type.LONG field caused a
ClassCastException when serializing the datum.
Since the changes in AVRO-2070, the double value is silently truncated instead.
I don't know whether this is a bug or expected behavior, since in 1.9.2 (and
much earlier versions) values for Type.INT fields were already silently
truncated, while other numeric types were not.
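
To illustrate the difference, here is a minimal standalone sketch in plain Java
with no Avro dependency. It assumes the 1.9.2 write path for LONG amounts to a
cast to Long and the 1.10.x path amounts to a Number.longValue() call; this is
not copied from the Avro source, and the names TruncationSketch, strictLong and
lenientLong are made up for illustration.

```java
final class TruncationSketch {
    // Assumed stand-in for the 1.9.2 behavior: a hard cast to Long.
    public static long strictLong(Object datum) {
        return (Long) datum; // throws ClassCastException when datum is a Double
    }

    // Assumed stand-in for the 1.10.x behavior: narrowing through Number.
    public static long lenientLong(Object datum) {
        return ((Number) datum).longValue(); // drops the fractional part
    }

    public static void main(String[] args) {
        Object value = 456.4; // autoboxed to java.lang.Double

        System.out.println(lenientLong(value)); // prints 456

        try {
            strictLong(value);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException"); // reached for a Double
        }
    }
}
```

Number.longValue() on a Double is a narrowing conversion that rounds toward
zero, so 456.4 becomes 456 and -1.9 becomes -1, with no exception or warning.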
My use case involves users generating data that conforms to a dynamically
generated Avro schema. The current behavior provides type safety (for downstream
consumers) but does not maintain data integrity. From my POV it would be better
for users to get an explicit ClassCastException than to have corrupt data
introduced silently.
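
As a rough illustration of what "explicitly error" could mean, a caller-side
guard might look like the sketch below. ExactLongSketch and toLongExact are
hypothetical names, not part of Avro, and this is only one possible policy: it
accepts floating-point values that are exactly integral and rejects everything
else. It relies on BigDecimal.longValueExact(), which throws ArithmeticException
rather than ClassCastException when a fractional part would be lost.

```java
import java.math.BigDecimal;

final class ExactLongSketch {
    // Hypothetical helper: convert to long only when no information is lost.
    public static long toLongExact(Object datum) {
        if (datum instanceof Long || datum instanceof Integer
                || datum instanceof Short || datum instanceof Byte) {
            return ((Number) datum).longValue(); // widening, always exact
        }
        if (datum instanceof Double || datum instanceof Float) {
            double d = ((Number) datum).doubleValue();
            // new BigDecimal(double) throws NumberFormatException for NaN or
            // infinity; longValueExact() throws ArithmeticException if there
            // is a nonzero fractional part or the value does not fit a long.
            return new BigDecimal(d).longValueExact();
        }
        throw new ClassCastException("not a supported numeric value: " + datum);
    }

    public static void main(String[] args) {
        System.out.println(toLongExact(456.0)); // prints 456

        try {
            toLongExact(456.4);
        } catch (ArithmeticException e) {
            System.out.println("rejected 456.4");
        }
    }
}
```

Calling something like this before record.put(...) would surface the problem at
the point where the bad value is produced instead of after it has been written.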
Example test case, which throws ClassCastException in 1.9.2 and prints 456 (not
the 456.4 that was set) in 1.10.2:
{code:java}
@Test
fun testWritingDoubleToLong() {
    // Record schema with a single LONG field named "long"
    val longType = Schema.create(Schema.Type.LONG)
    val field = Schema.Field("long", longType)
    val fields = listOf(field)
    val schema = Schema.createRecord("test", "doc", "", false, fields)

    // 456.4 is a Double, but the field schema is LONG
    val record: GenericRecord = GenericData.Record(schema)
    record.put("long", 456.4)

    // Serialize with the binary encoder
    val stream = ByteArrayOutputStream()
    val datumWriter: DatumWriter<GenericRecord> = GenericDatumWriter(schema)
    val encoder = EncoderFactory.get().binaryEncoder(stream, null)
    datumWriter.write(record, encoder)
    encoder.flush()

    // Read the record back and print the stored value
    val decoder = DecoderFactory.get().binaryDecoder(stream.toByteArray(), null)
    val datumReader: DatumReader<GenericRecord> = GenericDatumReader(schema)
    val output = datumReader.read(null, decoder)
    println(output["long"])
}
{code}