Hi everyone, I wanted to start a thread of discussion around the current Decimal logical type and the weaknesses in the current implementations (well, the Java one at least, as that's what I'm using).
There are a couple of Jiras already around this subject:

- AVRO-2078: Avro does not enforce schema resolution rules for Decimal type <https://issues.apache.org/jira/browse/AVRO-2078>, and the associated PR <https://github.com/apache/avro/pull/247>
- AVRO-1721: Should LogicalTypes introduce schema (in)compatibility and canonical parsing form changes? <https://issues.apache.org/jira/browse/AVRO-1721>
- AVRO-2164: Make Decimal a first class type <https://issues.apache.org/jira/browse/AVRO-2164>

*Avoiding data corruption with Decimal types*

The main issue seems to be that deserialising a record where the read and write schemas have decimal types with different scales leads to data corruption. For example, given the decimal '12.34' serialised with a write schema that has scale 2, deserialising with a read schema that has scale 3 yields '1.234'. Currently, the Java implementation (and likely others) does not treat read and write schemas with different scales as incompatible: the decimal *logical* type is not deemed part of the normalised form of the schema and is ignored.

Schemas evolve over time. Where a scale of 2 may initially be deemed enough, it may come to pass that a higher, or lower, scale is needed going forward. There will always be cases where systems need to read data written with different schema versions using a consistent read-schema version. Therefore, it is important that data written with one schema version is readable with a later, compatible, schema version. This, IMHO, should include the decimal logical type; if it does not, then the type becomes useless.

One possible solution is to convert values on the fly, i.e. always deserialise a decimal using the write schema's scale and then attempt to convert it to the read schema's scale. When it comes to decimal, there are two possibilities for a change of scale:
1. Read schema's scale is greater than the write schema's: such schemas should be seen as compatible.
2. Write schema's scale is greater than the read schema's: the safest / strictest option here would be to see them as incompatible, as there are decimals that can be serialised by the write schema that can't be deserialised by the read schema without rounding. A more lenient approach would be to allow the user to supply a rounding mode for such cases.

I've not mentioned the decimal's precision up until this point, as changes in precision, from a Java standpoint, don't cause any issues upon deserialisation. However, this may not be the case for other languages and, conceptually at least, I wonder whether it should be allowed to deserialise a decimal with a lower precision than was used to serialise it. Precision doesn't actually seem to have much effect, other than for checking that the fixed byte array is large enough.

*Logical types in normalised form*

A secondary issue is whether the logical type should form part of the normalised form. My own experience here suggests it should, but I'm less sure of this.

Having written a 'Schema Store' and client libraries, I've found that, before logical types were available, it was important to normalise uploaded schemas to avoid creating many versions of a schema where nothing material changed, e.g. white space, doc changes, etc. When clients required a schema to deserialise data, or to compile, the store would again return the normalised form, and this worked well.

With the introduction of logical types this had to change. I had to write my own normalisation code that included the logical types, as these are important when determining whether a schema has changed and for ensuring the right schema is returned to clients. This suggests to me that logical types should be available in the normalised form, or at least that there should be an option to normalise to a form that includes logical types.

I'd love to hear what others think....

Thanks,
Andy
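P.S. To make the scale reinterpretation concrete: Avro encodes a decimal on the wire as its two's-complement unscaled value only, and the scale comes purely from the schema. Here's a minimal standalone Java sketch of that mechanism (the class name is mine; it deliberately mirrors, but is not, Avro's own conversion code):

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class DecimalScaleExample {
    public static void main(String[] args) {
        // Avro stores only the unscaled value on the wire; the scale lives in
        // the schema. '12.34' at scale 2 is serialised as the integer 1234.
        byte[] wire = new BigDecimal("12.34").unscaledValue().toByteArray();

        // Decoding with the writer's scale (2) recovers the original value...
        BigDecimal ok = new BigDecimal(new BigInteger(wire), 2);
        System.out.println(ok);        // 12.34

        // ...but decoding the same bytes with a read schema of scale 3 silently
        // reinterprets them as a different number: the corruption described above.
        BigDecimal corrupted = new BigDecimal(new BigInteger(wire), 3);
        System.out.println(corrupted); // 1.234
    }
}
```

Nothing fails and no exception is thrown, which is what makes this a silent corruption rather than an error.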
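P.P.S. And a sketch of the on-the-fly conversion idea, covering both cases above. The `rescale` helper and its shape are hypothetical (not an Avro API): widening to a greater reader scale is always lossless, while narrowing either rounds with a user-supplied mode (lenient) or fails (strict):

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class DecimalRescale {
    /**
     * Hypothetical helper: take the value decoded at the writer's scale and
     * convert it to the reader's scale. Case 1 (reader scale >= writer scale)
     * is always lossless. Case 2 either rounds with the caller-supplied mode
     * (lenient) or, when mode is null, throws ArithmeticException if any
     * digits would be lost (strict).
     */
    static BigDecimal rescale(BigDecimal written, int readerScale, RoundingMode mode) {
        if (mode == null) {
            // BigDecimal.setScale(int) throws ArithmeticException when
            // narrowing would require rounding; widening always succeeds.
            return written.setScale(readerScale);
        }
        return written.setScale(readerScale, mode);
    }

    public static void main(String[] args) {
        System.out.println(rescale(new BigDecimal("12.34"), 3, null));                  // 12.340
        System.out.println(rescale(new BigDecimal("12.345"), 2, RoundingMode.HALF_UP)); // 12.35
    }
}
```

This is value-level, so the strict variant only fails when a particular value actually needs rounding; schema-level incompatibility, as proposed above, would reject the read schema up front instead.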
