[
https://issues.apache.org/jira/browse/AVRO-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ryan Skraba updated AVRO-2952:
------------------------------
Fix Version/s: 1.11.0 (was: 1.10.2)
> Logical Types and Conversions enhancements
> ------------------------------------------
>
> Key: AVRO-2952
> URL: https://issues.apache.org/jira/browse/AVRO-2952
> Project: Apache Avro
> Issue Type: Improvement
> Components: java
> Affects Versions: 1.10.1
> Reporter: Werner Daehn
> Priority: Critical
> Fix For: 1.11.0
>
>
> *Summary*:
> * Added a method *Field.getDataType()* which returns an object with common
> data type related methods. Most important are methods to call the converters.
> * Added trivial LogicalTypes to allow better database integration, e.g. the
> LogicalType VARCHAR(10) is a STRING that carries the information that only
> ASCII chars are in the payload and up to 10 chars max.
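To make the VARCHAR(10) idea concrete, here is a minimal sketch of the metadata such a logical type would carry on top of a plain string. It assumes nothing about the patch itself: the class name VarcharSketch and its methods are hypothetical and do not depend on Avro.

```java
// Hypothetical stand-in, independent of Avro: models the metadata that a
// VARCHAR(n) logical type would attach to a plain STRING.
final class VarcharSketch {
    private final int maxLength;

    VarcharSketch(int maxLength) {
        if (maxLength <= 0) {
            throw new IllegalArgumentException("maxLength must be positive");
        }
        this.maxLength = maxLength;
    }

    /** True if the value is pure ASCII and at most maxLength chars long. */
    boolean accepts(String value) {
        if (value == null || value.length() > maxLength) {
            return false;
        }
        for (int i = 0; i < value.length(); i++) {
            if (value.charAt(i) > 0x7F) {
                return false; // outside the 7-bit ASCII range
            }
        }
        return true;
    }

    /** Renders the type the way a database column would declare it. */
    String toSqlType() {
        return "VARCHAR(" + maxLength + ")";
    }

    public static void main(String[] args) {
        VarcharSketch varchar10 = new VarcharSketch(10);
        System.out.println(varchar10.toSqlType());             // VARCHAR(10)
        System.out.println(varchar10.accepts("hello"));        // true
        System.out.println(varchar10.accepts("hello world"));  // false, 11 chars
        System.out.println(varchar10.accepts("h\u00e9llo"));   // false, non-ASCII
    }
}
```

A consumer that knows this constraint can create a VARCHAR(10) column instead of falling back to an unbounded type.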
> *Example*:
> The user has a record with one ENUM field and a matching Java enum class.
> Instead of manually converting the Java string into a GenericEnumSymbol, the
> user can call convertToRawType of the AvroDataType class.
> f ... the Field to be set
> {{testRecord.put(f.name(),
> f.getDataType().convertToRawType(myEnum.male.name()));}}
> Calling {{f.getDataType().convertToRawType()}} does all the conversion. I
> considered adding that conversion to the put() method itself but feared
> side effects, so the user has to invoke convertToRawType() explicitly.
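As a rough illustration of what such a conversion step could do for an ENUM field, the sketch below accepts either a Java enum constant or a string and checks it against the declared symbols. This is not the patch's code: EnumTypeSketch and Gender are hypothetical, Avro-free stand-ins, and where Avro would produce a GenericEnumSymbol the sketch simply returns the validated symbol name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for an ENUM data type; these are not Avro's classes.
final class EnumTypeSketch {
    private final List<String> symbols;

    EnumTypeSketch(List<String> symbols) {
        this.symbols = new ArrayList<>(symbols);
    }

    /**
     * Accepts a Java enum constant or a String and returns the symbol name,
     * failing early if the symbol is not declared for this type.
     */
    String convertToRawType(Object value) {
        String name;
        if (value instanceof Enum<?>) {
            name = ((Enum<?>) value).name();
        } else if (value instanceof String) {
            name = (String) value;
        } else {
            throw new IllegalArgumentException(
                "Cannot convert " + value + " to an enum symbol");
        }
        if (!symbols.contains(name)) {
            throw new IllegalArgumentException("Unknown enum symbol: " + name);
        }
        return name;
    }

    // Example Java enum, assumed to mirror the symbols of the ENUM field.
    enum Gender { male, female }

    public static void main(String[] args) {
        EnumTypeSketch type = new EnumTypeSketch(Arrays.asList("male", "female"));
        System.out.println(type.convertToRawType(Gender.male));  // male
        System.out.println(type.convertToRawType("female"));     // female
    }
}
```

Keeping the check in an explicit call, rather than inside put(), matches the choice described above: the caller opts in to the conversion.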
> *Reasoning*:
> I have been working with Avro (Kafka) for two years now and have implemented
> improvements around Logical Types. I merged these into the Avro code with
> zero side effects - pure additions. There are no breaking changes for other
> Avro users, but the additions should be a great help for them.
> Imagine you connect two databases via Kafka using Avro as the message payload.
> # The first problem you will face is that RawTypes and LogicalTypes are
> handled differently. For LogicalTypes there are conversion functions that
> provide metadata (e.g. getConvertedType returns that a Java Instant is the
> best data type for a timestamp-millis) plus conversion logic. For raw types
> there is no such thing. A Boolean can be provided as true, "TRUE", 1,...
> # The second problem is the lack of getObject()/setObject() methods similar
> to JDBC. The result is endless switch-case lists to call the correct
> methods, in every single project and for every user.
> # Number three is the usage of the Converters as such. The intended usage is
> to add converters to the GenericData and the reader/writer uses the best
> suited converter. What I have seen most people do, however, is use the
> converters manually and assign the raw value directly. While adding
> converters is still possible, the conversion at GenericRecord.put() and
> GenericRecord.get() is easy now.
> # For a data exchange format like Avro, it is important to carry as much
> metadata as possible. For example, purely seen from Avro a STRING data type
> is just fine. 99% of the string data types in a database are VARCHAR(length)
> and NVARCHAR(length). While putting an ASCII String of length 10 into a
> STRING is no problem, on the consumer side the only matching data type is a
> NCLOB - the worst for a database. The LogicalTypes provide such nice methods
> to carry such metadata, e.g. a LogicalType VARCHAR(10) backed by a String.
> These Logical Types do not have any conversion, they just exist for the
> metadata. You have such a thing already with the UUID LogicalType.
>
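The first point above says a Boolean may arrive as true, "TRUE", 1 and so on. A minimal sketch of a lenient conversion for that one raw type follows. Which inputs count as true or false is an assumption made here, since the issue does not spell it out, and BooleanConversionSketch is a hypothetical, Avro-free helper.

```java
// Hypothetical helper, independent of Avro: one lenient conversion for a raw
// BOOLEAN, so callers need no per-project switch-case over input types.
final class BooleanConversionSketch {

    /** Converts common representations to Boolean; null stays null. */
    static Boolean toBoolean(Object value) {
        if (value == null) {
            return null;
        }
        if (value instanceof Boolean) {
            return (Boolean) value;
        }
        if (value instanceof Number) {
            // Assumption: any non-zero number means true.
            return ((Number) value).doubleValue() != 0.0;
        }
        if (value instanceof CharSequence) {
            String text = value.toString().trim();
            // Assumption: only these spellings are accepted.
            if (text.equalsIgnoreCase("true") || text.equals("1")) {
                return Boolean.TRUE;
            }
            if (text.equalsIgnoreCase("false") || text.equals("0")) {
                return Boolean.FALSE;
            }
        }
        throw new IllegalArgumentException("Cannot convert to Boolean: " + value);
    }

    public static void main(String[] args) {
        System.out.println(toBoolean(true));    // true
        System.out.println(toBoolean("TRUE"));  // true
        System.out.println(toBoolean(1));       // true
        System.out.println(toBoolean("0"));     // false
    }
}
```

Rejecting unknown input with an exception, rather than silently mapping it to false, keeps bad data from slipping into the record unnoticed.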
> *Changes*:
> * A new package logicaltypes was created. It includes all new LogicalTypes
> and the AvroDataType implementations for the various raw data types.
> * The existing LogicalTypes are unchanged. The corresponding classes in the
> logicaltypes package just extend them.
> * For that, some LogicalType fields needed to be made public.
> * The LogicalTypes return the more detailed logicaltypes.* classes.
> * A test class was added.
>