I have been thinking about more advanced type promotion in Avro after
facing more complicated schema evolution issues.  I think that we need to
draw the line between what belongs in the 'basic' promotion concept and the
more advanced transformations that need metadata decoration.  We recently
added aliases, which are an example of schema evolution that requires some
metadata.

Int --> String is one that has options: decimal? hex? etc.  It is
therefore a candidate for something other than intrinsic promotion.  In
some sense it is not type promotion at all, but type conversion.  One can
say that a float is promoted to a double and that the opposite move is a
demotion; each direction is handled in exactly one way.

Int to string and back: which direction is the promotion?  Neither.  It is
a conversion with multiple ways to go in each direction.

More advanced schema transformations that I have faced in real schema
evolution are:

1. Nesting groups of fields into a record.
2. "Flattening" fields from a record into its containing record.
3. Breaking a union into components:  [A, B, C]  -->  [null, A], [null, B],
   [null, C]
4. Converting a union into an array of options:  [A, B, C] --> Array([A, B, C])

#3 is needed when data must go to a system that does not support unions.
The client may still enforce that only one of the three fields is set, and
a fourth field indicating which one is active may be added.
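A minimal sketch of #3 on plain Java maps, assuming a union value is
modelled as a (branch index, datum) pair.  UnionValue, UnionSplitter and the
field names are hypothetical, not Avro classes.  split produces one nullable
field per branch plus a discriminator field; join enforces the
"only one is set" rule on the way back.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a union value: which branch is active, and its datum.
final class UnionValue {
  final int branch;
  final Object datum;
  UnionValue(int branch, Object datum) { this.branch = branch; this.datum = datum; }
}

final class UnionSplitter {
  // [A, B, C] --> [null, A], [null, B], [null, C], plus a discriminator
  // field holding the name of the active branch field.
  static Map<String, Object> split(UnionValue u, List<String> branchFields,
                                   String discriminatorField) {
    Map<String, Object> out = new LinkedHashMap<>();
    for (int i = 0; i < branchFields.size(); i++) {
      out.put(branchFields.get(i), i == u.branch ? u.datum : null);
    }
    out.put(discriminatorField, branchFields.get(u.branch));
    return out;
  }

  // Reverse direction: exactly one branch field must be non-null.
  // Assumes no branch datum is itself null; the discriminator is not consulted.
  static UnionValue join(Map<String, Object> fields, List<String> branchFields) {
    int active = -1;
    for (int i = 0; i < branchFields.size(); i++) {
      if (fields.get(branchFields.get(i)) != null) {
        if (active != -1) throw new IllegalArgumentException("more than one branch set");
        active = i;
      }
    }
    if (active == -1) throw new IllegalArgumentException("no branch set");
    return new UnionValue(active, fields.get(branchFields.get(active)));
  }
}

class UnionSplitDemo {
  public static void main(String[] args) {
    List<String> fields = List.of("a", "b", "c");
    Map<String, Object> flat = UnionSplitter.split(new UnionValue(1, "hello"), fields, "which");
    System.out.println(flat); // {a=null, b=hello, c=null, which=b}
    UnionValue back = UnionSplitter.join(flat, fields);
    System.out.println(back.branch + " " + back.datum); // 1 hello
  }
}
```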
#4 happens when your data model changes and you now want multiple instances
of a branch, or several branches present at once.  Writing client code to
adapt to this is painful, but it could be handled by advanced schema
transformation in Avro.
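For #4 the forward direction is mechanical while the reverse is lossy.  A
sketch with a hypothetical UnionToArray helper (not an Avro API), treating a
union datum as a plain Object and the array as a java.util.List:

```java
import java.util.Collections;
import java.util.List;

final class UnionToArray {
  // Forward: a datum written with [A, B, C] is read with
  // Array([A, B, C]) as a one-element list.
  static List<Object> widen(Object unionDatum) {
    return Collections.singletonList(unionDatum);
  }

  // Backward: a reader expecting [A, B, C] can only accept an array that
  // holds exactly one element; anything else cannot be represented.
  static Object narrow(List<?> array) {
    if (array.size() != 1) {
      throw new IllegalArgumentException(
          "cannot narrow array of size " + array.size() + " to a single union value");
    }
    return array.get(0);
  }
}

class UnionToArrayDemo {
  public static void main(String[] args) {
    List<Object> widened = UnionToArray.widen(42);
    System.out.println(widened);                      // [42]
    System.out.println(UnionToArray.narrow(widened)); // 42
  }
}
```

The failing case in narrow is where extra information from the user would
come in, for example a policy for picking one element instead of rejecting
the record.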

None of these is simple, and they often require additional information
from the user.

In terms of the Avro Java API, we have a ResolvingDecoder that handles all
the basic reader/writer schema evolution and promotion.  A new
'TransformingDecoder' could be supplied with more advanced type
transformation options.  Each type of transformation would need to be well
defined.  If it were a general Avro feature and not Java-only, it would
require additions to the spec.
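Avro has no TransformingDecoder; the following is only a sketch of the idea,
assuming transformations run after ordinary reader/writer resolution on a
record modelled as a Map.  FieldTransforms is a hypothetical name.  The point
is that every transformation is registered explicitly by the user, so
nothing is converted implicitly.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: a table of user-chosen, per-field transformations
// applied to an already-resolved record. Not an Avro API.
final class FieldTransforms {
  private final Map<String, Function<Object, Object>> byField = new HashMap<>();

  // Register an explicit transformation for one field; returns this for chaining.
  FieldTransforms register(String field, Function<Object, Object> fn) {
    byField.put(field, fn);
    return this;
  }

  // Return a copy of the record with each registered transformation applied
  // to its field; fields without a transformation are copied unchanged.
  Map<String, Object> apply(Map<String, Object> resolved) {
    Map<String, Object> out = new HashMap<>(resolved);
    for (Map.Entry<String, Function<Object, Object>> e : byField.entrySet()) {
      if (out.containsKey(e.getKey())) {
        out.put(e.getKey(), e.getValue().apply(out.get(e.getKey())));
      }
    }
    return out;
  }
}

class FieldTransformsDemo {
  public static void main(String[] args) {
    Map<String, Object> record = new HashMap<>();
    record.put("id", 255);
    record.put("name", "x");
    // The user states which int --> string conversion they want (hex here).
    FieldTransforms t = new FieldTransforms()
        .register("id", v -> Integer.toHexString((Integer) v));
    System.out.println(t.apply(record).get("id")); // ff
  }
}
```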




On 2/3/11 7:04 PM, "John Kristian" wrote:

>Have you thought about extending schema resolution, so that an int or
>long can be promoted to a string?  The string would be the ASCII decimal
>representation of the number, I expect. Similarly, an enum could be
>promoted to its symbol (as a string).
>
>I've seen this sort of thing used for evolving a schema: you start out
>thinking a number is all you need, and then discover you need a richer
>format.  Or vice-versa.
>
>JavaScript and some other languages do this, and people mostly like it.
>They also do other conversions that I don't suggest for Avro, such as
>string to number and float to string.  (The string representation of a
>float depends on your locale.)
>
>I'd be happy to put this into JIRA, if you think that's appropriate.
>
>- John Kristian
