On Wed, Dec 11, 2013 at 9:37 AM, Pedro Larroy
<[email protected]> wrote:
> I think it would be good to have avro as a generic
> serialization format not only limited by jvm implementation details.
>

There are two issues here. One is how to represent types such as unsigned
integers in Avro schemas, and the other is how Avro schemas are mapped to
programming languages. The latter is much easier to change compatibly.

In Avro 1.0, for maximal interoperability and simple implementation, we
sought to restrict schemas to types common to popular programming
languages. For example, we restricted map keys to strings since some
languages don't permit other types as map keys, and we didn't directly
support unsigned integers.

One way to add support for unsigned integers would be to add new primitive
types to Avro schemas. We could then map the new primitive type to
corresponding unsigned primitive types in C, C++ and C#, and perhaps to
java.math.BigInteger in Java. All implementations would need to be updated
to somehow implement the new primitive type.

However, we cannot add new primitive types to Avro without breaking
compatibility. We can thus only consider adding such new primitive types
in Avro 2.0.

Another way to add support for unsigned integers in Avro would be to find a
way to represent these as an Avro 1.0 schema. For example, an unsigned
64-bit integer might be represented with the schema {"type":"fixed",
"size":8, "is":"uint"}. This optional extension could be defined in the
specification. We could then map this schema to the corresponding unsigned
primitive types in C, C++ and C#, and perhaps to java.math.BigInteger in
Java. Schemas that use unsigned values would look a little less natural
than if we added a new primitive type to Avro, but compatibility would be
maintained. Implementations could be updated incrementally to provide
better support for unsigned values.
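
As a rough illustration of the fixed-based approach, here is a minimal Python sketch of how an implementation might map the annotated schema to an unsigned integer. Everything beyond the schema attributes quoted above is an assumption: the "name" attribute is added because Avro requires fixed types to be named ("UInt64" is a hypothetical choice), the helper names encode_uint and decode_uint are made up for this example, and big-endian byte order is assumed since the proposal does not specify one.

```python
import json

# Hypothetical schema following the proposal above. The "is" attribute is the
# proposed (not standardized) annotation. "name" is added because Avro
# requires fixed types to be named; "UInt64" is only an illustrative choice.
UINT64_SCHEMA = json.loads(
    '{"type": "fixed", "name": "UInt64", "size": 8, "is": "uint"}'
)


def encode_uint(value, schema=UINT64_SCHEMA):
    """Pack a non-negative int into the schema's fixed-size byte string."""
    size = schema["size"]
    if not (0 <= value < (1 << (8 * size))):
        raise ValueError(f"{value} does not fit in {size} unsigned bytes")
    # Byte order is an assumption; the proposal does not name one.
    return value.to_bytes(size, "big")


def decode_uint(data, schema=UINT64_SCHEMA):
    """Interpret a fixed-size byte string as an unsigned integer."""
    if schema.get("is") != "uint":
        raise ValueError("schema is not annotated as an unsigned integer")
    if len(data) != schema["size"]:
        raise ValueError(f"expected {schema['size']} bytes, got {len(data)}")
    return int.from_bytes(data, "big")
```

An implementation that does not know about the "is" annotation would still read and write these values as eight opaque bytes, which is what keeps this approach compatible with existing Avro 1.x readers.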
With either approach we could also extend Avro IDL to better support
unsigned types.

Doug