[ https://issues.apache.org/jira/browse/AVRO-1817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15398756#comment-15398756 ]
Yibing Shi commented on AVRO-1817:
----------------------------------
[~busbey], I am not sure whether this task is feasible, especially when the
binary encoder/writer is used.
AFAICS, in {{GenericDatumWriter}}, an enum value is written as its offset in
the schema's list of enum symbols.
{code}
protected void writeEnum(Schema schema, Object datum, Encoder out)
    throws IOException {
  if (!data.isEnum(datum))
    throw new AvroTypeException("Not an enum: "+datum);
  out.writeEnum(schema.getEnumOrdinal(datum.toString()));
}
{code}
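To make the "offset" concrete, here is a minimal sketch in plain Java rather than Avro's own code. The class {{EnumOrdinalSketch}}, the {{SYMBOLS}} list, and both helper methods are hypothetical stand-ins for the schema's symbol list and for {{schema.getEnumOrdinal}}; the error handling is an assumption, not a statement about how Avro behaves.

```java
import java.util.Arrays;
import java.util.List;

public class EnumOrdinalSketch {
    // Hypothetical stand-in for the symbol list held by the enum schema.
    static final List<String> SYMBOLS = Arrays.asList("RED", "YELLOW", "GREEN");

    // Stand-in for schema.getEnumOrdinal(symbol): the position in the list.
    // Throwing on an unknown symbol is an assumption made for this sketch.
    static int ordinalOf(String symbol) {
        int index = SYMBOLS.indexOf(symbol);
        if (index < 0) {
            throw new IllegalArgumentException("Not a symbol: " + symbol);
        }
        return index;
    }

    // The reverse lookup only works for someone holding the same symbol list.
    static String symbolAt(int ordinal) {
        return SYMBOLS.get(ordinal);
    }

    public static void main(String[] args) {
        System.out.println(ordinalOf("YELLOW")); // position of YELLOW in the list
        System.out.println(symbolAt(2));         // symbol stored at position 2
    }
}
```

Only the integer returned by {{ordinalOf}} would be handed to the encoder; the symbol text itself is not part of the written value.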
If {{BinaryEncoder}} is used, this offset is written *through* to the data
file as a plain int, without any flags added to it.
{code}
public void writeEnum(int e) throws IOException {
  this.writeInt(e);
}
{code}
In the datum reader and decoder, it is very hard, if not impossible, to figure
out from the bytes alone whether the data being read is an enum or an actual
string. Things get even more complicated if Unicode strings are considered.
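A byte-level sketch of the concern, under stated assumptions: this is plain Java, not Avro's {{BinaryEncoder}}. It follows the binary encoding described in the Avro spec (ints and longs as zig-zag varints, strings as a long byte length followed by UTF-8 data). The class {{EnumVsStringBytes}} and its methods are hypothetical names used only for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EnumVsStringBytes {
    // Zig-zag varint, the spec's encoding for int and long values.
    static void writeLong(long n, ByteArrayOutputStream out) {
        long z = (n << 1) ^ (n >> 63);            // zig-zag: small magnitudes get small codes
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80)); // low 7 bits plus continuation bit
            z >>>= 7;
        }
        out.write((int) z);                       // final group, no continuation bit
    }

    // An enum value is just its ordinal, with no type marker.
    static byte[] encodeEnum(int ordinal) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeLong(ordinal, out);
        return out.toByteArray();
    }

    // A string is its UTF-8 byte length followed by the UTF-8 bytes.
    static byte[] encodeString(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeLong(utf8.length, out);
        out.write(utf8, 0, utf8.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Ordinal 0 and the empty string produce the same single byte.
        System.out.println(Arrays.toString(encodeEnum(0)));      // [0]
        System.out.println(Arrays.toString(encodeString("")));   // [0]
        // Ordinal 1 is identical to the length prefix of a one-byte string.
        System.out.println(Arrays.toString(encodeEnum(1)));      // [2]
        System.out.println(Arrays.toString(encodeString("R")));  // [2, 82]
    }
}
```

Nothing in the bytes distinguishes the two cases: an encoded ordinal looks exactly like the start of an encoded string, so a decoder looking only at the data cannot tell which one it has.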
> Allow enums to be "promoted" to strings
> ---------------------------------------
>
> Key: AVRO-1817
> URL: https://issues.apache.org/jira/browse/AVRO-1817
> Project: Avro
> Issue Type: Improvement
> Components: java, spec
> Reporter: Michael Overmeyer
> Priority: Minor
>
> We should consider adding a resolution rule that can promote an enum to a
> string using the enum's symbol.
> I have an Avro schema that has a field with an enum type. However, I have
> realized that an enum is not the type I actually wanted. I would much rather
> have the type of the field be a string. I went to change this, but of course
> this type of change (enum -> string) is not within the bounds of Avro's
> schema evolution. Therefore a reader with this changed schema is not able
> to read an object written with the old schema.
> For example, if the writer schema was:
>   enum Colour {
>     RED, YELLOW, GREEN
>   }
>   protocol stoplight {
>     Colour colour;
>   }
> And the reader schema was:
>   protocol stoplight {
>     string colour;
>   }
> Then when you access the colour field of your object, you get the string
> representation of the enum value's symbol.
> For example, Colour.RED => "RED", Colour.YELLOW => "YELLOW", Colour.GREEN =>
> "GREEN"