[ 
https://issues.apache.org/jira/browse/AVRO-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013503#comment-14013503
 ] 

Michael Pershyn commented on AVRO-1500:
---------------------------------------

bq. The Thrift-generated files are checked-in so that every developer who runs 
tests need not have the Thrift compiler installed.
This sounds reasonable.

My initial though was to submit a Vagrant file with couple provisioning scripts 
to get thrift up and running on virtual machine. This would make regeneration 
of thrift with specific version of compiler (e.g. 0.7) easier. But, this will 
at the same time limit it to one platform (scripts are for linux only). Also, 
this only would be helpful when someone wants to recompile thrift, which is I 
believe rare occasion. 

So I have serious doubts on vagrant file, and now my proposal is just to add 
README file to the thrift projects, where to write, that
- The Thrift-generated files are checked-in so that every developer who runs 
tests need not have the Thrift compiler installed.
- For regeneration of thrift files make sure you have required version of 
thrift-compiler installed (0.7) and run {{mvn -Pthrift-generate 
generate-test-sources}}

> Unknown datum type exception during union type resolution (no short to int 
> conversion).
> ---------------------------------------------------------------------------------------
>
>                 Key: AVRO-1500
>                 URL: https://issues.apache.org/jira/browse/AVRO-1500
>             Project: Avro
>          Issue Type: Bug
>    Affects Versions: 1.7.4, 1.7.5, 1.7.6
>         Environment: java API
>            Reporter: Michael Pershyn
>              Labels: easyfix, newbie, patch
>         Attachments: AVRO-1500-README.patch, AVRO-1500-ThriftData.patch, 
> AVRO-1500-unit-test.diff, AVRO-1500.patch
>
>
> There is a conversion for values of type {{short}} (and other numeric types) 
> if in the schema they are declared as int:
> {code:title=/lang/java/avro/src/main/java/org/apache/avro/generic/GenericDatumWriter.java}
>  protected void write(Schema schema, Object datum, Encoder out)
>     throws IOException {
>     try {
>       switch (schema.getType()) {
>         // ...
>         case INT:     out.writeInt(((Number)datum).intValue()); break;
>         // ...
> {code}
> So, if a value of {{short}} type is passed to INT field, it will be converted 
> and saved as INT in avro.
> But, when there is next field in schema: 
> {code}
>  ["null",{"type":"int","thrift":"short"}] {code}
> which is a union with int in it, and short is passed in (lets say 5), then we 
> are having the next exception:
> {code}
> org.apache.avro.AvroRuntimeException: Unknown datum type: 5
>       at 
> org.apache.avro.generic.GenericData.getSchemaName(GenericData.java:593)
>       at 
> org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:558)
>       at 
> org.apache.avro.generic.GenericDatumWriter.resolveUnion(GenericDatumWriter.java:144)
>  
> {code}
> This happens because in {{org.apache.avro.generic.GenericData}} there is no 
> check if the passed object has a type of {{java.lang.Short}}, and it is not 
> get converted then in {{write}} method of {{GenericDatumWriter}}:
> {code:title=lang/java/avro/src/main/java/org/apache/avro/generic/GenericData.java}
> /** Return the schema full name for a datum.  Called by {@link
>    * #resolveUnion(Schema,Object)}. */
> protected String getSchemaName(Object datum) {
> /* ... */
> if (isInteger(datum))
>   return Type.INT.getName();
> if (isLong(datum))
>   return Type.LONG.getName();
>  if (isFloat(datum))
>  return Type.FLOAT.getName();
> if (isDouble(datum))
>   return Type.DOUBLE.getName();
>  if (isBoolean(datum))
>   return Type.BOOLEAN.getName();
> throw new AvroRuntimeException("Unknown datum type: "+datum);
> {code}
> This error initially occured during thrift to avro conversion, when thrift 
> obj had optional field of type i16.
> In thrift to avro schema converter, if the type is short in thrift (i16) it 
> will be implicitly converted to int in avro-schema, so the values should be 
> converted as well. This is already done if they are not in the union (not 
> optional). Otherwise the exception is thrown.
> The snippet from schema conversion code is below:
> {code:title=lang/java/thrift/src/main/java/org/apache/avro/thrift/ThriftData.java}
> private Schema getSchema(FieldValueMetaData f) {
>   switch (f.type) {
>   /* ... */ 
>     case TType.I16:
>       Schema s = Schema.create(Schema.Type.INT);
>       s.addProp(THRIFT_PROP, "short");
>     return s;
>    /* ... */
> {code}
> Proposal is to add isShort check to generic data, as well as isShort method 
> implementation:
> {code:title=lang/java/avro/src/main/java/org/apache/avro/generic/GenericData.java}
> protected String getSchemaName(Object datum) {
> // ..
> if (isInteger(datum) || isShort(datum))
>    return Type.INT.getName();
> // ..
> {code}
> or maybe even some kind of isNumeric method, so the behaviour will be same 
> for INT fields and INT fields that are in Union.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to