[ 
https://issues.apache.org/jira/browse/SQOOP-1616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14263213#comment-14263213
 ] 

Veena Basavaraj edited comment on SQOOP-1616 at 1/2/15 8:22 PM:
----------------------------------------------------------------

Noticed that in the Kite code, the Sqoop DECIMAL type is mapped to a STRING in Avro:
{code}
case DECIMAL:
  // why string?
  return Schema.Type.STRING;
{code}


Why not use the FIXED type in Avro, annotated with the decimal logical type? From the Avro spec:

Decimal
The decimal logical type represents an arbitrary-precision signed decimal number of the form $\mathit{unscaled} \times 10^{-\mathit{scale}}$.

A decimal logical type annotates Avro bytes or fixed types. The byte array must contain the two's-complement representation of the unscaled integer value in big-endian byte order. The scale is fixed, and is specified using an attribute.

The following attributes are supported:

* scale, a JSON integer representing the scale (optional). If not specified, the scale is 0.
* precision, a JSON integer representing the (maximum) precision of decimals stored in this type (required).

For example, the following schema represents decimal numbers with a maximum precision of 4 and a scale of 2:

{code}

{
  "type": "bytes",
  "logicalType": "decimal",
  "precision": 4,
  "scale": 2
}

{code}
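A stdlib-only sketch of the encoding quoted above, using the precision 4 / scale 2 example. BigInteger.toByteArray() already returns the minimal big-endian two's-complement form, so the "bytes" variant is just the unscaled value; the "fixed" variant sign-extends those bytes to the fixed size. The class and method names (DecimalEncodingSketch, toBytes, fromBytes, toFixed, minFixedSize) are hypothetical and not part of Sqoop, Kite or Avro.

{code:java}
import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.RoundingMode;
import java.util.Arrays;

final class DecimalEncodingSketch {

  // "bytes" + decimal: the unscaled value as minimal big-endian two's-complement bytes.
  // Throws if the value would need rounding to fit the scale, or exceeds the precision.
  static byte[] toBytes(BigDecimal value, int precision, int scale) {
    BigDecimal scaled = value.setScale(scale, RoundingMode.UNNECESSARY);
    if (scaled.precision() > precision) {
      throw new IllegalArgumentException(
          "precision " + scaled.precision() + " exceeds " + precision);
    }
    return scaled.unscaledValue().toByteArray();
  }

  // Decoding needs the scale from the schema; it is not stored in the bytes.
  static BigDecimal fromBytes(byte[] bytes, int scale) {
    return new BigDecimal(new BigInteger(bytes), scale);
  }

  // "fixed" + decimal: the same bytes, sign-extended on the left to exactly `size` bytes
  // (0x00 padding for non-negative values, 0xFF for negative ones).
  static byte[] toFixed(BigDecimal value, int precision, int scale, int size) {
    byte[] minimal = toBytes(value, precision, scale);
    if (minimal.length > size) {
      throw new IllegalArgumentException(
          "value needs " + minimal.length + " bytes, fixed size is " + size);
    }
    byte[] out = new byte[size];
    byte pad = (byte) (minimal[0] < 0 ? 0xFF : 0x00);
    Arrays.fill(out, 0, size - minimal.length, pad);
    System.arraycopy(minimal, 0, out, size - minimal.length, minimal.length);
    return out;
  }

  // Smallest fixed size that holds every unscaled value of this precision:
  // |unscaled| <= 10^precision - 1, plus one sign bit, rounded up to whole bytes.
  static int minFixedSize(int precision) {
    int bits = BigInteger.TEN.pow(precision).subtract(BigInteger.ONE).bitLength() + 1;
    return (bits + 7) / 8;
  }

  public static void main(String[] args) {
    byte[] b = toBytes(new BigDecimal("12.34"), 4, 2);
    System.out.println(Arrays.toString(b));   // [4, -46], i.e. 1234 = 0x04D2
    System.out.println(fromBytes(b, 2));      // 12.34
    System.out.println(Arrays.toString(toFixed(new BigDecimal("-0.01"), 4, 2, 2))); // [-1, -1]
    System.out.println(minFixedSize(4));      // 2
  }
}
{code}

With FIXED, minFixedSize gives the smallest size that covers the declared precision (2 bytes for precision 4), so the schema size could be derived from the column's precision instead of being hard-coded.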

For FIXED_POINT and FLOATING_POINT, we need to check the byte size before choosing the Avro type. Currently they always map to LONG and DOUBLE:

{code}
case FIXED_POINT:
  return Schema.Type.LONG;

case FLOATING_POINT:
  return Schema.Type.DOUBLE;
{code}
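It should be something like this (note that getByteSize() is a size in bytes, while Integer.SIZE and Float.SIZE are bit counts, so they cannot be compared directly):
{code}
case FIXED_POINT:
  // getByteSize() is assumed to return the size in bytes (possibly null), whereas
  // Integer.SIZE and Float.SIZE are in bits, so divide by Byte.SIZE before comparing.
  // A null byte size falls back to the wider type.
  Long fixedSize = ((org.apache.sqoop.schema.type.FixedPoint) column).getByteSize();
  if (fixedSize != null && fixedSize <= Integer.SIZE / Byte.SIZE) {
    return Schema.Type.INT;
  } else {
    return Schema.Type.LONG;
  }

case FLOATING_POINT:
  Long floatSize = ((org.apache.sqoop.schema.type.FloatingPoint) column).getByteSize();
  if (floatSize != null && floatSize <= Float.SIZE / Byte.SIZE) {
    return Schema.Type.FLOAT;
  } else {
    return Schema.Type.DOUBLE;
  }
{code}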
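A dependency-free sketch of the same size-based decision, so the boundaries can be checked in isolation. AvroType is a hypothetical stand-in for org.apache.avro.Schema.Type, and the inputs are assumed to be the byte sizes that getByteSize() reports (for example 8 for an 8-byte integer column).

{code:java}
final class NumericTypeMapping {

  // Hypothetical stand-in for org.apache.avro.Schema.Type, so no Avro dependency is needed.
  enum AvroType { INT, LONG, FLOAT, DOUBLE }

  // Integer.SIZE and Float.SIZE are bit counts (32); divide by Byte.SIZE (8) to get bytes.
  static final int INT_BYTES = Integer.SIZE / Byte.SIZE;   // 4
  static final int FLOAT_BYTES = Float.SIZE / Byte.SIZE;   // 4

  // A missing (null) byte size falls back to the wider type, so nothing is truncated.
  static AvroType forFixedPoint(Long byteSize) {
    return (byteSize != null && byteSize <= INT_BYTES) ? AvroType.INT : AvroType.LONG;
  }

  static AvroType forFloatingPoint(Long byteSize) {
    return (byteSize != null && byteSize <= FLOAT_BYTES) ? AvroType.FLOAT : AvroType.DOUBLE;
  }

  public static void main(String[] args) {
    System.out.println(forFixedPoint(2L));     // INT
    System.out.println(forFixedPoint(4L));     // INT
    System.out.println(forFixedPoint(8L));     // LONG
    System.out.println(forFixedPoint(null));   // LONG
    System.out.println(forFloatingPoint(4L));  // FLOAT
    System.out.println(forFloatingPoint(8L));  // DOUBLE
    // Comparing a byte count directly against Integer.SIZE (32 bits) would accept 8 bytes:
    System.out.println(8L <= Integer.SIZE);    // true
  }
}
{code}

The sketch ignores signedness: Avro int is a signed 32-bit type, so an unsigned 4-byte column would still need LONG.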
SET should be treated like an ARRAY, so it should map to the ARRAY type in Avro as well. Currently it is mapped to an ENUM, and I am not sure that is right.
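One way to see the mismatch: an Avro enum value is exactly one symbol from a fixed list, while a SET value holds zero or more elements, which is what an Avro array (for example {"type": "array", "items": "string"}, assuming string elements) can carry. The sketch below is hypothetical (SetAsArraySketch, fitsEnum and toArrayValue are illustrative names) and uses plain java.util collections rather than Avro's classes.

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class SetAsArraySketch {

  // An enum value must be exactly one of the declared symbols.
  static boolean fitsEnum(Set<String> value, List<String> symbols) {
    return value.size() == 1 && symbols.containsAll(value);
  }

  // An array value is a list of zero or more items; the set's iteration order is kept.
  static List<String> toArrayValue(Set<String> value) {
    return new ArrayList<String>(value);
  }

  public static void main(String[] args) {
    List<String> symbols = Arrays.asList("a", "b", "c");
    Set<String> value = new LinkedHashSet<String>(Arrays.asList("a", "c"));
    System.out.println(fitsEnum(value, symbols));                     // false
    System.out.println(toArrayValue(value));                          // [a, c]
    System.out.println(toArrayValue(new LinkedHashSet<String>()));    // []
  }
}
{code}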

As far as Sqoop is concerned, UNKNOWN is the same as BINARY (BYTES in Avro).

Also, this code is really good; I had missed the UNION part when I coded the Avro IDF (intermediate data format).

{code}
if (!column.getNullable()) {
  return Schema.create(type);
} else {
  List<Schema> union = new ArrayList<Schema>();
  // really good call here
  union.add(Schema.create(type));
  union.add(Schema.create(Schema.Type.NULL));
  return Schema.createUnion(union);
}
{code}
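For reference, this is what the two cases look like in Avro's JSON schema form. The sketch below (a hypothetical NullableUnionSketch.schemaJson, built with plain string concatenation rather than the Avro API) renders them. One design point worth checking: per the Avro spec, a union field's default value must match the first branch of the union, so if these fields should default to null, NULL would need to come first.

{code:java}
final class NullableUnionSketch {

  // Renders the JSON schema for a primitive column type. A nullable column becomes a
  // two-branch union; nullFirst controls the branch order.
  static String schemaJson(String avroPrimitive, boolean nullable, boolean nullFirst) {
    String type = "\"" + avroPrimitive + "\"";
    if (!nullable) {
      return type;
    }
    // Avro field defaults must match the first union branch, so "null" first
    // is the order to use when the field should default to null.
    return nullFirst ? "[\"null\", " + type + "]" : "[" + type + ", \"null\"]";
  }

  public static void main(String[] args) {
    System.out.println(schemaJson("long", false, false)); // "long"
    System.out.println(schemaJson("long", true, false));  // ["long", "null"]
    System.out.println(schemaJson("long", true, true));   // ["null", "long"]
  }
}
{code}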



> Sqoop2: Sqoop data type to Avro data type conversion
> ----------------------------------------------------
>
>                 Key: SQOOP-1616
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1616
>             Project: Sqoop
>          Issue Type: Sub-task
>          Components: connectors
>            Reporter: Qian Xu
>            Assignee: Veena Basavaraj
>            Priority: Minor
>             Fix For: 1.99.5
>
>
> Should add support for more data type conversions



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
