[
https://issues.apache.org/jira/browse/SQOOP-1616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14263213#comment-14263213
]
Veena Basavaraj edited comment on SQOOP-1616 at 1/2/15 8:22 PM:
----------------------------------------------------------------
I noticed that in the Kite code, the Sqoop DECIMAL type is mapped to a string
in Avro:
{code}
case DECIMAL:
// why string?
return Schema.Type.STRING;
{code}
Why not use the FIXED type in Avro? The Avro specification describes the decimal logical type as follows:
Decimal
The decimal logical type represents an arbitrary-precision signed decimal
number of the form unscaled × 10^(-scale).
A decimal logical type annotates Avro bytes or fixed types. The byte array must
contain the two's-complement representation of the unscaled integer value in
big-endian byte order. The scale is fixed, and is specified using an attribute.
The following attributes are supported:
scale, a JSON integer representing the scale (optional). If not specified the
scale is 0.
precision, a JSON integer representing the (maximum) precision of decimals
stored in this type (required).
For example, the following schema represents decimal numbers with a maximum
precision of 4 and a scale of 2:
{code}
{
"type": "bytes",
"logicalType": "decimal",
"precision": 4,
"scale": 2
}
{code}
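To make the byte layout described above concrete, here is a minimal sketch using only the JDK. The class `DecimalBytes` and its methods are hypothetical names for illustration and are not part of Sqoop, Kite, or Avro. It relies on `BigInteger.toByteArray()`, which returns the big-endian two's-complement form that the specification asks for.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical helper for illustration only; not part of Sqoop, Kite, or Avro.
final class DecimalBytes {
    private DecimalBytes() {}

    // Rescale the value to the schema's fixed scale, then emit the unscaled
    // integer as big-endian two's-complement bytes.
    static byte[] encode(BigDecimal value, int scale) {
        // setScale without a rounding mode throws ArithmeticException
        // if digits would have to be dropped.
        return value.setScale(scale).unscaledValue().toByteArray();
    }

    // Rebuild the decimal from the unscaled bytes and the schema's scale.
    static BigDecimal decode(byte[] bytes, int scale) {
        return new BigDecimal(new BigInteger(bytes), scale);
    }
}
```

With a scale of 2, the value 12.34 has the unscaled integer 1234 (0x04D2), so it is stored as the two bytes 0x04 0xD2. This sketch does not enforce the precision attribute.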
Regarding FIXED_POINT and FLOATING_POINT, we need to check the byte size
before deciding on the Avro type. Currently they always map to LONG and DOUBLE:
{code}
case FIXED_POINT:
return Schema.Type.LONG;
case FLOATING_POINT:
return Schema.Type.DOUBLE;
{code}
It should be something like this:
{code}
case FIXED_POINT:
  // getByteSize() is in bytes, while Integer.SIZE is in bits, so convert before comparing
  if (((org.apache.sqoop.schema.type.FixedPoint) column).getByteSize() <= Integer.SIZE / Byte.SIZE) {
    return Schema.Type.INT;
  } else {
    return Schema.Type.LONG;
  }
case FLOATING_POINT:
  // same conversion: Float.SIZE is 32 bits, i.e. 4 bytes
  if (((org.apache.sqoop.schema.type.FloatingPoint) column).getByteSize() <= Float.SIZE / Byte.SIZE) {
    return Schema.Type.FLOAT;
  } else {
    return Schema.Type.DOUBLE;
  }
{code}
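The byte-size check above can also be tried in isolation. The following is a self-contained sketch: `NumericTypeMapping` is a hypothetical class, plain strings stand in for Avro's `Schema.Type` constants so that it runs without the Avro jar, and the fallback to the widest type for an unset byte size is an assumption, not something the Kite code is known to do. Note that `Integer.SIZE` and `Float.SIZE` are bit counts (32), so they are divided by `Byte.SIZE` before being compared with a size in bytes.

```java
// Hypothetical sketch for illustration; strings stand in for Schema.Type constants.
final class NumericTypeMapping {
    private NumericTypeMapping() {}

    // Integer.SIZE / Byte.SIZE == 4: sizes of up to 4 bytes fit an Avro int.
    static String fixedPointType(Long byteSize) {
        // Assumption: an unset (null) byte size falls back to the widest type.
        if (byteSize != null && byteSize <= Integer.SIZE / Byte.SIZE) {
            return "INT";
        }
        return "LONG";
    }

    // Float.SIZE / Byte.SIZE == 4: sizes of up to 4 bytes fit an Avro float.
    static String floatingPointType(Long byteSize) {
        // Same assumption as above for an unset byte size.
        if (byteSize != null && byteSize <= Float.SIZE / Byte.SIZE) {
            return "FLOAT";
        }
        return "DOUBLE";
    }
}
```

For example, `fixedPointType(2L)` and `fixedPointType(4L)` give INT, while `fixedPointType(8L)` gives LONG.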
SET should be treated as an ARRAY, so it should map to the ARRAY type in Avro as
well. Currently it is treated as an enum, and I am not sure that is right.
UNKNOWN is the same as BINARY/BYTES as far as Sqoop is concerned.
Also, this code is really good. I had missed the UNION part when I coded the Avro
IDF.
{code}
if (!column.getNullable()) {
return Schema.create(type);
} else {
List<Schema> union = new ArrayList<Schema>();
// really good call here
union.add(Schema.create(type));
union.add(Schema.create(Schema.Type.NULL));
return Schema.createUnion(union);
}
{code}
> Sqoop2: Sqoop data type to Avro data type conversion
> ----------------------------------------------------
>
> Key: SQOOP-1616
> URL: https://issues.apache.org/jira/browse/SQOOP-1616
> Project: Sqoop
> Issue Type: Sub-task
> Components: connectors
> Reporter: Qian Xu
> Assignee: Veena Basavaraj
> Priority: Minor
> Fix For: 1.99.5
>
>
> Should add support for more data type conversions