[
https://issues.apache.org/jira/browse/AVRO-1402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13962954#comment-13962954
]
Tom White commented on AVRO-1402:
---------------------------------
There has been some further (offline) discussion about whether it would be
possible to store the scale in the Avro schema, and not in the data for
efficiency reasons. Something like:
{code}
{
"type":"record”,
"name":”org.apache.avro.FixedDecimal”,
"fields”: [{
"name":"value”,
"type":”bytes"
}],
"scale":"2”,
"precision":”4"
}
{code}
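Under this proposal, with scale and precision fixed in the schema, only the unscaled value has to travel in the data. A minimal Python sketch of what that saves (illustrative only — the two's-complement big-endian byte layout and the function names are assumptions, not part of the committed patch):

```python
from decimal import Decimal

def encode_fixed_decimal(value: Decimal, scale: int) -> bytes:
    """Encode a decimal as its unscaled value in big-endian
    two's-complement bytes; the scale lives in the schema, not
    in the data, so it is not serialized per value."""
    unscaled = int(value.scaleb(scale))  # e.g. 12.34 at scale 2 -> 1234
    length = max(1, (unscaled.bit_length() + 8) // 8)  # room for sign bit
    return unscaled.to_bytes(length, byteorder="big", signed=True)

def decode_fixed_decimal(data: bytes, scale: int) -> Decimal:
    """Reverse: the reader supplies the scale from the schema."""
    unscaled = int.from_bytes(data, byteorder="big", signed=True)
    return Decimal(unscaled).scaleb(-scale)
```

A FixedDecimal(4, 2) value like 12.34 then costs only the two bytes of its unscaled form, with no per-record scale field.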
In the implementation committed here, the name does not uniquely determine the
RecordMapping, so a FixedDecimal(4, 2) has a different RecordMapping from a
FixedDecimal(3, 0). GenericData keeps a map from name to RecordMapping, so
org.apache.avro.FixedDecimal would map to either FixedDecimalRecordMapping(4,
2) or FixedDecimalRecordMapping(3, 0), but not both.
We could solve this problem by having a stateless FixedDecimalRecordMapping and
having the read and write methods pass through the record schema to get the
scale. However, consider the case where there are multiple decimals (with
different scales) in a single schema. Since you can’t redefine a type multiple
times (http://avro.apache.org/docs/1.7.6/spec.html#Names), the first one serves
as the definition, and later ones are just references:
{code}
{"type":"record","name":"rec","fields":[
{"name":"dec1","type":{"type":"record","name":"org.apache.avro.FixedDecimal","fields":[{"name":"value","type":"bytes"}],"scale":"2","precision":"4"}},
{"name":"dec2","type":"org.apache.avro.FixedDecimal","precision":"3","scale":"0"}
]}
{code}
When GenericDatumReader/Writer is processing dec2, the value of scale seen is
2, not 0, since the read/write method sees the record schema, not the
field-level schema. I can’t see a simple way around this.
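The shadowing can be demonstrated with a toy resolver that mimics how named-type references work (plain Python over the schema JSON, not the Avro parser itself — the helper name is made up):

```python
import json

SCHEMA = json.loads("""
{"type":"record","name":"rec","fields":[
 {"name":"dec1","type":{"type":"record","name":"org.apache.avro.FixedDecimal",
  "fields":[{"name":"value","type":"bytes"}],"scale":"2","precision":"4"}},
 {"name":"dec2","type":"org.apache.avro.FixedDecimal","precision":"3","scale":"0"}
]}
""")

def scales_seen_by_reader(schema):
    """The first occurrence of a named type is its definition; later
    string references resolve to that same definition, so the reader
    sees the definition's properties, not the ones written next to
    the reference."""
    definitions, seen = {}, {}
    for field in schema["fields"]:
        t = field["type"]
        if isinstance(t, dict):
            definitions[t["name"]] = t  # first occurrence defines the type
        else:
            t = definitions[t]  # string reference -> original definition
        seen[field["name"]] = t["scale"]
    return seen
```

Both dec1 and dec2 resolve to scale "2"; dec2's intended scale of 0 is invisible at the record level.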
Note that in the Decimal schema committed in this JIRA we allow maxPrecision
and maxScale values to be specified as JSON properties that are not interpreted
by Avro. E.g.
{code}
{"type":"record","name":"rec","fields":[
{"name":"dec1","type":{"type":"record","name":”org.apache.avro.Decimal","fields":[{"name":"scale","type":"int"},{"name":"value","type":"bytes"}],"maxPrecision":"4","maxScale":"2"}},
{"name":"dec2","type":"org.apache.avro.Decimal","maxPrecision":"3","maxScale":"0"}
]}
{code}
As it stands, an application using this extra metadata would have to be careful
to read the JSON properties either from the field (if they are present there)
or from the org.apache.avro.Decimal record type. This is something we might
improve - e.g. by having the metadata only as field-level properties, not as
part of the record definition. That would work for Hive.
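For contrast with the fixed-scale proposal, here is a sketch of the per-value layout the committed Decimal schema implies, where the scale travels in the data next to the unscaled bytes (illustrative Python, not the committed Java code; the byte layout is an assumption):

```python
from decimal import Decimal

def to_decimal_record(value: Decimal) -> dict:
    """Build a record shaped like org.apache.avro.Decimal: an int
    scale plus the unscaled value as big-endian two's-complement
    bytes. Every value pays for its own scale field."""
    scale = max(0, -value.as_tuple().exponent)
    unscaled = int(value.scaleb(scale))
    length = max(1, (unscaled.bit_length() + 8) // 8)
    return {"scale": scale,
            "value": unscaled.to_bytes(length, byteorder="big", signed=True)}

def from_decimal_record(rec: dict) -> Decimal:
    """Reconstruct the decimal; no schema-level scale is needed."""
    unscaled = int.from_bytes(rec["value"], byteorder="big", signed=True)
    return Decimal(unscaled).scaleb(-rec["scale"])
```

This is what makes per-value scales self-describing, at the cost of the extra int in every datum - the efficiency trade-off discussed at the top of this comment.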
> Support for DECIMAL type
> ------------------------
>
> Key: AVRO-1402
> URL: https://issues.apache.org/jira/browse/AVRO-1402
> Project: Avro
> Issue Type: New Feature
> Affects Versions: 1.7.5
> Reporter: Mariano Dominguez
> Assignee: Tom White
> Priority: Minor
> Labels: Hive
> Fix For: 1.7.7
>
> Attachments: AVRO-1402.patch, AVRO-1402.patch, AVRO-1402.patch,
> AVRO-1402.patch, UnixEpochRecordMapping.patch
>
>
> Currently, Avro does not seem to support a DECIMAL type or equivalent.
> http://avro.apache.org/docs/1.7.5/spec.html#schema_primitive
> Adding DECIMAL support would be particularly interesting when converting
> types from Avro to Hive, since DECIMAL is already a supported data type in
> Hive (0.11.0).
--
This message was sent by Atlassian JIRA
(v6.2#6252)