[ https://issues.apache.org/jira/browse/AVRO-1402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13963257#comment-13963257 ]
Ryan Blue commented on AVRO-1402:
---------------------------------
No problem. Going from int to long or float to double is increasing the
precision, which is fine. But changing the scale actually changes the data.
Say I have measurements coming in, and over time I'm updating the platform to
get measurements with higher resolution. BigDecimal is the right choice
because I want to be able to calculate the margin of error, so I need to know
how many figures are significant. If we fix the scale at the resolution of the
initial measurements, then the extra resolution of later measurements is lost
because I have to discard digits to get down to that scale (12.008 becomes
12.01). But if I start with a higher scale of 4 digits, then I have to store a
separate value that says how many of those digits are significant (12.0080 is
really 12.008). In other words: for measurements, scale matters. That's also
why I'm not using floating point: I don't want an approximation that is close
but not quite accurate:
{code}
>> require 'bigdecimal'
=> true
>> BigDecimal(12.100, 55).to_s("F")
=> "12.0999999999999996447286321199499070644378662109375"
{code}
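A minimal Java sketch of the same two failure modes, assuming
java.math.BigDecimal (the snippet above is Ruby, but the Java side has the
same scale semantics; the class and variable names below are illustrative):

{code}
import java.math.BigDecimal;
import java.math.RoundingMode;

public class ScaleMatters {
    public static void main(String[] args) {
        BigDecimal measured = new BigDecimal("12.008"); // scale 3, as measured

        // Fixing the scale at 2 forces real digits to be discarded:
        System.out.println(measured.setScale(2, RoundingMode.HALF_UP)); // 12.01

        // Fixing the scale at 4 pads with a digit that was never measured,
        // and nothing records that only 3 decimal places are significant:
        System.out.println(measured.setScale(4)); // 12.0080
    }
}
{code}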
If we were to evolve the schema from scale=2 to scale=4, how would I know
which values were accurate to 2 decimals and which to 4? If all BigDecimal
values produced by the new schema carried the read-time scale even though they
were stored with different scales, then the file format would be changing
data: BigDecimal(12.01) != BigDecimal(12.0100), because the scale is part of
the value. For evolution with different scales, the maximum scale can
increase, but we still have to return each value with the scale it was written
with. For fixed-scale schemas, I don't think we should allow the scale to
evolve at all, because programs should expect objects with the fixed scale.
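To make the equality point concrete, a small sketch, again assuming Java's
BigDecimal, where equals() includes the scale but compareTo() does not:

{code}
import java.math.BigDecimal;

public class ScaleIsData {
    public static void main(String[] args) {
        BigDecimal written = new BigDecimal("12.01");    // written at scale 2
        BigDecimal rescaled = new BigDecimal("12.0100"); // rescaled on read

        System.out.println(written.compareTo(rescaled)); // 0: numerically equal
        System.out.println(written.equals(rescaled));    // false: scale differs
    }
}
{code}

So a reader that silently rescales on read hands programs different objects
than were written, which is exactly the "changing data" problem above.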
> Support for DECIMAL type
> ------------------------
>
> Key: AVRO-1402
> URL: https://issues.apache.org/jira/browse/AVRO-1402
> Project: Avro
> Issue Type: New Feature
> Affects Versions: 1.7.5
> Reporter: Mariano Dominguez
> Assignee: Tom White
> Priority: Minor
> Labels: Hive
> Fix For: 1.7.7
>
> Attachments: AVRO-1402.patch, AVRO-1402.patch, AVRO-1402.patch,
> AVRO-1402.patch, UnixEpochRecordMapping.patch
>
>
> Currently, Avro does not seem to support a DECIMAL type or equivalent.
> http://avro.apache.org/docs/1.7.5/spec.html#schema_primitive
> Adding DECIMAL support would be particularly interesting when converting
> types from Avro to Hive, since DECIMAL is already a supported data type in
> Hive (0.11.0).
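For reference, the support this issue tracks shipped in 1.7.7 as a "decimal"
logical type annotating bytes or fixed, with precision and scale carried in
the schema. A sketch, with illustrative precision and scale values:

{code}
{
  "type": "bytes",
  "logicalType": "decimal",
  "precision": 4,
  "scale": 2
}
{code}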
--
This message was sent by Atlassian JIRA
(v6.2#6252)