[ 
https://issues.apache.org/jira/browse/AVRO-1402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13963257#comment-13963257
 ] 

Ryan Blue commented on AVRO-1402:
---------------------------------

No problem. Going from int to long or float to double only widens the 
representation, which is fine. But changing the scale is actually changing the data.

Say I have measurements coming in, and over time I'm updating the platform to 
get measurements with higher resolution. Using BigDecimal is the right choice 
because I want to be able to calculate the margin of error, so I need to know 
how many figures are significant. If we fix the scale at the resolution of the 
initial measurements, then the higher-resolution measurements are lost because 
I have to discard digits to get to the same scale (12.008 becomes 12.01). But 
if I start with a higher resolution, say scale 4, then I have to store a 
separate value recording how many of those digits are significant (12.0080 is 
really 12.008). In other words: for measurements, scale matters. That's why I'm 
not using floating point: I don't want an approximation that is close, but not 
quite accurate:

{code}
>> BigDecimal.new(12.100).to_s
=> "12.0999999999999996447286321199499070644378662109375"
{code}
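The digit loss from forcing a fixed scale can be seen directly with Java's 
java.math.BigDecimal (a minimal sketch; the class name is illustrative, not 
from any patch here):

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class ScaleLoss {
    public static void main(String[] args) {
        // Built from a String, so the value is exact: scale = 3.
        BigDecimal measured = new BigDecimal("12.008");

        // Forcing it down to the old schema's fixed scale of 2 discards a digit.
        BigDecimal atOldScale = measured.setScale(2, RoundingMode.HALF_UP);

        System.out.println(measured);    // 12.008
        System.out.println(atOldScale);  // 12.01
    }
}
```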

If we were to evolve the schema from scale=2 to scale=4, how do I know which 
values were accurate to 2 decimals and which were accurate to 4? If all 
BigDecimal values produced by the new schema had the read-time scale but were 
stored with different scales, then the file format would be changing the data: 
BigDecimal(12.01) != BigDecimal(12.0100). For evolution with different scales, 
the maximum scale can increase, but we still have to return the scale the data 
was written with. For fixed-scale schemas, I don't think we should allow the 
scale to evolve because programs should expect objects with the fixed scale.
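That inequality is exactly how java.math.BigDecimal behaves, since equals() 
takes scale into account (a small sketch to make the point concrete; the class 
name is illustrative):

```java
import java.math.BigDecimal;

public class ScaleEquality {
    public static void main(String[] args) {
        BigDecimal twoPlaces = new BigDecimal("12.01");    // scale = 2
        BigDecimal fourPlaces = new BigDecimal("12.0100"); // scale = 4

        // equals() considers scale, so these are different values to a program...
        System.out.println(twoPlaces.equals(fourPlaces));    // false

        // ...even though they are numerically equal.
        System.out.println(twoPlaces.compareTo(fourPlaces)); // 0
    }
}
```

So a reader that silently rescaled values on the way out would change what 
equals() and hashCode() report, which is why the written scale has to survive.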

> Support for DECIMAL type
> ------------------------
>
>                 Key: AVRO-1402
>                 URL: https://issues.apache.org/jira/browse/AVRO-1402
>             Project: Avro
>          Issue Type: New Feature
>    Affects Versions: 1.7.5
>            Reporter: Mariano Dominguez
>            Assignee: Tom White
>            Priority: Minor
>              Labels: Hive
>             Fix For: 1.7.7
>
>         Attachments: AVRO-1402.patch, AVRO-1402.patch, AVRO-1402.patch, 
> AVRO-1402.patch, UnixEpochRecordMapping.patch
>
>
> Currently, Avro does not seem to support a DECIMAL type or equivalent.
> http://avro.apache.org/docs/1.7.5/spec.html#schema_primitive
> Adding DECIMAL support would be particularly interesting when converting 
> types from Avro to Hive, since DECIMAL is already a supported data type in 
> Hive (0.11.0).



--
This message was sent by Atlassian JIRA
(v6.2#6252)