[ https://issues.apache.org/jira/browse/AVRO-1402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13963738#comment-13963738 ]

Skye Wanderman-Milne commented on AVRO-1402:
--------------------------------------------

I strongly prefer the logical type approach, rather than a record mapping, due 
to the problem of having multiple record-mapping types in the same schema. Even 
if we stick with storing the scale in the data and not the schema, I don't 
think we should go down the record-mapping path at all, since it precludes any 
parameterized types, i.e. types that carry extra metadata in the schema. We may 
want to introduce other such types in the future (e.g. a time with a timezone), 
and it seems unacceptable not to be able to include multiple instances of the 
same type with different metadata in the same schema.
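
To illustrate (the record and field names here are hypothetical), the logical 
type approach lets the same parameterized type appear twice with different 
metadata in one schema, which a one-record-mapping-per-type approach couldn't 
express:
{noformat}
{"type": "record", "name": "LineItem", "fields": [
    {"name": "price",   "type": {"type":"bytes", "logicalType":"decimal", "scale":"2"}},
    {"name": "taxRate", "type": {"type":"bytes", "logicalType":"decimal", "scale":"4"}}
]}
{noformat}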

I've written a micro-benchmark to get an idea of the performance of storing 
the scale in the data vs. only in the schema. I see pretty big differences in 
performance depending on whether I compile with gcc or clang, so I don't want 
to give definitive numbers yet, but I'm seeing performance hits of between 5% 
and 30% depending on how many different scales are stored in the data 
(compared to storing a single scale in the schema). If we feel that nailing 
down this performance difference is important, I can dig deeper and try to 
determine what the most "representative" case is.

Talking offline with Ryan and others, we concluded that while it's possible to 
imagine scenarios where per-value scales are useful, it's not likely to be a 
concern in the large majority of cases (Ryan, please let me know if I'm 
mischaracterizing what you said). Given that we can't think of a very 
compelling use case for storing scales in the data, I think we should store 
the scales in the schema, using the logical type schema Tom suggested above. 
Per-value scales carry a definite performance impact, and keeping the current 
implementation adds complexity for applications that don't take advantage of 
that flexibility. I also think there's the potential to confuse users who may 
decide to write values with multiple scales to the same file, only to learn 
later that those values can't easily be accessed as such.

tl;dr: I like this one:
{noformat}
{"type":"bytes", "logicalType":"decimal", "scale":"2”}
{noformat}

> Support for DECIMAL type
> ------------------------
>
>                 Key: AVRO-1402
>                 URL: https://issues.apache.org/jira/browse/AVRO-1402
>             Project: Avro
>          Issue Type: New Feature
>    Affects Versions: 1.7.5
>            Reporter: Mariano Dominguez
>            Assignee: Tom White
>            Priority: Minor
>              Labels: Hive
>             Fix For: 1.7.7
>
>         Attachments: AVRO-1402.patch, AVRO-1402.patch, AVRO-1402.patch, 
> AVRO-1402.patch, UnixEpochRecordMapping.patch
>
>
> Currently, Avro does not seem to support a DECIMAL type or equivalent.
> http://avro.apache.org/docs/1.7.5/spec.html#schema_primitive
> Adding DECIMAL support would be particularly interesting when converting 
> types from Avro to Hive, since DECIMAL is already a supported data type in 
> Hive (0.11.0).


