[
https://issues.apache.org/jira/browse/AVRO-1402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13963738#comment-13963738
]
Skye Wanderman-Milne commented on AVRO-1402:
--------------------------------------------
I strongly prefer the logical type approach over a record mapping, because of
the problem of having multiple record-mapping types in the same schema. Even if
we stick with storing the scale in the data rather than in the schema, I don't
think we should go down the record mapping path at all, since it precludes any
parameterized types, i.e., types that carry schema metadata. We may want to
introduce other types in the future that store extra metadata (e.g. a time with
a timezone), and it seems unacceptable that a schema couldn't include multiple
instances of the same type with different metadata.
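For illustration, here's a schema (my own sketch, using the logical type syntax
Tom suggested) in which a single record carries two decimal fields with
different scales, which a record mapping keyed on a type name couldn't express:
{noformat}
{
  "type": "record",
  "name": "LineItem",
  "fields": [
    {"name": "price",    "type": {"type": "bytes", "logicalType": "decimal", "scale": "2"}},
    {"name": "tax_rate", "type": {"type": "bytes", "logicalType": "decimal", "scale": "4"}}
  ]
}
{noformat}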
I've written a micro-benchmark to get an idea of the performance of storing the
scale in the data vs. only in the schema. I see pretty big differences in
performance depending on whether I compile with gcc or clang, so I don't want
to give definitive numbers yet, but I'm seeing performance hits of between 5%
and 30%, depending on how many different scales are stored in the data
(compared to storing a single scale in the schema). If we feel nailing down
this performance difference is important, I can dig deeper and try to determine
what the most "representative" case is.
Talking offline with Ryan and others, we concluded that while it's possible to
imagine scenarios where per-value scales are useful, it's not likely to be a
concern in the large majority of cases (Ryan, please let me know if I'm
mischaracterizing what you said). Given that we can't think of a very
compelling use case for storing scales in the data, I think we should store the
scales in the schema, using the logical type schema Tom suggested above.
Storing scales in the data has a definite performance impact, and it adds
complexity for applications that don't take advantage of the flexibility. I
also think it could confuse users who decide to write values with multiple
scales to the same file, only to learn later that they can't easily be
accessed as such.
tl;dr: I like this one:
{noformat}
{"type":"bytes", "logicalType":"decimal", "scale":"2”}
{noformat}
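To make the encoding concrete (my own example; I'm assuming the unscaled value
is written as big-endian two's-complement bytes, which readers should
double-check against the patch):
{noformat}
field:  {"name": "price", "type": {"type": "bytes", "logicalType": "decimal", "scale": "2"}}
bytes:  0x04 0xD2          (unscaled value 1234)
value:  1234 * 10^-2 = 12.34
{noformat}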
> Support for DECIMAL type
> ------------------------
>
> Key: AVRO-1402
> URL: https://issues.apache.org/jira/browse/AVRO-1402
> Project: Avro
> Issue Type: New Feature
> Affects Versions: 1.7.5
> Reporter: Mariano Dominguez
> Assignee: Tom White
> Priority: Minor
> Labels: Hive
> Fix For: 1.7.7
>
> Attachments: AVRO-1402.patch, AVRO-1402.patch, AVRO-1402.patch,
> AVRO-1402.patch, UnixEpochRecordMapping.patch
>
>
> Currently, Avro does not seem to support a DECIMAL type or equivalent.
> http://avro.apache.org/docs/1.7.5/spec.html#schema_primitive
> Adding DECIMAL support would be particularly interesting when converting
> types from Avro to Hive, since DECIMAL is already a supported data type in
> Hive (0.11.0).