[
https://issues.apache.org/jira/browse/AVRO-1402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942398#comment-13942398
]
Jarek Jarcec Cecho commented on AVRO-1402:
------------------------------------------
Thank you guys for working on this one! I'm following this JIRA as I would
love to see the ability to represent decimals in Avro in a way that would work
across all the processing engines in Hadoop. I've made two small observations
that I would like to share.
1) It seems that the proposal is to serialize the scale with every record
(row). I would like to mention that from a database perspective the type of a
column always includes both scale and precision. E.g. the type that Hive or
other SQL engines on top of Hadoop will store in the file will always be
"decimal(5,2)" and never just "decimal". Databases do not allow different rows
of the same column to have different scale or precision. Hence encoding the
scale with every record will add a lot of redundant information for this use
case. I fully understand that Avro, being a generic format, might want to
enable it, but I wanted to point it out explicitly.
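To make the redundancy concrete, here is a minimal sketch in Python. It is not
taken from the attached patch: the COLUMN_TYPE dictionary and the helper names
(unscaled, encode_per_record, encode_per_column) are hypothetical and only
illustrate the two layouts, assuming values are stored as unscaled integers.

```python
from decimal import Decimal

# Hypothetical column-level declaration, mirroring SQL's decimal(5,2).
COLUMN_TYPE = {"type": "decimal", "precision": 5, "scale": 2}


def unscaled(value, scale):
    """Return the unscaled integer of value at a fixed scale (123.45 -> 12345)."""
    quantum = Decimal(1).scaleb(-scale)  # scale 2 -> Decimal("0.01")
    fixed = value.quantize(quantum)
    if fixed != value:
        raise ValueError(f"{value} has more than {scale} fractional digits")
    return int(fixed.scaleb(scale))


def encode_per_record(values, scale):
    """Layout A: every record carries its own copy of the scale."""
    return [(scale, unscaled(v, scale)) for v in values]


def encode_per_column(values, column_type):
    """Layout B: the scale is stated once in the schema, records hold only digits."""
    scale = column_type["scale"]
    return column_type, [unscaled(v, scale) for v in values]


rows = [Decimal("123.45"), Decimal("0.99"), Decimal("7.00")]
per_record = encode_per_record(rows, 2)
schema, per_column = encode_per_column(rows, COLUMN_TYPE)
```

In layout A the scale 2 is repeated once per row, while in layout B it appears
only in the schema, which is how a SQL column type behaves.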
2) I can see that we are serializing only the scale and not the precision to
disk. I do understand that from a storage perspective this is fully
sufficient, however I do see possible problems with follow-up processing. I
believe that in order to execute the math in the same way in all the
processing engines one needs to know the precision as well. This does not seem
to be a problem for projects that have a separate service keeping the metadata
(such as Hive and its Hive metastore), but I assume that it can become an
issue for projects that don't have that ability (such as Pig). Hence I'm
wondering if it would make sense to store the precision in the Avro metadata
as well?
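As a small illustration of why the precision matters for processing, consider
an overflow check. The sketch below is in Python and is only an assumption
about how an engine might validate results; the fits helper is hypothetical
and does not correspond to any Avro, Hive, or Pig API.

```python
from decimal import Decimal


def fits(value, precision, scale):
    """Return True if value is representable as decimal(precision, scale)."""
    quantum = Decimal(1).scaleb(-scale)  # scale 2 -> Decimal("0.01")
    fixed = value.quantize(quantum)
    if fixed != value:
        return False  # too many fractional digits for this scale
    # Count all significant digits of the value at the fixed scale.
    return len(fixed.as_tuple().digits) <= precision


a = Decimal("999.99")  # largest value of a decimal(5,2) column
b = Decimal("0.01")
total = a + b          # 1000.00 needs six digits

overflows_5_2 = not fits(total, 5, 2)
fits_6_2 = fits(total, 6, 2)
```

Both operands share scale 2, so the scale alone cannot tell an engine whether
1000.00 is a valid result or an overflow. Only the precision decides that, and
an engine that has to guess it may behave differently from one that reads it
from a metastore.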
> Support for DECIMAL primitive type
> ----------------------------------
>
> Key: AVRO-1402
> URL: https://issues.apache.org/jira/browse/AVRO-1402
> Project: Avro
> Issue Type: New Feature
> Affects Versions: 1.7.5
> Reporter: Mariano Dominguez
> Priority: Minor
> Labels: Hive
> Attachments: AVRO-1402.patch
>
>
> Currently, Avro does not seem to support a DECIMAL type or equivalent.
> http://avro.apache.org/docs/1.7.5/spec.html#schema_primitive
> Adding DECIMAL support would be particularly interesting when converting
> types from Avro to Hive, since DECIMAL is already a supported data type in
> Hive (0.11.0).
--
This message was sent by Atlassian JIRA
(v6.2#6252)