On Mon, Jun 10, 2019 at 4:18 PM Wes McKinney <wesmck...@gmail.com> wrote:
>
> On the 1.0.0 protocol discussion, one item that we've skirted for some
> time is other decimal sizes:
>
> https://issues.apache.org/jira/browse/ARROW-2009
>
> I understand this is a loaded subject since a deliberate decision was
> made to remove types from the initial Java implementation of Arrow
> that was forked from Apache Drill. However, it's a friction point that
> has come up in a number of scenarios as many database and storage
> systems have 32- and 64-bit variants for low precision decimal data.
> As an example Apache Kudu [1] has all three types, and the Parquet
> columnar format allows not only 32/64 bit storage but fixed size
> binary (size a function of precision) and variable-length binary
> encoding [2].
>
> One of the arguments against using these types in a computational
> setting is that many mathematical operations will necessarily trigger
> an up-promotion to a larger type. It's hard for us to predict how
> people will use the Arrow format, though, and the current situation is
> forcing an up-promotion regardless of how the format is being used,
> even for simple data transport
>
> In anticipation of long-term needs, I would suggest a possible solution of:
>
> * Adding bitWidth field to Decimal table in Schema.fbs [3] with
>   default value of 128
> * Constraining bit widths to 32, 64, and 128 bits for the time being
> * Permit storage of smaller precision decimals in larger storage like
>   we have now

BTW, even if we do not allow 32/64 bit decimals in the format, we
should consider adding a bitWidth field with static value 128 as a
matter of future-proofing the metadata. Old readers would be unable to
see the bitWidth field, so the addition would not be possible without
bumping the protocol version.

>
> If this isn't deemed desirable by the community, decimal extension
> types could be employed for serialization-free transport for smaller
> decimals, but I view this as suboptimal.
>
> Interested in the thoughts of others.
>
> thanks
> Wes
>
> [1]: https://github.com/apache/kudu/blob/master/src/kudu/common/common.proto#L55
> [2]: https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#decimal
> [3]: https://github.com/apache/arrow/blob/master/format/Schema.fbs#L121
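
To make the proposed 32/64/128 constraint concrete, here is a rough
Python sketch of how a writer might pick the smallest permitted bit
width for a given decimal precision, assuming the unscaled value is
stored as a signed two's-complement integer. The names
ALLOWED_BIT_WIDTHS, max_precision, and min_bit_width are made up for
illustration and are not part of any Arrow API.

```python
# Hypothetical helpers for illustration only; not part of any Arrow library.
ALLOWED_BIT_WIDTHS = (32, 64, 128)


def max_precision(bit_width: int) -> int:
    """Largest number of decimal digits p such that every p-digit integer
    fits in a signed two's-complement integer of the given width."""
    limit = 2 ** (bit_width - 1) - 1
    # The limit itself has one more digit than the largest "all nines"
    # value that is guaranteed to fit.
    return len(str(limit)) - 1


def min_bit_width(precision: int) -> int:
    """Smallest permitted bit width able to hold the given precision."""
    if precision < 1:
        raise ValueError("precision must be at least 1")
    for width in ALLOWED_BIT_WIDTHS:
        if precision <= max_precision(width):
            return width
    raise ValueError(f"precision {precision} does not fit in 128 bits")


if __name__ == "__main__":
    for width in ALLOWED_BIT_WIDTHS:
        print(width, max_precision(width))
```

Under that assumption the limits come out to 9, 18, and 38 digits for
32, 64, and 128 bits. Storing a low-precision decimal in a wider type,
as we do now, stays valid; the helper only reports the minimum width.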