Hi Minglei,
https://docs.google.com/document/d/1uvbrwwAJW2TgsnoaIcwAFpjbhHkBUL5wY_24nKgtt9I/edit?tab=t.0#heading=h.hs6r9d26w1y2
is the original design doc, which is probably more useful than the sync
meeting minutes.

> but it doesn't seem to solve the fundamental issue we've been discussing -
> the need to serialize the scale information alongside the unscaled value to
> support safe decimal type evolution.

As mentioned above in the thread, the reorganization gets rid of the
Map<column ID, Serialized value> in favor of a shredded version that has an
explicit schema for each column's min/max bounds. The example in the design
doc shows only int and string, but for decimal the bounds would carry their
exact precision and scale, making the conversion doable.
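
To make that concrete, here's a minimal sketch (plain Java; the class and
method names are made up, not the proposed manifest API) of why a bound that
carries its own scale is enough: converting it to the table's current decimal
type becomes a plain BigDecimal rescale.

    import java.math.BigDecimal;
    import java.math.BigInteger;

    class BoundRescale {
      // Reinterpret a shredded decimal bound (unscaled value plus the scale
      // it was written with) under the table's current, wider scale.
      static BigDecimal rescale(BigInteger unscaled, int writtenScale, int currentScale) {
        // Widening the scale only appends zeros, so setScale never rounds here.
        return new BigDecimal(unscaled, writtenScale).setScale(currentScale);
      }

      public static void main(String[] args) {
        // A bound written as decimal(5,2): unscaled 12345 means 123.45.
        // After the column evolves to decimal(6,3), it reads back as 123.450.
        System.out.println(rescale(BigInteger.valueOf(12345), 2, 3));
      }
    }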

Thanks,
Micah

On Thu, Sep 11, 2025 at 5:30 AM rice Zhang <minglei...@gmail.com> wrote:

> Hi Russell,
>
> Thanks for pointing me to Eduard's proposal. I think I found the document
> here:
> https://docs.google.com/document/d/1ZK5g8_bA1Y9SQ4UA5jAREX9iNX56xLWA5vAuKpQC4L8/edit?pli=1&tab=t.v6wlpv1dix8h
>
> After reviewing the meeting notes and discussions, it appears this
> proposal primarily focuses on restructuring the current column statistics
> format (moving from multiple maps to a struct-based structure). However, I
> couldn't find any specific discussion about handling decimal type scale
> evolution. The proposal does make important improvements to the statistics
> structure, but it doesn't seem to solve the fundamental issue we've been
> discussing - the need to serialize the scale information alongside the
> unscaled value to support safe decimal type evolution. Given this, I think
> we need to continue discussing potential solutions for decimal scale
> changes. The core problem remains: without serializing the scale, we cannot
> correctly interpret historical statistics when the decimal type evolves.
>
> Would love to hear your thoughts on how we should proceed with addressing
> this specific issue.
>
> Minglei
>
> rice Zhang <minglei...@gmail.com> wrote on Thu, Sep 11, 2025 at 19:47:
>
>> I couldn't find it in my search - would appreciate any pointers to the
>> proposal or related discussions.
>>
>> Russell Spitzer <russell.spit...@gmail.com> wrote on Thu, Sep 11, 2025 at 19:32:
>>
>>> This has already been proposed as part of v4; see Eduard's column metrics
>>> expansion proposal.
>>>
>>> On Thu, Sep 11, 2025 at 4:54 AM rice Zhang <minglei...@gmail.com> wrote:
>>>
>>>> Hi Junwang,
>>>>
>>>> We're discussing the storage of lower and upper bounds for decimal
>>>> values in manifest files and their compatibility after type evolution. The
>>>> bounds are stored as unscaled values without their original scale, so when
>>>> the decimal type changes, we can't correctly interpret these historical
>>>> bounds even though we know the current type from metadata.
>>>>
>>>> Minglei.
>>>>
>>>> Junwang Zhao <zhjw...@gmail.com> wrote on Thu, Sep 11, 2025 at 17:46:
>>>>
>>>>> Hi Minglei,
>>>>>
>>>>> On Thu, Sep 11, 2025 at 5:35 PM rice Zhang <minglei...@gmail.com>
>>>>> wrote:
>>>>> >
>>>>> > Hi Ryan,
>>>>> >
>>>>> > Thank you for your detailed response. I've discussed this issue
>>>>> > offline with my team lead, and we've done some deeper investigation
>>>>> > into the problem. After reviewing the decimal type serialization code
>>>>> > in Iceberg, we confirmed that currently only the unscaled value is
>>>>> > serialized, without the scale. This indeed makes type evolution more
>>>>> > complex than initially anticipated.
>>>>> >
>>>>> > Regarding your mention of v4 adopting columnar metadata for manifests:
>>>>> > while I'm not certain which specific format Iceberg will use (perhaps
>>>>> > Parquet?), I agree this is a positive direction. However, to properly
>>>>> > support decimal scale evolution, I believe Iceberg would need to
>>>>> > fundamentally change how decimal types are serialized, regardless of
>>>>> > whether Avro or Parquet is used. Specifically, we'd need to serialize
>>>>> > both the unscaled value AND the scale, not just the unscaled value.
>>>>> >
>>>>> > Here's an example: consider a field initially defined as DECIMAL(5,2)
>>>>> > with value 123.45 (the serialized unscaled value is 12345). If a user
>>>>> > later changes the type to DECIMAL(6,3) - which follows SQL:2011 rules,
>>>>> > since (p-s) doesn't decrease - reading the old data with the new type
>>>>> > would be problematic. Without the original scale being serialized, we
>>>>> > can't distinguish whether 12345 represents 123.45 (scale=2) or 12.345
>>>>> > (scale=3), potentially leading to incorrect data interpretation. By
>>>>> > serializing the scale alongside the unscaled value, we could correctly
>>>>> > read 12345 with scale=2 as 123.450 under the new DECIMAL(6,3) type,
>>>>> > avoiding data corruption.
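>>>>> >
>>>>> > To spell the ambiguity out (a jshell-style sketch in plain Java; not
>>>>> > an Iceberg API):
>>>>> >
>>>>> >     import java.math.BigDecimal;
>>>>> >     import java.math.BigInteger;
>>>>> >
>>>>> >     BigInteger unscaled = BigInteger.valueOf(12345);
>>>>> >     // Written under DECIMAL(5,2):
>>>>> >     new BigDecimal(unscaled, 2)               // 123.45
>>>>> >     // Misread under DECIMAL(6,3) when only the unscaled value survives:
>>>>> >     new BigDecimal(unscaled, 3)               // 12.345
>>>>> >     // With the writer's scale recorded, the conversion is unambiguous:
>>>>> >     new BigDecimal(unscaled, 2).setScale(3)   // 123.450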
>>>>>
>>>>> The metadata should have the data type, which includes the scale and
>>>>> precision; isn't that enough to describe the decimal? Correct me if
>>>>> I'm wrong :)
>>>>>
>>>>> >
>>>>> > I'd like to confirm whether this approach of serializing the scale
>>>>> > value is something you consider viable, or does the community have
>>>>> > other, better solutions for supporting decimal scale evolution? Also,
>>>>> > I'm wondering whether you've already discussed specific implementation
>>>>> > approaches for decimal type changes. I'm very interested in
>>>>> > understanding how v4 plans to address this issue.
>>>>> >
>>>>> > Minglei
>>>>> >
>>>>> > Ryan Blue <rdb...@gmail.com> wrote on Thu, Sep 11, 2025 at 03:53:
>>>>> >>
>>>>> >> Hi Minglei, thanks for the proposal.
>>>>> >>
>>>>> >> v3 is now closed, so we can't introduce a breaking change like this
>>>>> >> until v4. We looked into decimal type evolution in v3 and found that,
>>>>> >> due to the way we currently store lower and upper bounds for decimal
>>>>> >> values, we can't safely support this in v3 Iceberg manifests. We will
>>>>> >> need to wait until v4 manifests are introduced with columnar metadata
>>>>> >> to make this change.
>>>>> >>
>>>>> >> Ryan
>>>>> >>
>>>>> >> On Wed, Sep 10, 2025 at 12:28 AM rice Zhang <minglei...@gmail.com> wrote:
>>>>> >>>
>>>>> >>> Hi Iceberg Community,
>>>>> >>>
>>>>> >>> I'd like to propose extending Iceberg's type promotion rules to
>>>>> >>> support DECIMAL type evolution with scale changes, aligning with the
>>>>> >>> SQL:2011 standard.
>>>>> >>>
>>>>> >>> Current Limitation
>>>>> >>>   Currently, Iceberg only supports DECIMAL type promotion when:
>>>>> >>>   - Scale remains the same
>>>>> >>>   - Precision can be increased
>>>>> >>>
>>>>> >>>   This means DECIMAL(10,2) can evolve to DECIMAL(12,2), but not to
>>>>> >>>   DECIMAL(12,4).
>>>>> >>>
>>>>> >>> Proposed Change
>>>>> >>>   Allow DECIMAL type evolution when:
>>>>> >>>   1. Target scale >= source scale
>>>>> >>>   2. Target precision >= source precision
>>>>> >>>   3. Integer part capacity is preserved:
>>>>> >>>      (target_precision - target_scale) >= (source_precision - source_scale)
>>>>> >>>   (a code sketch of this check follows the examples below)
>>>>> >>>
>>>>> >>> Examples
>>>>> >>>   With this change:
>>>>> >>>   - DECIMAL(10,2) → DECIMAL(12,4) ✓ (integer part: 8 → 8, scale: 2 → 4)
>>>>> >>>   - DECIMAL(10,2) → DECIMAL(15,5) ✓ (integer part: 8 → 10, scale: 2 → 5)
>>>>> >>>   - DECIMAL(10,2) → DECIMAL(10,4) ✗ (integer part: 8 → 6, would lose integer capacity)
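>>>>> >>>
>>>>> >>>   As a sketch, the proposed check could be written as below
>>>>> >>>   (illustrative Java; the method name is made up, not the Iceberg API):
>>>>> >>>
>>>>> >>>     // Widen scale and precision, but never shrink the integer-part
>>>>> >>>     // capacity (precision - scale), so existing values always fit.
>>>>> >>>     static boolean canPromoteDecimal(int srcP, int srcS, int tgtP, int tgtS) {
>>>>> >>>       return tgtS >= srcS
>>>>> >>>           && tgtP >= srcP
>>>>> >>>           && (tgtP - tgtS) >= (srcP - srcS);
>>>>> >>>     }
>>>>> >>>
>>>>> >>>     // canPromoteDecimal(10, 2, 12, 4) -> true
>>>>> >>>     // canPromoteDecimal(10, 2, 15, 5) -> true
>>>>> >>>     // canPromoteDecimal(10, 2, 10, 4) -> false (integer part 8 -> 6)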
>>>>> >>>
>>>>> >>> Rationale
>>>>> >>>   1. SQL:2011 Compliance: This behavior aligns with SQL:2011
>>>>> >>>      standard expectations
>>>>> >>>   2. User Experience: Many users coming from traditional databases
>>>>> >>>      expect this type evolution to work
>>>>> >>>   3. Data Safety: The proposed rules ensure no data loss - existing
>>>>> >>>      values can always be represented in the new type
>>>>> >>>   4. Real-world Use Cases: Common scenarios like adding more decimal
>>>>> >>>      precision for currency calculations would be supported
>>>>> >>>
>>>>> >>> Implementation
>>>>> >>>   I've created a proof-of-concept implementation:
>>>>> >>>   https://github.com/apache/iceberg/issues/14037
>>>>> >>>
>>>>> >>> Questions for Discussion
>>>>> >>>   1. Should this be part of the spec v3, or wait for a future version?
>>>>> >>>   2. Are there any backward compatibility concerns we should address?
>>>>> >>>
>>>>> >>> Looking forward to your feedback and thoughts on this proposal.
>>>>> >>>
>>>>> >>> Best regards,
>>>>> >>> Minglei
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Regards
>>>>> Junwang Zhao
>>>>>
>>>>
