Re: [DISCUSS] Support SQL:2011 compliant DECIMAL type evolution with scale changes

rice Zhang Thu, 11 Sep 2025 12:32:16 -0700

Hi Russell,

Thanks for pointing me to Eduard's proposal. I think I found the document
here:
https://docs.google.com/document/d/1ZK5g8_bA1Y9SQ4UA5jAREX9iNX56xLWA5vAuKpQC4L8/edit?pli=1&tab=t.v6wlpv1dix8h


After reviewing the meeting notes and discussions, it appears this proposal
primarily focuses on restructuring the current column statistics format
(moving from multiple maps to a struct-based layout). However, I couldn't
find any specific discussion about handling decimal scale evolution. The
proposal makes important improvements to the statistics structure, but it
doesn't seem to solve the fundamental issue we've been discussing: the need
to serialize the scale alongside the unscaled value to support safe decimal
type evolution.

Given this, I think we need to continue discussing potential solutions for
decimal scale changes. The core problem remains: without serializing the
scale, we cannot correctly interpret historical statistics when the decimal
type evolves.
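
To make the problem concrete, here is a minimal Python sketch. It is not
Iceberg code: decode_bound and rescale are hypothetical helper names used
only for illustration, and the numbers are the DECIMAL(5,2) to DECIMAL(6,3)
example from earlier in this thread. It shows how the same stored unscaled
bound decodes to different values depending on which scale the reader
assumes, and how knowing the write-time scale would allow a safe rescale.

```python
from decimal import Decimal


def decode_bound(unscaled: int, scale: int) -> Decimal:
    """Interpret an unscaled integer bound at the given scale (hypothetical helper)."""
    return Decimal(unscaled).scaleb(-scale)


def rescale(unscaled: int, from_scale: int, to_scale: int) -> int:
    """Re-express an unscaled value at a larger scale (hypothetical helper)."""
    if to_scale < from_scale:
        raise ValueError("narrowing the scale could lose digits")
    return unscaled * 10 ** (to_scale - from_scale)


# Bound written while the column was DECIMAL(5,2): 123.45 is stored as 12345.
stored_unscaled = 12345
written_scale = 2  # scale at write time; assumed NOT stored with the bound
current_scale = 3  # scale after the column evolves to DECIMAL(6,3)

# A reader that only knows the current type applies the wrong scale.
misread = decode_bound(stored_unscaled, current_scale)

# With the write-time scale available, the bound decodes correctly and can
# be rescaled to the new type without changing its numeric value.
correct = decode_bound(stored_unscaled, written_scale)
upgraded = decode_bound(
    rescale(stored_unscaled, written_scale, current_scale), current_scale
)

print(misread, correct, upgraded)
```

Whether the write-time scale would come from a serialized field or from
some other source is exactly the open design question; the sketch only
assumes that it is available somehow.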

Would love to hear your thoughts on how we should proceed with addressing
this specific issue.

Minglei

rice Zhang <[email protected]> wrote on Thu, Sep 11, 2025 at 19:47:

> I couldn't find it in my search - would appreciate any pointers to the
> proposal or related discussions.
>
> Russell Spitzer <[email protected]> wrote on Thu, Sep 11, 2025 at 19:32:
>
>> This has already been proposed as part of v4; see Eduard's column metrics
>> expansion proposal.
>>
>> On Thu, Sep 11, 2025 at 4:54 AM rice Zhang <[email protected]> wrote:
>>
>>> Hi, Junwang
>>>
>>> We're discussing the storage of lower and upper bounds for decimal
>>> values in manifest files and their compatibility after type evolution. The
>>> bounds are stored as unscaled values without their original scale, so when
>>> the decimal type changes, we can't correctly interpret these historical
>>> bounds even though we know the current type from metadata.
>>>
>>> Minglei.
>>>
>>> Junwang Zhao <[email protected]> wrote on Thu, Sep 11, 2025 at 17:46:
>>>
>>>> Hi Minglei,
>>>>
>>>> On Thu, Sep 11, 2025 at 5:35 PM rice Zhang <[email protected]> wrote:
>>>> >
>>>> > Hi Ryan,
>>>> >
>>>> > Thank you for your detailed response. I've discussed this issue
>>>> > offline with my team lead, and we've done some deeper investigation
>>>> > into the problem. After reviewing the decimal type serialization code
>>>> > in Iceberg, we confirmed that currently only the unscaled value is
>>>> > serialized, without the scale. This makes type evolution more complex
>>>> > than initially anticipated.
>>>> >
>>>> > Regarding your mention of v4 adopting columnar metadata for manifests:
>>>> > while I'm not certain which specific format Iceberg will use (perhaps
>>>> > Parquet?), I agree this is a positive direction. However, to properly
>>>> > support decimal scale evolution, I believe Iceberg would need to
>>>> > fundamentally change how decimal types are serialized, regardless of
>>>> > whether Avro or Parquet is used. Specifically, we'd need to serialize
>>>> > both the unscaled value and the scale, not just the unscaled value.
>>>> >
>>>> > Here's an example: consider a field initially defined as DECIMAL(5,2)
>>>> > with value 123.45 (the serialized unscaled value is 12345). If a user
>>>> > later changes the type to DECIMAL(6,3), which follows SQL:2011 rules
>>>> > since (p-s) doesn't decrease, reading the old data with the new type
>>>> > would be problematic. Without the original scale being serialized, we
>>>> > can't distinguish whether 12345 represents 123.45 (scale=2) or 12.345
>>>> > (scale=3), potentially leading to incorrect data interpretation. By
>>>> > serializing the scale alongside the unscaled value, we could correctly
>>>> > read 12345 with scale=2 as 123.450 under the new DECIMAL(6,3) type,
>>>> > avoiding data corruption.
>>>>
>>>> The metadata should have the data type, which includes the scale and
>>>> precision, isn't that enough to describe the decimal? Correct me if
>>>> I'm wrong :)
>>>>
>>>> >
>>>> > I'd like to confirm whether this approach of serializing the scale
>>>> > value is something you consider viable. Or does the community have
>>>> > better solutions for supporting decimal scale evolution? Also, have
>>>> > you already discussed specific implementation approaches for decimal
>>>> > type changes? I'm very interested in understanding how v4 plans to
>>>> > address this issue.
>>>> >
>>>> > Minglei
>>>> >
>>>> > Ryan Blue <[email protected]> wrote on Thu, Sep 11, 2025 at 03:53:
>>>> >>
>>>> >> Hi Minglei, thanks for the proposal.
>>>> >>
>>>> >> v3 is now closed, so we can't introduce a breaking change like this
>>>> >> until v4. We looked into decimal type evolution in v3 and found that,
>>>> >> because of the way we currently store lower and upper bounds for
>>>> >> decimal values, we can't safely support this in v3 Iceberg manifests.
>>>> >> We will need to wait until v4 manifests are introduced with columnar
>>>> >> metadata to make this change.
>>>> >>
>>>> >> Ryan
>>>> >>
>>>> >> On Wed, Sep 10, 2025 at 12:28 AM rice Zhang <[email protected]> wrote:
>>>> >>>
>>>> >>> Hi Iceberg Community,
>>>> >>>
>>>> >>> I'd like to propose extending Iceberg's type promotion rules to
>>>> >>> support DECIMAL type evolution with scale changes, aligning with the
>>>> >>> SQL:2011 standard.
>>>> >>>
>>>> >>> Current Limitation
>>>> >>>   Currently, Iceberg only supports DECIMAL type promotion when:
>>>> >>>   - Scale remains the same
>>>> >>>   - Precision can be increased
>>>> >>>
>>>> >>>   This means DECIMAL(10,2) can evolve to DECIMAL(12,2), but not to
>>>> >>>   DECIMAL(12,4).
>>>> >>>
>>>> >>> Proposed Change
>>>> >>>   Allow DECIMAL type evolution when:
>>>> >>>   1. Target scale >= source scale
>>>> >>>   2. Target precision >= source precision
>>>> >>>   3. Integer part capacity is preserved:
>>>> >>>      (target_precision - target_scale) >= (source_precision - source_scale)
>>>> >>>
>>>> >>> Examples
>>>> >>>   With this change:
>>>> >>>   - DECIMAL(10,2) → DECIMAL(12,4) ✓ (integer part: 8 → 8, scale: 2 → 4)
>>>> >>>   - DECIMAL(10,2) → DECIMAL(15,5) ✓ (integer part: 8 → 10, scale: 2 → 5)
>>>> >>>   - DECIMAL(10,2) → DECIMAL(10,4) ✗ (integer part: 8 → 6, would lose
>>>> >>>     integer capacity)
>>>> >>>
>>>> >>> Rationale
>>>> >>>   1. SQL:2011 Compliance: This behavior aligns with SQL:2011 standard
>>>> >>>      expectations
>>>> >>>   2. User Experience: Many users coming from traditional databases
>>>> >>>      expect this type evolution to work
>>>> >>>   3. Data Safety: The proposed rules ensure no data loss - existing
>>>> >>>      values can always be represented in the new type
>>>> >>>   4. Real-world Use Cases: Common scenarios like adding more decimal
>>>> >>>      precision for currency calculations would be supported
>>>> >>>
>>>> >>> Implementation
>>>> >>>   I've created a proof-of-concept implementation:
>>>> >>>   https://github.com/apache/iceberg/issues/14037
>>>> >>>
>>>> >>> Questions for Discussion
>>>> >>>   1. Should this be part of the spec v3, or wait for a future version?
>>>> >>>   2. Are there any backward compatibility concerns we should address?
>>>> >>>
>>>> >>> Looking forward to your feedback and thoughts on this proposal.
>>>> >>>
>>>> >>> Best regards,
>>>> >>> Minglei
>>>>
>>>>
>>>>
>>>> --
>>>> Regards
>>>> Junwang Zhao
>>>>
>>>
