Micah has it correct: since each metric will no longer be stored as
serialized binary, each metric will instead be strongly typed.
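To illustrate the distinction, here is a minimal sketch, assuming a hypothetical shredded layout (the `DecimalBound` type is illustrative only, not the actual v4 schema):

```python
from dataclasses import dataclass
from decimal import Decimal

# v2/v3 today: bounds are a map of column id -> opaque serialized bytes.
# The scale used to encode a decimal bound is not stored with the value.
v3_lower_bounds = {1: (12345).to_bytes(8, "big")}  # 123.45 as DECIMAL(5,2); scale lost

# v4 direction (hypothetical shape): bounds are shredded into typed fields,
# so a decimal bound carries its exact precision and scale in the schema.
@dataclass
class DecimalBound:
    unscaled: int
    precision: int
    scale: int

    def value(self) -> Decimal:
        # Reconstruct the logical value from the unscaled digits and scale.
        return Decimal(self.unscaled).scaleb(-self.scale)

lower = DecimalBound(unscaled=12345, precision=5, scale=2)
```

With the scale carried in the bound's own schema, readers can reinterpret historical statistics correctly after the column's type widens.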

On Thu, Sep 11, 2025 at 10:55 AM Micah Kornfield <emkornfi...@gmail.com>
wrote:

> Hi Minglei,
>
> https://docs.google.com/document/d/1uvbrwwAJW2TgsnoaIcwAFpjbhHkBUL5wY_24nKgtt9I/edit?tab=t.0#heading=h.hs6r9d26w1y2
> is the original design doc, which is probably more useful than the sync
> meeting minutes.
>
> but it doesn't seem to solve the fundamental issue we've been discussing -
>> the need to serialize the scale information alongside the unscaled value to
>> support safe decimal type evolution.
>
>
> As mentioned above in the thread the reorganization is getting rid of the
> Map<column ID, Serialized value> in favor of a shredded version that has a
> schema for every min/max bounds.  The example in the design doc shows only
> int and string, but for decimal it would have the exact precision and scale
> for the min/max bounds making the conversion doable.
>
> Thanks,
> Micah
>
> On Thu, Sep 11, 2025 at 5:30 AM rice Zhang <minglei...@gmail.com> wrote:
>
>> Hi Russell,
>>
>> Thanks for pointing me to Eduard's proposal. I think I found the document
>> here:
>> https://docs.google.com/document/d/1ZK5g8_bA1Y9SQ4UA5jAREX9iNX56xLWA5vAuKpQC4L8/edit?pli=1&tab=t.v6wlpv1dix8h
>>
>> After reviewing the meeting notes and discussions, it appears this
>> proposal primarily focuses on restructuring the current column statistics
>> format (moving from multiple maps to a struct-based structure). However, I
>> couldn't find any specific discussion about handling decimal type scale
>> evolution. The proposal does make important improvements to the statistics
>> structure, but it doesn't seem to solve the fundamental issue we've been
>> discussing: the need to serialize the scale information alongside the
>> unscaled value to support safe decimal type evolution.
>>
>> Given this, I think we need to continue discussing potential solutions
>> for decimal scale changes. The core problem remains: without serializing
>> the scale, we cannot correctly interpret historical statistics when the
>> decimal type evolves.
>>
>> Would love to hear your thoughts on how we should proceed with addressing
>> this specific issue.
>>
>> Minglei
>>
>> On Thu, Sep 11, 2025 at 7:47 PM rice Zhang <minglei...@gmail.com> wrote:
>>
>>> I couldn't find it in my search - would appreciate any pointers to the
>>> proposal or related discussions.
>>>
>>> On Thu, Sep 11, 2025 at 7:32 PM Russell Spitzer <russell.spit...@gmail.com>
>>> wrote:
>>>
>>>> This has already been proposed as part of v4; see Eduard's column
>>>> metrics expansion proposal.
>>>>
>>>> On Thu, Sep 11, 2025 at 4:54 AM rice Zhang <minglei...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi, Junwang
>>>>>
>>>>> We're discussing the storage of lower and upper bounds for decimal
>>>>> values in manifest files and their compatibility after type evolution. The
>>>>> bounds are stored as unscaled values without their original scale, so when
>>>>> the decimal type changes, we can't correctly interpret these historical
>>>>> bounds even though we know the current type from metadata.
>>>>>
>>>>> Minglei.
>>>>>
>>>>> On Thu, Sep 11, 2025 at 5:46 PM Junwang Zhao <zhjw...@gmail.com> wrote:
>>>>>
>>>>>> Hi Minglei,
>>>>>>
>>>>>> On Thu, Sep 11, 2025 at 5:35 PM rice Zhang <minglei...@gmail.com>
>>>>>> wrote:
>>>>>> >
>>>>>> > Hi Ryan,
>>>>>> >
>>>>>> > Thank you for your detailed response. I've discussed this issue
>>>>>> offline with my team lead, and we've done some deeper investigation into
>>>>>> the problem. After reviewing the Decimal Type serialization code in
>>>>>> Iceberg, we confirmed that currently only the unscaled value is 
>>>>>> serialized
>>>>>> without storing the scale value. This indeed makes type evolution more
>>>>>> complex than initially anticipated. Regarding your mention of v4 adopting
>>>>>> columnar metadata for manifests, while I'm not certain which specific
>>>>>> format Iceberg will use (perhaps Parquet?), I agree this is a positive
>>>>>> direction. However, to properly support decimal scale evolution, I 
>>>>>> believe
>>>>>> Iceberg would need to fundamentally change how decimal types are
>>>>>> serialized, regardless of whether using Avro or Parquet. Specifically, 
>>>>>> we'd
>>>>>> need to serialize both the unscaled value AND the scale, not just the
>>>>>> unscaled value.
>>>>>> >
>>>>>> > Here's an example: Consider a field initially defined as
>>>>>> DECIMAL(5,2) with value 123.45 (the serialized unscaled value is 12345). 
>>>>>> If
>>>>>> a user later changes the type to DECIMAL(6,3) - which follows SQL:2011
>>>>>> rules since (p-s) doesn't decrease - reading the old data with the new 
>>>>>> type
>>>>>> would be problematic. Without the original scale being serialized, we 
>>>>>> can't
>>>>>> distinguish whether 12345 represents 123.45 (scale=2) or 12.345 
>>>>>> (scale=3),
>>>>>> potentially leading to incorrect data interpretation. By serializing the
>>>>>> scale alongside the unscaled value, we could correctly read 12345 with
>>>>>> scale=2 as 123.450 under the new DECIMAL(6,3) type, avoiding data
>>>>>> corruption.
>>>>>>
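Minglei's example can be worked through concretely. Assuming the original scale is recorded alongside the unscaled value, widening becomes a lossless integer rescale (`rescale` is an illustrative helper, not Iceberg API):

```python
from decimal import Decimal

def rescale(unscaled: int, old_scale: int, new_scale: int) -> int:
    """Re-express an unscaled decimal value at a wider scale, losslessly."""
    assert new_scale >= old_scale, "narrowing the scale could drop digits"
    return unscaled * 10 ** (new_scale - old_scale)

# DECIMAL(5,2) value 123.45 is stored as unscaled 12345 with scale=2.
# After the type evolves to DECIMAL(6,3), the same value is 123.450,
# i.e. unscaled 123450 at scale=3.
new_unscaled = rescale(12345, old_scale=2, new_scale=3)

# Without the stored scale, the bare integer 12345 is ambiguous under
# DECIMAL(6,3): it could mean 123.45 (scale=2) or 12.345 (scale=3).
```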
>>>>>> The metadata should have the data type, which includes the scale and
>>>>>> precision. Isn't that enough to describe the decimal? Correct me if
>>>>>> I'm wrong :)
>>>>>>
>>>>>> >
>>>>>> > I'd like to confirm whether this approach of serializing the scale
>>>>>> value is something you consider viable? Or does the community have other
>>>>>> better solutions for supporting decimal scale evolution? Also, I'm
>>>>>> wondering if you've already discussed specific implementation approaches
>>>>>> for decimal type changes? I'm very interested in understanding how v4 
>>>>>> plans
>>>>>> to address this issue.
>>>>>> >
>>>>>> > Minglei
>>>>>> >
>>>>>> > On Thu, Sep 11, 2025 at 3:53 AM Ryan Blue <rdb...@gmail.com> wrote:
>>>>>> >>
>>>>>> >> Hi Minglei, thanks for the proposal.
>>>>>> >>
>>>>>> >> v3 is now closed, so we can't introduce a breaking change like
>>>>>> this until v4. We looked into decimal type evolution in v3 and found that
>>>>>> due to the way that we currently store lower and upper bounds for decimal
>>>>>> values, we can't safely support this in v3 Iceberg manifests. We will 
>>>>>> need
>>>>>> to wait until v4 manifests are introduced with columnar metadata to make
>>>>>> this change.
>>>>>> >>
>>>>>> >> Ryan
>>>>>> >>
>>>>>> >> On Wed, Sep 10, 2025 at 12:28 AM rice Zhang <minglei...@gmail.com>
>>>>>> wrote:
>>>>>> >>>
>>>>>> >>> Hi Iceberg Community,
>>>>>> >>>
>>>>>> >>> I'd like to propose extending Iceberg's type promotion rules to
>>>>>> support DECIMAL type evolution with scale changes, aligning with the
>>>>>> SQL:2011 standard.
>>>>>> >>>
>>>>>> >>> Current Limitation
>>>>>> >>>   Currently, Iceberg only supports DECIMAL type promotion when:
>>>>>> >>>   - Scale remains the same
>>>>>> >>>   - Precision can be increased
>>>>>> >>>
>>>>>> >>>   This means DECIMAL(10,2) can evolve to DECIMAL(12,2), but not
>>>>>> to DECIMAL(12,4).
>>>>>> >>>
>>>>>> >>> Proposed Change
>>>>>> >>>   Allow DECIMAL type evolution when:
>>>>>> >>>   1. Target scale >= source scale
>>>>>> >>>   2. Target precision >= source precision
>>>>>> >>>   3. Integer part capacity is preserved: (target_precision -
>>>>>> target_scale) >= (source_precision - source_scale)
>>>>>> >>>
>>>>>> >>> Examples
>>>>>> >>>   With this change:
>>>>>> >>>   - DECIMAL(10,2) → DECIMAL(12,4) ✓ (integer part: 8 → 8, scale:
>>>>>> 2 → 4)
>>>>>> >>>   - DECIMAL(10,2) → DECIMAL(15,5) ✓ (integer part: 8 → 10, scale:
>>>>>> 2 → 5)
>>>>>> >>>   - DECIMAL(10,2) → DECIMAL(10,4) ✗ (integer part: 8 → 6, would
>>>>>> lose integer capacity)
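As a sketch, the three rules above amount to a small predicate (illustrative only, not the proof-of-concept implementation):

```python
def can_promote(src_p: int, src_s: int, dst_p: int, dst_s: int) -> bool:
    """True if DECIMAL(src_p, src_s) may safely evolve to DECIMAL(dst_p, dst_s)."""
    return (
        dst_s >= src_s                          # 1. scale may only grow
        and dst_p >= src_p                      # 2. precision may only grow
        and (dst_p - dst_s) >= (src_p - src_s)  # 3. integer digits preserved
    )

# These mirror the examples in the proposal.
assert can_promote(10, 2, 12, 4)      # integer part 8 -> 8
assert can_promote(10, 2, 15, 5)      # integer part 8 -> 10
assert not can_promote(10, 2, 10, 4)  # integer part would shrink: 8 -> 6
```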
>>>>>> >>>
>>>>>> >>> Rationale
>>>>>> >>>   1. SQL:2011 Compliance: This behavior aligns with SQL:2011
>>>>>> standard expectations
>>>>>> >>>   2. User Experience: Many users coming from traditional
>>>>>> databases expect this type evolution to work
>>>>>> >>>   3. Data Safety: The proposed rules ensure no data loss -
>>>>>> existing values can always be represented in the new
>>>>>> >>>   type
>>>>>> >>>   4. Real-world Use Cases: Common scenarios like adding more
>>>>>> decimal precision for currency calculations would
>>>>>> >>>   be supported
>>>>>> >>>
>>>>>> >>> Implementation
>>>>>> >>>   I've created a proof-of-concept implementation:
>>>>>> https://github.com/apache/iceberg/issues/14037
>>>>>> >>>
>>>>>> >>> Questions for Discussion
>>>>>> >>>   1. Should this be part of the spec v3, or wait for a future
>>>>>> version?
>>>>>> >>>   2. Are there any backward compatibility concerns we should
>>>>>> address?
>>>>>> >>>
>>>>>> >>> Looking forward to your feedback and thoughts on this proposal.
>>>>>> >>>
>>>>>> >>> Best regards,
>>>>>> >>> Minglei
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Regards
>>>>>> Junwang Zhao
>>>>>>
>>>>>
