On Thu, Sep 11, 2025 at 5:54 PM rice Zhang <minglei...@gmail.com> wrote:
>
> Hi, Junwang
>
> We're discussing the storage of lower and upper bounds for decimal values in 
> manifest files and their compatibility after type evolution. The bounds are 
> stored as unscaled values without their original scale, so when the decimal 
> type changes, we can't correctly interpret these historical bounds even 
> though we know the current type from metadata.

OK, that explains it, thanks.

>
> Minglei.
>
> On Thu, Sep 11, 2025 at 5:46 PM Junwang Zhao <zhjw...@gmail.com> wrote:
>>
>> Hi Minglei,
>>
>> On Thu, Sep 11, 2025 at 5:35 PM rice Zhang <minglei...@gmail.com> wrote:
>> >
>> > Hi Ryan,
>> >
>> > Thank you for your detailed response. I've discussed this issue offline 
>> > with my team lead, and we've done some deeper investigation into the 
>> > problem. After reviewing the Decimal Type serialization code in Iceberg, 
>> > we confirmed that currently only the unscaled value is serialized without 
>> > storing the scale value. This indeed makes type evolution more complex 
>> > than initially anticipated. Regarding your mention of v4 adopting columnar 
>> > metadata for manifests, while I'm not certain which specific format 
>> > Iceberg will use (perhaps Parquet?), I agree this is a positive direction. 
>> > However, to properly support decimal scale evolution, I believe Iceberg 
>> > would need to fundamentally change how decimal types are serialized, 
>> > regardless of whether using Avro or Parquet. Specifically, we'd need to 
>> > serialize both the unscaled value AND the scale, not just the unscaled 
>> > value.
>> >
>> > Here's an example: Consider a field initially defined as DECIMAL(5,2) with 
>> > value 123.45 (the serialized unscaled value is 12345). If a user later 
>> > changes the type to DECIMAL(6,3) - which follows SQL:2011 rules since 
>> > (p-s) doesn't decrease - reading the old data with the new type would be 
>> > problematic. Without the original scale being serialized, we can't 
>> > distinguish whether 12345 represents 123.45 (scale=2) or 12.345 (scale=3), 
>> > potentially leading to incorrect data interpretation. By serializing the 
>> > scale alongside the unscaled value, we could correctly read 12345 with 
>> > scale=2 as 123.450 under the new DECIMAL(6,3) type, avoiding data 
>> > corruption.
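
To make the ambiguity in the example above concrete, here is a minimal
Python sketch. It assumes only what the message describes: the stored bound
is the unscaled integer, and the scale has to come from somewhere else. The
helper names to_unscaled and from_unscaled are hypothetical and are not
Iceberg APIs.

```python
from decimal import Decimal


def to_unscaled(value: Decimal, scale: int) -> int:
    """Hypothetical helper: shift the decimal point right by `scale` digits.

    Assumes `value` has at most `scale` fractional digits.
    """
    return int(value.scaleb(scale))


def from_unscaled(unscaled: int, scale: int) -> Decimal:
    """Hypothetical helper: rebuild a decimal from an unscaled integer."""
    return Decimal(unscaled).scaleb(-scale)


# Written under DECIMAL(5,2): only the integer 12345 is kept.
stored = to_unscaled(Decimal("123.45"), 2)

# Reading with the scale that was in effect at write time is correct.
as_written = from_unscaled(stored, 2)   # Decimal("123.45")

# Reading the same integer with the evolved scale of DECIMAL(6,3) is wrong.
misread = from_unscaled(stored, 3)      # Decimal("12.345")

# If the original scale were known, the value could be rescaled safely.
rescaled = to_unscaled(as_written, 3)   # 123450, i.e. 123.450
```

The integer 12345 alone cannot tell a reader which of the two
interpretations was intended; the scale at write time is the missing piece.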
>>
>> The metadata should have the data type, which includes the scale and
>> precision, isn't that enough to describe the decimal? Correct me if
>> I'm wrong :)
>>
>> >
>> > I'd like to confirm whether you consider this approach of serializing
>> > the scale value viable, or whether the community has a better solution
>> > for supporting decimal scale evolution. Have you already discussed
>> > specific implementation approaches for decimal type changes? I'm very
>> > interested in understanding how v4 plans to address this issue.
>> >
>> > Minglei
>> >
>> > On Thu, Sep 11, 2025 at 3:53 AM Ryan Blue <rdb...@gmail.com> wrote:
>> >>
>> >> Hi Minglei, thanks for the proposal.
>> >>
>> >> v3 is now closed, so we can't introduce a breaking change like this until 
>> >> v4. We looked into decimal type evolution in v3 and found that due to the 
>> >> way that we currently store lower and upper bounds for decimal values, we 
>> >> can't safely support this in v3 Iceberg manifests. We will need to wait 
>> >> until v4 manifests are introduced with columnar metadata to make this 
>> >> change.
>> >>
>> >> Ryan
>> >>
>> >> On Wed, Sep 10, 2025 at 12:28 AM rice Zhang <minglei...@gmail.com> wrote:
>> >>>
>> >>> Hi Iceberg Community,
>> >>>
>> >>> I'd like to propose extending Iceberg's type promotion rules to support 
>> >>> DECIMAL type evolution with scale changes, aligning with the SQL:2011 
>> >>> standard.
>> >>>
>> >>> Current Limitation
>> >>>   Currently, Iceberg only supports DECIMAL type promotion when:
>> >>>   - Scale remains the same
>> >>>   - Precision can be increased
>> >>>
>> >>>   This means DECIMAL(10,2) can evolve to DECIMAL(12,2), but not to
>> >>>   DECIMAL(12,4).
>> >>>
>> >>> Proposed Change
>> >>>   Allow DECIMAL type evolution when:
>> >>>   1. Target scale >= source scale
>> >>>   2. Target precision >= source precision
>> >>>   3. Integer part capacity is preserved:
>> >>>      (target_precision - target_scale) >= (source_precision - source_scale)
>> >>>
>> >>> Examples
>> >>>   With this change:
>> >>>   - DECIMAL(10,2) → DECIMAL(12,4) ✓ (integer part: 8 → 8, scale: 2 → 4)
>> >>>   - DECIMAL(10,2) → DECIMAL(15,5) ✓ (integer part: 8 → 10, scale: 2 → 5)
>> >>>   - DECIMAL(10,2) → DECIMAL(10,4) ✗ (integer part: 8 → 6, would lose 
>> >>> integer capacity)
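
The three proposed conditions can be sketched as a small Python predicate.
This is only an illustration of the rules as listed in the proposal; the
function name can_promote_decimal is hypothetical and does not correspond
to any existing Iceberg API.

```python
def can_promote_decimal(src_precision: int, src_scale: int,
                        dst_precision: int, dst_scale: int) -> bool:
    """Hypothetical check of the proposed DECIMAL promotion rules."""
    scale_ok = dst_scale >= src_scale              # rule 1
    precision_ok = dst_precision >= src_precision  # rule 2
    # Rule 3: the number of integer digits must not shrink.
    integer_ok = (dst_precision - dst_scale) >= (src_precision - src_scale)
    return scale_ok and precision_ok and integer_ok


# The examples from the proposal:
ok_same_integer_part = can_promote_decimal(10, 2, 12, 4)    # True
ok_wider_integer_part = can_promote_decimal(10, 2, 15, 5)   # True
rejected = can_promote_decimal(10, 2, 10, 4)                # False
```

Precision-only widening, which is already allowed today, such as
DECIMAL(10,2) to DECIMAL(12,2), also satisfies all three conditions.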
>> >>>
>> >>> Rationale
>> >>>   1. SQL:2011 Compliance: This behavior aligns with SQL:2011 standard
>> >>>      expectations
>> >>>   2. User Experience: Many users coming from traditional databases
>> >>>      expect this type evolution to work
>> >>>   3. Data Safety: The proposed rules ensure no data loss - existing
>> >>>      values can always be represented in the new type
>> >>>   4. Real-world Use Cases: Common scenarios like adding more decimal
>> >>>      precision for currency calculations would be supported
>> >>>
>> >>> Implementation
>> >>>   I've created a proof-of-concept implementation: 
>> >>> https://github.com/apache/iceberg/issues/14037
>> >>>
>> >>> Questions for Discussion
>> >>>   1. Should this be part of the spec v3, or wait for a future version?
>> >>>   2. Are there any backward compatibility concerns we should address?
>> >>>
>> >>> Looking forward to your feedback and thoughts on this proposal.
>> >>>
>> >>> Best regards,
>> >>> Minglei
>>
>>
>>
>> --
>> Regards
>> Junwang Zhao



-- 
Regards
Junwang Zhao
