Micah has it correct: each metric will no longer be stored as serialized binary; instead, each metric will be strongly typed.
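For readers following the thread, here is a minimal sketch (Python for brevity; `decode_unscaled` is a hypothetical helper, not an Iceberg API) of why the scale matters when interpreting a stored unscaled value: the same integer decodes to different numbers depending on which scale is applied.

```python
from decimal import Decimal

def decode_unscaled(unscaled: int, scale: int) -> Decimal:
    """Interpret a stored unscaled integer using an explicit scale."""
    return Decimal(unscaled).scaleb(-scale)

# Written as DECIMAL(5,2): 123.45 is serialized as the unscaled value 12345.
old_value = decode_unscaled(12345, scale=2)

# If the column later becomes DECIMAL(6,3) and a reader assumes the
# *current* scale, the same unscaled value decodes to a different number.
misread = decode_unscaled(12345, scale=3)

# With the original scale available (e.g. carried by strongly typed
# min/max bounds), the old bound can be rescaled safely:
# 123.45 -> 123.450 under the new scale of 3.
rescaled = old_value.quantize(Decimal("0.001"))
```

This is only an illustration of the ambiguity discussed in the thread, not a description of how v4 manifests will encode bounds.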
On Thu, Sep 11, 2025 at 10:55 AM Micah Kornfield <emkornfi...@gmail.com> wrote:

> Hi Minglei,
>
> https://docs.google.com/document/d/1uvbrwwAJW2TgsnoaIcwAFpjbhHkBUL5wY_24nKgtt9I/edit?tab=t.0#heading=h.hs6r9d26w1y2
> is the original design doc, which is probably more useful than the sync
> meeting minutes.
>
>> but it doesn't seem to solve the fundamental issue we've been discussing -
>> the need to serialize the scale information alongside the unscaled value to
>> support safe decimal type evolution.
>
> As mentioned above in the thread, the reorganization is getting rid of the
> Map<column ID, Serialized value> in favor of a shredded version that has a
> schema for every min/max bound. The example in the design doc shows only
> int and string, but for decimal it would have the exact precision and scale
> for the min/max bounds, making the conversion doable.
>
> Thanks,
> Micah
>
> On Thu, Sep 11, 2025 at 5:30 AM rice Zhang <minglei...@gmail.com> wrote:
>
>> Hi Russell,
>>
>> Thanks for pointing me to Eduard's proposal. I think I found the document
>> here:
>> https://docs.google.com/document/d/1ZK5g8_bA1Y9SQ4UA5jAREX9iNX56xLWA5vAuKpQC4L8/edit?pli=1&tab=t.v6wlpv1dix8h
>>
>> After reviewing the meeting notes and discussions, it appears this
>> proposal primarily focuses on restructuring the current column statistics
>> format (moving from multiple maps to a struct-based structure). However, I
>> couldn't find any specific discussion about handling decimal type scale
>> evolution. The proposal does make important improvements to the statistics
>> structure, but it doesn't seem to solve the fundamental issue we've been
>> discussing: the need to serialize the scale information alongside the
>> unscaled value to support safe decimal type evolution. Given this, I think
>> we need to continue discussing potential solutions for decimal scale
>> changes.
>> The core problem remains: without serializing the scale, we cannot
>> correctly interpret historical statistics when the decimal type evolves.
>>
>> Would love to hear your thoughts on how we should proceed with addressing
>> this specific issue.
>>
>> Minglei
>>
>> rice Zhang <minglei...@gmail.com> wrote on Thu, Sep 11, 2025 at 19:47:
>>
>>> I couldn't find it in my search - would appreciate any pointers to the
>>> proposal or related discussions.
>>>
>>> Russell Spitzer <russell.spit...@gmail.com> wrote on Thu, Sep 11, 2025
>>> at 19:32:
>>>
>>>> This has already been proposed as part of v4; see Eduard's column
>>>> metrics expansion proposal.
>>>>
>>>> On Thu, Sep 11, 2025 at 4:54 AM rice Zhang <minglei...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Junwang,
>>>>>
>>>>> We're discussing the storage of lower and upper bounds for decimal
>>>>> values in manifest files and their compatibility after type evolution. The
>>>>> bounds are stored as unscaled values without their original scale, so when
>>>>> the decimal type changes, we can't correctly interpret these historical
>>>>> bounds even though we know the current type from metadata.
>>>>>
>>>>> Minglei
>>>>>
>>>>> Junwang Zhao <zhjw...@gmail.com> wrote on Thu, Sep 11, 2025 at 17:46:
>>>>>
>>>>>> Hi Minglei,
>>>>>>
>>>>>> On Thu, Sep 11, 2025 at 5:35 PM rice Zhang <minglei...@gmail.com>
>>>>>> wrote:
>>>>>> >
>>>>>> > Hi Ryan,
>>>>>> >
>>>>>> > Thank you for your detailed response. I've discussed this issue
>>>>>> > offline with my team lead, and we've done some deeper investigation
>>>>>> > into the problem. After reviewing the decimal type serialization code
>>>>>> > in Iceberg, we confirmed that currently only the unscaled value is
>>>>>> > serialized, without storing the scale value. This indeed makes type
>>>>>> > evolution more complex than initially anticipated.
>>>>>> > Regarding your mention of v4 adopting columnar metadata for
>>>>>> > manifests, while I'm not certain which specific format Iceberg will
>>>>>> > use (perhaps Parquet?), I agree this is a positive direction. However,
>>>>>> > to properly support decimal scale evolution, I believe Iceberg would
>>>>>> > need to fundamentally change how decimal types are serialized,
>>>>>> > regardless of whether it uses Avro or Parquet. Specifically, we'd need
>>>>>> > to serialize both the unscaled value AND the scale, not just the
>>>>>> > unscaled value.
>>>>>> >
>>>>>> > Here's an example: consider a field initially defined as DECIMAL(5,2)
>>>>>> > with value 123.45 (the serialized unscaled value is 12345). If a user
>>>>>> > later changes the type to DECIMAL(6,3) - which follows SQL:2011 rules
>>>>>> > since (p-s) doesn't decrease - reading the old data with the new type
>>>>>> > would be problematic. Without the original scale being serialized, we
>>>>>> > can't distinguish whether 12345 represents 123.45 (scale=2) or 12.345
>>>>>> > (scale=3), potentially leading to incorrect data interpretation. By
>>>>>> > serializing the scale alongside the unscaled value, we could correctly
>>>>>> > read 12345 with scale=2 as 123.450 under the new DECIMAL(6,3) type,
>>>>>> > avoiding data corruption.
>>>>>>
>>>>>> The metadata should have the data type, which includes the scale and
>>>>>> precision; isn't that enough to describe the decimal? Correct me if
>>>>>> I'm wrong :)
>>>>>>
>>>>>> >
>>>>>> > I'd like to confirm whether this approach of serializing the scale
>>>>>> > value is something you consider viable, or whether the community has
>>>>>> > other, better solutions for supporting decimal scale evolution. Also,
>>>>>> > I'm wondering if you've already discussed specific implementation
>>>>>> > approaches for decimal type changes. I'm very interested in
>>>>>> > understanding how v4 plans to address this issue.
>>>>>> >
>>>>>> > Minglei
>>>>>> >
>>>>>> > Ryan Blue <rdb...@gmail.com> wrote on Thu, Sep 11, 2025 at 03:53:
>>>>>> >>
>>>>>> >> Hi Minglei, thanks for the proposal.
>>>>>> >>
>>>>>> >> v3 is now closed, so we can't introduce a breaking change like this
>>>>>> >> until v4. We looked into decimal type evolution in v3 and found that,
>>>>>> >> due to the way that we currently store lower and upper bounds for
>>>>>> >> decimal values, we can't safely support this in v3 Iceberg manifests.
>>>>>> >> We will need to wait until v4 manifests are introduced with columnar
>>>>>> >> metadata to make this change.
>>>>>> >>
>>>>>> >> Ryan
>>>>>> >>
>>>>>> >> On Wed, Sep 10, 2025 at 12:28 AM rice Zhang <minglei...@gmail.com>
>>>>>> >> wrote:
>>>>>> >>>
>>>>>> >>> Hi Iceberg Community,
>>>>>> >>>
>>>>>> >>> I'd like to propose extending Iceberg's type promotion rules to
>>>>>> >>> support DECIMAL type evolution with scale changes, aligning with the
>>>>>> >>> SQL:2011 standard.
>>>>>> >>>
>>>>>> >>> Current Limitation
>>>>>> >>> Currently, Iceberg only supports DECIMAL type promotion when:
>>>>>> >>> - Scale remains the same
>>>>>> >>> - Precision can be increased
>>>>>> >>>
>>>>>> >>> This means DECIMAL(10,2) can evolve to DECIMAL(12,2), but not to
>>>>>> >>> DECIMAL(12,4).
>>>>>> >>>
>>>>>> >>> Proposed Change
>>>>>> >>> Allow DECIMAL type evolution when:
>>>>>> >>> 1. Target scale >= source scale
>>>>>> >>> 2. Target precision >= source precision
>>>>>> >>> 3. Integer part capacity is preserved: (target_precision -
>>>>>> >>> target_scale) >= (source_precision - source_scale)
>>>>>> >>>
>>>>>> >>> Examples
>>>>>> >>> With this change:
>>>>>> >>> - DECIMAL(10,2) → DECIMAL(12,4) ✓ (integer part: 8 → 8, scale: 2 → 4)
>>>>>> >>> - DECIMAL(10,2) → DECIMAL(15,5) ✓ (integer part: 8 → 10, scale: 2 → 5)
>>>>>> >>> - DECIMAL(10,2) → DECIMAL(10,4) ✗ (integer part: 8 → 6, would lose
>>>>>> >>> integer capacity)
>>>>>> >>>
>>>>>> >>> Rationale
>>>>>> >>> 1. SQL:2011 Compliance: This behavior aligns with SQL:2011
>>>>>> >>> standard expectations
>>>>>> >>> 2. User Experience: Many users coming from traditional databases
>>>>>> >>> expect this type evolution to work
>>>>>> >>> 3. Data Safety: The proposed rules ensure no data loss - existing
>>>>>> >>> values can always be represented in the new type
>>>>>> >>> 4. Real-world Use Cases: Common scenarios like adding more decimal
>>>>>> >>> precision for currency calculations would be supported
>>>>>> >>>
>>>>>> >>> Implementation
>>>>>> >>> I've created a proof-of-concept implementation:
>>>>>> >>> https://github.com/apache/iceberg/issues/14037
>>>>>> >>>
>>>>>> >>> Questions for Discussion
>>>>>> >>> 1. Should this be part of the spec v3, or wait for a future version?
>>>>>> >>> 2. Are there any backward compatibility concerns we should address?
>>>>>> >>>
>>>>>> >>> Looking forward to your feedback and thoughts on this proposal.
>>>>>> >>>
>>>>>> >>> Best regards,
>>>>>> >>> Minglei
>>>>>>
>>>>>> --
>>>>>> Regards
>>>>>> Junwang Zhao
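As an editorial aside, the promotion rule from Minglei's original proposal quoted above can be sketched as a simple check (illustrative Python only, not Iceberg code; `can_promote_decimal` is a hypothetical name):

```python
def can_promote_decimal(src_p: int, src_s: int, dst_p: int, dst_s: int) -> bool:
    """Proposed rule: scale and precision may not shrink, and the
    integer-digit capacity (precision - scale) must be preserved."""
    return (
        dst_s >= src_s
        and dst_p >= src_p
        and (dst_p - dst_s) >= (src_p - src_s)
    )

# The examples from the proposal:
assert can_promote_decimal(10, 2, 12, 4)      # integer part 8 -> 8: OK
assert can_promote_decimal(10, 2, 15, 5)      # integer part 8 -> 10: OK
assert not can_promote_decimal(10, 2, 10, 4)  # integer part 8 -> 6: unsafe
```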