Hi Minglei,

https://docs.google.com/document/d/1uvbrwwAJW2TgsnoaIcwAFpjbhHkBUL5wY_24nKgtt9I/edit?tab=t.0#heading=h.hs6r9d26w1y2 is the original design doc, which is probably more useful than the sync meeting minutes.
> but it doesn't seem to solve the fundamental issue we've been discussing - the need to serialize the scale information alongside the unscaled value to support safe decimal type evolution.

As mentioned above in the thread, the reorganization is getting rid of the Map<column ID, Serialized value> in favor of a shredded version that has a schema for every min/max bound. The example in the design doc shows only int and string, but for decimal it would have the exact precision and scale for the min/max bounds, making the conversion doable.

Thanks,
Micah

On Thu, Sep 11, 2025 at 5:30 AM rice Zhang <minglei...@gmail.com> wrote:

> Hi Russell,
>
> Thanks for pointing me to Eduard's proposal. I think I found the document here:
> https://docs.google.com/document/d/1ZK5g8_bA1Y9SQ4UA5jAREX9iNX56xLWA5vAuKpQC4L8/edit?pli=1&tab=t.v6wlpv1dix8h
>
> After reviewing the meeting notes and discussions, it appears this proposal primarily focuses on restructuring the current column statistics format (moving from multiple maps to a struct-based structure). However, I couldn't find any specific discussion about handling decimal type scale evolution. The proposal does make important improvements to the statistics structure, but it doesn't seem to solve the fundamental issue we've been discussing - the need to serialize the scale information alongside the unscaled value to support safe decimal type evolution. Given this, I think we need to continue discussing potential solutions for decimal scale changes. The core problem remains: without serializing the scale, we cannot correctly interpret historical statistics when the decimal type evolves.
>
> Would love to hear your thoughts on how we should proceed with addressing this specific issue.
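[The shredded layout Micah describes can be illustrated with a small sketch: once each min/max bound carries the exact precision and scale it was written with, an old bound can be decoded unambiguously and converted to the evolved type. This is a hypothetical Python sketch; the `DecimalBound` name and layout are illustrative, not Iceberg's actual structures.]

```python
from dataclasses import dataclass
from decimal import Decimal

@dataclass
class DecimalBound:
    # Shredded bound: the unscaled value plus the exact type it was written with.
    unscaled: int
    precision: int
    scale: int

    def to_decimal(self) -> Decimal:
        # Reapply the stored scale: value = unscaled * 10**(-scale).
        return Decimal(self.unscaled).scaleb(-self.scale)

# A lower bound written while the column was DECIMAL(5,2):
# 123.45, stored as unscaled 12345 with scale 2.
old_bound = DecimalBound(unscaled=12345, precision=5, scale=2)

# After the column evolves to DECIMAL(6,3), the stored scale makes the
# conversion well defined: 12345 with scale=2 is 123.45, i.e. 123.450.
print(old_bound.to_decimal())  # 123.45
```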
> Minglei
>
> rice Zhang <minglei...@gmail.com> wrote on Thu, Sep 11, 2025 at 19:47:
>
>> I couldn't find it in my search - would appreciate any pointers to the proposal or related discussions.
>>
>> Russell Spitzer <russell.spit...@gmail.com> wrote on Thu, Sep 11, 2025 at 19:32:
>>
>>> This has already been proposed as part of v4; see Eduard's column metrics expansion proposal.
>>>
>>> On Thu, Sep 11, 2025 at 4:54 AM rice Zhang <minglei...@gmail.com> wrote:
>>>
>>>> Hi, Junwang
>>>>
>>>> We're discussing the storage of lower and upper bounds for decimal values in manifest files and their compatibility after type evolution. The bounds are stored as unscaled values without their original scale, so when the decimal type changes, we can't correctly interpret these historical bounds even though we know the current type from metadata.
>>>>
>>>> Minglei.
>>>>
>>>> Junwang Zhao <zhjw...@gmail.com> wrote on Thu, Sep 11, 2025 at 17:46:
>>>>
>>>>> Hi Minglei,
>>>>>
>>>>> On Thu, Sep 11, 2025 at 5:35 PM rice Zhang <minglei...@gmail.com> wrote:
>>>>> >
>>>>> > Hi Ryan,
>>>>> >
>>>>> > Thank you for your detailed response. I've discussed this issue offline with my team lead, and we've done some deeper investigation into the problem. After reviewing the decimal type serialization code in Iceberg, we confirmed that currently only the unscaled value is serialized, without the scale. This indeed makes type evolution more complex than initially anticipated. Regarding your mention of v4 adopting columnar metadata for manifests, while I'm not certain which specific format Iceberg will use (perhaps Parquet?), I agree this is a positive direction. However, to properly support decimal scale evolution, I believe Iceberg would need to fundamentally change how decimal types are serialized, regardless of whether it uses Avro or Parquet.
>>>>> > Specifically, we'd need to serialize both the unscaled value AND the scale, not just the unscaled value.
>>>>> >
>>>>> > Here's an example: consider a field initially defined as DECIMAL(5,2) with value 123.45 (the serialized unscaled value is 12345). If a user later changes the type to DECIMAL(6,3) - which follows SQL:2011 rules since (p-s) doesn't decrease - reading the old data with the new type would be problematic. Without the original scale being serialized, we can't distinguish whether 12345 represents 123.45 (scale=2) or 12.345 (scale=3), potentially leading to incorrect data interpretation. By serializing the scale alongside the unscaled value, we could correctly read 12345 with scale=2 as 123.450 under the new DECIMAL(6,3) type, avoiding data corruption.
>>>>>
>>>>> The metadata should have the data type, which includes the scale and precision; isn't that enough to describe the decimal? Correct me if I'm wrong :)
>>>>>
>>>>> > I'd like to confirm whether this approach of serializing the scale value is something you consider viable, or whether the community has other, better solutions for supporting decimal scale evolution. Also, I'm wondering if you've already discussed specific implementation approaches for decimal type changes? I'm very interested in understanding how v4 plans to address this issue.
>>>>> >
>>>>> > Minglei
>>>>> >
>>>>> > Ryan Blue <rdb...@gmail.com> wrote on Thu, Sep 11, 2025 at 03:53:
>>>>> >>
>>>>> >> Hi Minglei, thanks for the proposal.
>>>>> >>
>>>>> >> v3 is now closed, so we can't introduce a breaking change like this until v4. We looked into decimal type evolution in v3 and found that, due to the way we currently store lower and upper bounds for decimal values, we can't safely support this in v3 Iceberg manifests.
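[The DECIMAL(5,2) → DECIMAL(6,3) example in the message above can be checked numerically. This Python sketch is purely illustrative; the `decode` helper is hypothetical and is not Iceberg code.]

```python
from decimal import Decimal

def decode(unscaled: int, scale: int) -> Decimal:
    # A decimal is stored as an unscaled integer; the value is unscaled * 10**(-scale).
    return Decimal(unscaled).scaleb(-scale)

unscaled = 12345  # written as DECIMAL(5,2), i.e. 123.45

# A reader that only knows the current type DECIMAL(6,3) misreads the old bytes:
print(decode(unscaled, 3))  # 12.345  (wrong: the original value was 123.45)

# If the writer's scale (2) is serialized alongside the unscaled value, the
# reader can decode correctly, then rescale to the new type's scale:
original = decode(unscaled, 2)                  # 123.45
rescaled = original.quantize(Decimal("0.001"))  # 123.450 under DECIMAL(6,3)
print(rescaled)  # 123.450
```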
>>>>> >> We will need to wait until v4 manifests are introduced with columnar metadata to make this change.
>>>>> >>
>>>>> >> Ryan
>>>>> >>
>>>>> >> On Wed, Sep 10, 2025 at 12:28 AM rice Zhang <minglei...@gmail.com> wrote:
>>>>> >>>
>>>>> >>> Hi Iceberg Community,
>>>>> >>>
>>>>> >>> I'd like to propose extending Iceberg's type promotion rules to support DECIMAL type evolution with scale changes, aligning with the SQL:2011 standard.
>>>>> >>>
>>>>> >>> Current Limitation
>>>>> >>> Currently, Iceberg only supports DECIMAL type promotion when:
>>>>> >>> - Scale remains the same
>>>>> >>> - Precision can be increased
>>>>> >>>
>>>>> >>> This means DECIMAL(10,2) can evolve to DECIMAL(12,2), but not to DECIMAL(12,4).
>>>>> >>>
>>>>> >>> Proposed Change
>>>>> >>> Allow DECIMAL type evolution when:
>>>>> >>> 1. Target scale >= source scale
>>>>> >>> 2. Target precision >= source precision
>>>>> >>> 3. Integer part capacity is preserved: (target_precision - target_scale) >= (source_precision - source_scale)
>>>>> >>>
>>>>> >>> Examples
>>>>> >>> With this change:
>>>>> >>> - DECIMAL(10,2) → DECIMAL(12,4) ✓ (integer part: 8 → 8, scale: 2 → 4)
>>>>> >>> - DECIMAL(10,2) → DECIMAL(15,5) ✓ (integer part: 8 → 10, scale: 2 → 5)
>>>>> >>> - DECIMAL(10,2) → DECIMAL(10,4) ✗ (integer part: 8 → 6, would lose integer capacity)
>>>>> >>>
>>>>> >>> Rationale
>>>>> >>> 1. SQL:2011 Compliance: This behavior aligns with SQL:2011 standard expectations
>>>>> >>> 2. User Experience: Many users coming from traditional databases expect this type evolution to work
>>>>> >>> 3. Data Safety: The proposed rules ensure no data loss - existing values can always be represented in the new type
>>>>> >>> 4. Real-world Use Cases: Common scenarios like adding more decimal precision for currency calculations would be supported
>>>>> >>>
>>>>> >>> Implementation
>>>>> >>> I've created a proof-of-concept implementation: https://github.com/apache/iceberg/issues/14037
>>>>> >>>
>>>>> >>> Questions for Discussion
>>>>> >>> 1. Should this be part of the spec v3, or wait for a future version?
>>>>> >>> 2. Are there any backward compatibility concerns we should address?
>>>>> >>>
>>>>> >>> Looking forward to your feedback and thoughts on this proposal.
>>>>> >>>
>>>>> >>> Best regards,
>>>>> >>> Minglei
>>>>>
>>>>> --
>>>>> Regards
>>>>> Junwang Zhao
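[The three conditions in the proposal quoted above can be expressed as a small predicate. This is a sketch of the rules as stated in the thread, not the linked proof-of-concept implementation; the function name is hypothetical.]

```python
def decimal_promotion_allowed(src_p: int, src_s: int, dst_p: int, dst_s: int) -> bool:
    """Check the proposed conditions for DECIMAL(src_p, src_s) -> DECIMAL(dst_p, dst_s)."""
    return (
        dst_s >= src_s                          # 1. target scale >= source scale
        and dst_p >= src_p                      # 2. target precision >= source precision
        and (dst_p - dst_s) >= (src_p - src_s)  # 3. integer-digit capacity preserved
    )

# The examples from the proposal:
print(decimal_promotion_allowed(10, 2, 12, 4))  # True:  integer part 8 -> 8
print(decimal_promotion_allowed(10, 2, 15, 5))  # True:  integer part 8 -> 10
print(decimal_promotion_allowed(10, 2, 10, 4))  # False: integer part 8 -> 6
```

Note that Iceberg's current rule (same scale, larger precision) is a special case of this predicate, so the proposal strictly widens the allowed promotions.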