On Thu, Sep 11, 2025 at 5:54 PM rice Zhang <minglei...@gmail.com> wrote:
>
> Hi, Junwang
>
> We're discussing the storage of lower and upper bounds for decimal values in
> manifest files and their compatibility after type evolution. The bounds are
> stored as unscaled values without their original scale, so when the decimal
> type changes, we can't correctly interpret these historical bounds even
> though we know the current type from metadata.
Ok, that explains it, thanks.

> Minglei.
>
> On Thu, Sep 11, 2025 at 17:46, Junwang Zhao <zhjw...@gmail.com> wrote:
>>
>> Hi Minglei,
>>
>> On Thu, Sep 11, 2025 at 5:35 PM rice Zhang <minglei...@gmail.com> wrote:
>> >
>> > Hi Ryan,
>> >
>> > Thank you for your detailed response. I've discussed this issue offline
>> > with my team lead, and we've done some deeper investigation into the
>> > problem. After reviewing the decimal type serialization code in Iceberg,
>> > we confirmed that currently only the unscaled value is serialized; the
>> > scale is not stored. This indeed makes type evolution more complex than
>> > initially anticipated. Regarding your mention of v4 adopting columnar
>> > metadata for manifests, while I'm not certain which specific format
>> > Iceberg will use (perhaps Parquet?), I agree this is a positive
>> > direction. However, to properly support decimal scale evolution, I
>> > believe Iceberg would need to fundamentally change how decimal types are
>> > serialized, regardless of whether Avro or Parquet is used. Specifically,
>> > we'd need to serialize both the unscaled value AND the scale, not just
>> > the unscaled value.
>> >
>> > Here's an example: consider a field initially defined as DECIMAL(5,2)
>> > with value 123.45 (the serialized unscaled value is 12345). If a user
>> > later changes the type to DECIMAL(6,3) - which follows SQL:2011 rules
>> > since (p-s) doesn't decrease - reading the old data with the new type
>> > would be problematic. Without the original scale being serialized, we
>> > can't distinguish whether 12345 represents 123.45 (scale=2) or 12.345
>> > (scale=3), potentially leading to incorrect data interpretation. By
>> > serializing the scale alongside the unscaled value, we could correctly
>> > read 12345 with scale=2 as 123.450 under the new DECIMAL(6,3) type,
>> > avoiding data corruption.
>>
>> The metadata should have the data type, which includes the scale and
>> precision; isn't that enough to describe the decimal?
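[The ambiguity Minglei describes is easy to reproduce: a stored unscaled value is just an integer, so it can only be decoded correctly if the reader knows the scale it was written with. A minimal Python sketch using the standard `decimal` module; the `decode` helper is illustrative, not an Iceberg API:]

```python
from decimal import Decimal

def decode(unscaled: int, scale: int) -> Decimal:
    """Rebuild a decimal value from its unscaled integer and a scale."""
    return Decimal(unscaled).scaleb(-scale)

# A DECIMAL(5,2) value 123.45 is serialized as the unscaled integer 12345.
stored = 12345

# Decoding with the writer's scale (2) recovers the original value...
assert decode(stored, 2) == Decimal("123.45")

# ...but if the column has evolved to DECIMAL(6,3) and the reader applies
# the *current* scale (3) to the old bytes, the value silently changes:
assert decode(stored, 3) == Decimal("12.345")

# If the (unscaled, scale) pair were stored together, the old value could
# be widened losslessly to the new scale on read:
widened = decode(stored, 2).quantize(Decimal("0.001"))
assert widened == Decimal("123.450")
```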
>> Correct me if I'm wrong :)
>>
>> > I'd like to confirm whether this approach of serializing the scale
>> > value is something you consider viable, or whether the community has
>> > other, better solutions for supporting decimal scale evolution. Also,
>> > I'm wondering whether you've already discussed specific implementation
>> > approaches for decimal type changes. I'm very interested in
>> > understanding how v4 plans to address this issue.
>> >
>> > Minglei
>> >
>> > On Thu, Sep 11, 2025 at 03:53, Ryan Blue <rdb...@gmail.com> wrote:
>> >>
>> >> Hi Minglei, thanks for the proposal.
>> >>
>> >> v3 is now closed, so we can't introduce a breaking change like this
>> >> until v4. We looked into decimal type evolution in v3 and found that,
>> >> due to the way we currently store lower and upper bounds for decimal
>> >> values, we can't safely support this in v3 Iceberg manifests. We will
>> >> need to wait until v4 manifests are introduced with columnar metadata
>> >> to make this change.
>> >>
>> >> Ryan
>> >>
>> >> On Wed, Sep 10, 2025 at 12:28 AM rice Zhang <minglei...@gmail.com> wrote:
>> >>>
>> >>> Hi Iceberg Community,
>> >>>
>> >>> I'd like to propose extending Iceberg's type promotion rules to
>> >>> support DECIMAL type evolution with scale changes, aligning with the
>> >>> SQL:2011 standard.
>> >>>
>> >>> Current Limitation
>> >>> Currently, Iceberg only supports DECIMAL type promotion when:
>> >>> - Scale remains the same
>> >>> - Precision can be increased
>> >>>
>> >>> This means DECIMAL(10,2) can evolve to DECIMAL(12,2), but not to
>> >>> DECIMAL(12,4).
>> >>>
>> >>> Proposed Change
>> >>> Allow DECIMAL type evolution when:
>> >>> 1. Target scale >= source scale
>> >>> 2. Target precision >= source precision
>> >>> 3. Integer part capacity is preserved: (target_precision -
>> >>> target_scale) >= (source_precision - source_scale)
>> >>>
>> >>> Examples
>> >>> With this change:
>> >>> - DECIMAL(10,2) → DECIMAL(12,4) ✓ (integer part: 8 → 8, scale: 2 → 4)
>> >>> - DECIMAL(10,2) → DECIMAL(15,5) ✓ (integer part: 8 → 10, scale: 2 → 5)
>> >>> - DECIMAL(10,2) → DECIMAL(10,4) ✗ (integer part: 8 → 6, would lose
>> >>> integer capacity)
>> >>>
>> >>> Rationale
>> >>> 1. SQL:2011 compliance: this behavior aligns with SQL:2011 standard
>> >>> expectations.
>> >>> 2. User experience: many users coming from traditional databases
>> >>> expect this type evolution to work.
>> >>> 3. Data safety: the proposed rules ensure no data loss; existing
>> >>> values can always be represented in the new type.
>> >>> 4. Real-world use cases: common scenarios like adding more decimal
>> >>> precision for currency calculations would be supported.
>> >>>
>> >>> Implementation
>> >>> I've created a proof-of-concept implementation:
>> >>> https://github.com/apache/iceberg/issues/14037
>> >>>
>> >>> Questions for Discussion
>> >>> 1. Should this be part of spec v3, or wait for a future version?
>> >>> 2. Are there any backward compatibility concerns we should address?
>> >>>
>> >>> Looking forward to your feedback and thoughts on this proposal.
>> >>>
>> >>> Best regards,
>> >>> Minglei
>>
>> --
>> Regards
>> Junwang Zhao

--
Regards
Junwang Zhao
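[The three conditions in Minglei's proposal reduce to a simple predicate. A Python sketch of the check, using the examples from the proposal as assertions; the function name is illustrative, not from the Iceberg codebase:]

```python
def can_promote(src_precision: int, src_scale: int,
                dst_precision: int, dst_scale: int) -> bool:
    """Proposed DECIMAL promotion rule: scale and precision may only grow,
    and the integer-part capacity (precision - scale) must not shrink."""
    return (dst_scale >= src_scale
            and dst_precision >= src_precision
            and dst_precision - dst_scale >= src_precision - src_scale)

# Examples from the proposal:
assert can_promote(10, 2, 12, 4)      # DECIMAL(10,2) -> DECIMAL(12,4): ok
assert can_promote(10, 2, 15, 5)      # DECIMAL(10,2) -> DECIMAL(15,5): ok
assert not can_promote(10, 2, 10, 4)  # DECIMAL(10,2) -> DECIMAL(10,4): loses integer capacity

# The promotion Iceberg supports today (same scale, larger precision)
# remains valid as a special case:
assert can_promote(10, 2, 12, 2)
```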