Hey Drew,

First of all, sorry for letting this linger for so long. I think the right path forward is similar to what you suggested:

> We've been writing literal booleans for a while, but the spec says it
> should be an object. Clients may have implemented the spec, so they expect
> the aforementioned object form. The previous consensus was to accept both
> forms when reading, and keep writing as boolean literals. Either way, we
> know some clients will need updates.

Since the code is already out there and we can't break existing users, my suggestion would be: keep producing everything as an object, but add an implementation note that some clients might produce true/false literals and that those should be accepted for historical reasons.

I have a slight preference for keeping everything an object, as it aligns better with the OpenAPI standard. It leads to simpler implementations (at least using Pydantic), and it will also allow us to use things like the OpenAPI discriminator <https://swagger.io/docs/specification/v3_0/data-models/inheritance-and-polymorphism/#discriminator>.

Kind regards,
Fokko

On Tue, Jan 20, 2026 at 23:42, Drew <[email protected]> wrote:

> Hi all,
>
> Following up after letting this sit for a bit. I’m still leaning toward
> option 1, and I think that approach is reasonable here.
>
> The spec change toward primitives is cleaner as it aligns with what we’ve
> effectively been doing already by writing boolean literals. If the spec
> modeling were causing real issues, I would've expected to see some issues
> earlier, given how long this logic has existed. So far, I’ve only seen that
> single report I referenced in the last email.
>
> Given that, I’d treat this as a documentation alignment rather than
> something that needs immediate code changes. If someone does run into
> problems with the models, we can address that in a future release cycle
> without blocking progress.
>
> Thanks,
> Drew
>
> On Sat, Jan 17, 2026 at 3:57 PM Drew <[email protected]> wrote:
>
>> Hi all,
>>
>> I wanted to revive this thread after the same expression problem came up
>> again in a recent issue (https://github.com/apache/iceberg/issues/15072),
>> this time while using the expressions model for the reportMetrics API.
>>
>> We've been writing literal booleans for a while, but the spec says it
>> should be an object. Clients may have implemented the spec, so they expect
>> the aforementioned object form. The previous consensus was to accept both
>> forms when reading, and keep writing as boolean literals. Either way, we
>> know some clients will need updates.
>>
>> Would love to hear thoughts on whether this approach still makes sense.
>>
>> PR: https://github.com/apache/iceberg/pull/14677
>>
>> Drew
>>
>> On Wed, Dec 3, 2025 at 7:31 PM Drew <[email protected]> wrote:
>>
>>> Hey Everyone,
>>>
>>> Quick update on the boolean expression issue in this PR 14677
>>> <https://github.com/apache/iceberg/pull/14677>.
>>>
>>> This showed up while working on scan planning, but expressions are used
>>> in other areas of the REST spec as well. Since the expression parser has
>>> always written boolean literals, there are some users who have relied on
>>> that behavior without ever using the REST models. Given that, I don't think
>>> there's a path here that avoids breaking someone.
>>>
>>> Even if we start accepting the object form from the spec, users still
>>> need to update their models to handle the parser's current behavior. And if
>>> we flip the wire format to match the spec, then any client that's been
>>> deserializing booleans will need to update. Either direction comes with a
>>> breaking change.
>>>
>>> Here are some paths forward:
>>>
>>> *Option 1*: Align the spec with what's actually been written.
>>>
>>> Pros: Matches existing behavior that clients already rely on
>>> Cons: Clients that implemented the object form from the spec will need a
>>> small update
>>>
>>> *Option 2*: Align the implementation with the published spec.
>>>
>>> Pros: Matches the current wording in the spec
>>> Cons: Breaks clients that only expect boolean literals today
>>>
>>> *Option 3*: Keep writing boolean literals, but read both.
>>>
>>> Pros: No change to what we write. Accepts both the spec object form
>>> {"type":"true"} and true
>>> Cons: Still requires a spec update for reads from the client, and adds
>>> extra logic to the parser.
>>>
>>> I'm leaning more towards option one as booleans are what we have been
>>> writing since the beginning and any client that has been following the spec
>>> today can update their models to follow this behavior.
>>>
>>> Let me know what you all think!
>>>
>>> Thanks,
>>> Drew
>>>
>>> On Tue, Nov 25, 2025 at 1:09 AM Fokko Driesprong <[email protected]>
>>> wrote:
>>>
>>>> Thanks, Drew, for finding and fixing this! We should definitely remove
>>>> this discrepancy. I've replied to the PR.
>>>>
>>>> Kind regards,
>>>> Fokko
>>>>
>>>> On Tue, Nov 25, 2025 at 08:23, Drew <[email protected]> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I ran into an issue using the REST scan planning APIs where filters
>>>>> containing the boolean expressions were failing to be parsed. The REST
>>>>> spec defines these models as an object wrapping the string
>>>>> representation like {"type": "true"}, but the ExpressionParser actually
>>>>> reads and writes them as plain booleans. That mismatch causes the
>>>>> parser to reject filters that follow the current spec.
>>>>>
>>>>> I opened a PR to update the REST spec to align with how the expression
>>>>> parser is used.
>>>>>
>>>>> If anyone has any concerns with the spec change, or thinks we should
>>>>> handle it differently (for example by changing the Java representation
>>>>> instead), I’d appreciate any feedback.
>>>>>
>>>>> PR: https://github.com/apache/iceberg/pull/14677
>>>>>
>>>>> - Drew
>>>>>
>>>>
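
For illustration, the "accept both forms when reading" behavior discussed in this thread could look roughly like the Python sketch below. It is not the Iceberg ExpressionParser and not code from the PR. The function names `read_boolean_expression` and `write_boolean_expression` are hypothetical, and the `{"type": "false"}` form is assumed by analogy with the `{"type": "true"}` form quoted above.

```python
import json
from typing import Any, Dict, Union


def read_boolean_expression(node: Any) -> bool:
    """Lenient reader: accept a JSON boolean literal or the object form.

    Hypothetical helper, only meant to illustrate the behavior
    discussed in the thread.
    """
    # Historical form: a plain JSON boolean (true / false).
    if isinstance(node, bool):
        return node
    # Object form from the spec: {"type": "true"}.
    # {"type": "false"} is assumed here by analogy.
    if isinstance(node, dict) and node.get("type") in ("true", "false"):
        return node["type"] == "true"
    raise ValueError(f"not a boolean expression: {node!r}")


def write_boolean_expression(
    value: bool, as_object: bool = True
) -> Union[bool, Dict[str, str]]:
    """Writer: emit the object form by default, or the literal form on request."""
    if as_object:
        return {"type": "true" if value else "false"}
    return value


if __name__ == "__main__":
    # Both wire forms are read into the same Python value.
    for payload in ('true', 'false', '{"type": "true"}', '{"type": "false"}'):
        print(payload, "->", read_boolean_expression(json.loads(payload)))
    # Show the JSON produced for each writer mode.
    print(json.dumps(write_boolean_expression(True)))
    print(json.dumps(write_boolean_expression(True, as_object=False)))
```

The reader is deliberately permissive and the writer has a single switch, so whichever option is chosen for the wire format only changes the writer's default.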
