Hey Drew,

First of all, sorry for letting this linger for so long. I think the right
path forward is similar to what you suggested:

> We've been writing literal booleans for a while, but the spec says it
> should be an object. Clients may have implemented the spec, so they expect
> the aforementioned object form. The previous consensus was to accept both
> forms when reading, and keep writing as boolean literals. Either way, we
> know some clients will need updates.


Since the code is already out there, and we can't break existing users, my
suggestion would be: keep producing everything as an object, but add an
implementation note saying that some clients might produce true/false
literals, which should be accepted for historical reasons.
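To make that implementation note concrete, here is a minimal sketch in plain Python (standard library only, not Pydantic) of a writer and reader following this suggestion. The function names are hypothetical and are not Iceberg or PyIceberg APIs. It assumes boolean expression nodes look exactly like `{"type": "true"}` / `{"type": "false"}`, as in the current spec.

```python
import json
from typing import Any

# Hypothetical helpers illustrating the proposal; not Iceberg or PyIceberg APIs.


def write_boolean_expression(value: bool) -> dict[str, str]:
    """Always emit the object form from the REST spec, e.g. {"type": "true"}."""
    return {"type": "true" if value else "false"}


def read_boolean_expression(node: Any) -> bool:
    """Accept the spec's object form and, for historical reasons, bare JSON booleans."""
    # Legacy form: a plain JSON true/false, as some clients produce today.
    if isinstance(node, bool):
        return node
    # Spec form (assumed shape): an object whose "type" is "true" or "false".
    if isinstance(node, dict) and node.get("type") in ("true", "false"):
        return node["type"] == "true"
    raise ValueError(f"not a boolean expression: {node!r}")
```

Writing always yields the object form while reading tolerates both, so `read_boolean_expression(json.loads("true"))` and `read_boolean_expression(json.loads('{"type": "true"}'))` both return `True`.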

I have a slight preference for keeping everything an object, as it aligns
better with the OpenAPI standard. That leads to simpler implementations (at
least when using Pydantic), and it will also allow us to use things like the
OpenAPI discriminator
<https://swagger.io/docs/specification/v3_0/data-models/inheritance-and-polymorphism/#discriminator>.
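To illustrate the discriminator point: if every expression node is an object with a `type` field, a reader can dispatch on that single field everywhere. Below is a hypothetical plain-Python sketch standing in for what a discriminated union would give you. The `and`/`or`/`not` node shapes with `left`, `right` and `child` fields are assumptions for illustration, and the sketch only handles trees built from these constant nodes.

```python
import json
from typing import Any, Callable

# Hypothetical evaluator, not Iceberg code: every node is dispatched on one
# discriminator field, "type", in the spirit of an OpenAPI discriminator.
# The "left", "right" and "child" field names are assumptions for this sketch.

Handler = Callable[[dict[str, Any]], bool]


def evaluate(node: dict[str, Any]) -> bool:
    """Evaluate a tree made only of true/false/not/and/or nodes."""
    handler = HANDLERS.get(node["type"])
    if handler is None:
        raise ValueError(f"unknown expression type: {node['type']!r}")
    return handler(node)


# One table covers every node kind because every node has the same shape.
HANDLERS: dict[str, Handler] = {
    "true": lambda n: True,
    "false": lambda n: False,
    "not": lambda n: not evaluate(n["child"]),
    "and": lambda n: evaluate(n["left"]) and evaluate(n["right"]),
    "or": lambda n: evaluate(n["left"]) or evaluate(n["right"]),
}
```

Passing the parsed `{"type": "and", "left": {"type": "true"}, "right": {"type": "not", "child": {"type": "false"}}}` to `evaluate` gives `True`. A bare `true` in the tree would fail at `node["type"]` (Python raises `TypeError` when indexing a `bool`), which is the extra special case the literal form forces on every reader.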

Kind regards,
Fokko

On Tue, Jan 20, 2026 at 11:42 PM Drew <[email protected]> wrote:

> Hi all,
>
> Following up after letting this sit for a bit. I’m still leaning toward
> option 1, and I think that approach is reasonable here.
>
> The spec change toward primitives is cleaner, as it aligns with what we’ve
> effectively been doing already by writing boolean literals. If the spec
> modeling were causing real problems, I would've expected to see reports
> earlier, given how long this logic has existed. So far, I’ve only seen the
> single report I referenced in my last email.
>
> Given that, I’d treat this as a documentation alignment rather than
> something that needs immediate code changes. If someone does run into
> problems with the models, we can address that in a future release cycle
> without blocking progress.
>
> Thanks,
> Drew
>
> On Sat, Jan 17, 2026 at 3:57 PM Drew <[email protected]> wrote:
>
>> Hi all,
>>
>> I wanted to revive this thread after the same expression problem came up
>> again in a recent issue (https://github.com/apache/iceberg/issues/15072),
>> this time while using the expressions model for the reportMetrics API.
>>
>> We've been writing literal booleans for a while, but the spec says it
>> should be an object. Clients may have implemented the spec, so they expect
>> the aforementioned object form. The previous consensus was to accept both
>> forms when reading, and keep writing as boolean literals. Either way, we
>> know some clients will need updates.
>>
>> Would love to hear thoughts on whether this approach still makes sense.
>>
>> PR: https://github.com/apache/iceberg/pull/14677
>>
>> Drew
>>
>> On Wed, Dec 3, 2025 at 7:31 PM Drew <[email protected]> wrote:
>>
>>> Hey Everyone,
>>>
>>> Quick update on the boolean expression issue in PR 14677
>>> <https://github.com/apache/iceberg/pull/14677>.
>>>
>>> This showed up while working on scan planning, but expressions are used
>>> in other areas of the REST spec as well. Since the expression parser has
>>> always written boolean literals, there are some users who have relied on
>>> that behavior without ever using the REST models. Given that, I don't think
>>> there's a path here that avoids breaking someone.
>>>
>>> Even if we start accepting the object form from the spec, users still
>>> need to update their models to handle the parser's current behavior. And if
>>> we flip the wire format to match the spec, then any client that's been
>>> deserializing booleans will need to update. Either direction comes with a
>>> breaking change.
>>>
>>> Here are some paths forward:
>>>
>>> *Option 1*: Align the spec with what's actually been written.
>>>
>>> Pros: Matches existing behavior that clients already rely on
>>> Cons: Clients that implemented the object form from the spec will need a
>>> small update
>>>
>>> *Option 2*: Align the implementation with the published spec.
>>>
>>> Pros: Matches the current wording in the spec
>>> Cons: Breaks clients that only expect boolean literals today
>>>
>>> *Option 3*: Keep writing boolean literals, but read both.
>>>
>>> Pros: No change to what we write. Accepts both the spec object form
>>> {"type":"true"} and the literal true.
>>> Cons: Still requires a spec update for reads from the client, and adds
>>> extra logic to the parser.
>>>
>>> I'm leaning more towards option 1, as booleans are what we have been
>>> writing since the beginning and any client that has been following the spec
>>> today can update their models to follow this behavior.
>>>
>>> Let me know what you all think!
>>>
>>> Thanks,
>>> Drew
>>>
>>> On Tue, Nov 25, 2025 at 1:09 AM Fokko Driesprong <[email protected]> wrote:
>>>
>>>> Thanks, Drew, for finding and fixing this! We should definitely remove
>>>> this discrepancy. I've replied to the PR.
>>>>
>>>> Kind regards,
>>>> Fokko
>>>>
>>>> On Tue, Nov 25, 2025 at 8:23 AM Drew <[email protected]> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I ran into an issue using the REST scan planning APIs where filters
>>>>> containing boolean expressions were failing to be parsed. The REST spec
>>>>> defines these models as an object wrapping the string representation, like
>>>>> {"type": "true"}, but the ExpressionParser actually reads and writes them
>>>>> as plain booleans. That mismatch causes the parser to reject filters that
>>>>> follow the current spec.
>>>>>
>>>>> I opened a PR to update the REST spec to align with how the expression
>>>>> parser is used.
>>>>>
>>>>> If anyone has any concerns with the spec change, or thinks we should
>>>>> handle it differently (for example by changing the Java representation
>>>>> instead), I’d appreciate any feedback.
>>>>>
>>>>> PR: https://github.com/apache/iceberg/pull/14677
>>>>>
>>>>> - Drew
>>>>>
>>>>
