Hello Vladimir,

I want to raise that we've been observing similar behavior differences
regarding CREATE OR REPLACE between the Hive/Hadoop catalogs and the REST
catalog here: https://github.com/apache/iceberg/issues/11109

The context: Iceberg's Spark integration tests have traditionally been run
only against the Hive/Hadoop catalogs. With the recently added RCK (REST
Compatibility Kit) setup for testing purposes, we are adding REST catalog
based Spark integration tests.
The assumption: all the Spark integration tests that used to pass against
the Hive/Hadoop catalogs should also pass against the REST catalog, so that
we can make sure the REST catalog client has the same behavior as, and is
equivalent in power to, the Hive/Hadoop catalog clients.
Details: https://github.com/apache/iceberg/issues/11079

Currently, there's one such test using a CREATE OR REPLACE statement that
passes when using the Hive/Hadoop catalogs but fails when using the REST
catalog (the server being the RCK reference implementation). It turns out
that the CREATE OR REPLACE statement does not trigger a cleanup of snapshot
history with the Hive/Hadoop catalogs, but it does with the REST catalog.
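
A minimal sketch of the diverging check (the table name and setup are
hypothetical), driving Spark SQL from Java against the catalog under test:

  // spark is an org.apache.spark.sql.SparkSession configured for the
  // catalog under test.
  spark.sql("CREATE TABLE db.t (id BIGINT) USING iceberg");
  spark.sql("INSERT INTO db.t VALUES (1)");
  spark.sql("CREATE OR REPLACE TABLE db.t (id BIGINT) USING iceberg");
  // The snapshots metadata table lists the table's retained history.
  long snapshots = spark.sql("SELECT * FROM db.t.snapshots").count();
  // Hive/Hadoop catalogs: the pre-replace snapshot is still listed.
  // RCK REST server: the pre-replace history is gone.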

Based on the discussion above, we should fix some implementation details in
the RCK reference implementation for our issue. Still, these are the kinds
of cases where we could benefit from a general consensus on the behavior of
CREATE OR REPLACE across different catalog types and query engines.

Another suggestion for Trino: if you already have integration tests for the
Trino Iceberg connector against the Hive/Hadoop catalogs, then running the
exact same tests against a REST catalog can help systematically detect
behavior differences between catalog types.
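
For instance (a rough sketch assuming JUnit 5; the helper and test names
are hypothetical):

  import org.apache.iceberg.catalog.Catalog;
  import org.junit.jupiter.params.ParameterizedTest;
  import org.junit.jupiter.params.provider.ValueSource;

  // createCatalog is a hypothetical helper that wires up the requested
  // catalog implementation for the test run.
  @ParameterizedTest
  @ValueSource(strings = {"hive", "hadoop", "rest"})
  void createOrReplaceKeepsSnapshotHistory(String catalogType) {
      Catalog catalog = createCatalog(catalogType);
      // ... run the identical CREATE OR REPLACE assertions here, so
      // any divergence between catalog types fails the build.
  }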

Regards,
Haizhou

On Wed, Oct 23, 2024 at 7:33 AM Vladimir Ozerov <voze...@querifylabs.com>
wrote:

> Hi,
>
> Sure, will do.
>
> Vladimir Ozerov
> Founder
> querifylabs.com
>
>
> Wed, Oct 23, 2024 at 08:50, Jean-Baptiste Onofré <j...@nanthrax.net>:
>
>> I second Ryan here; it would be great to clarify this in the
>> "implementation notes" section.
>>
>> Thanks!
>> Regards
>> JB
>>
>> On Wed, Oct 23, 2024 at 1:10 AM rdb...@gmail.com <rdb...@gmail.com>
>> wrote:
>> >
>> > Thanks Vladimir! Would you like to open a PR to make that change? It
>> sounds like another good item to put into the "Implementation notes"
>> section.
>> >
>> > On Sun, Oct 20, 2024 at 11:41 PM Vladimir Ozerov <
>> voze...@querifylabs.com> wrote:
>> >>
>> >> Hi Jean-Baptiste,
>> >>
>> >> Agreed. The REST spec looks good. I am talking about the general spec,
>> where it might be useful to add a hint for engine developers that CREATE
>> OR REPLACE in Iceberg is expected to follow slightly different semantics.
>> This is already broken in Trino: depending on the catalog type, users may
>> get either the classical "DROP + CREATE" (for non-REST catalogs) or
>> "CREATE AND UPDATE" (for the REST catalog). For Flink, the official docs
>> say that CREATE OR REPLACE == DROP + CREATE, while for Iceberg tables this
>> should not be the case. These are definitely things that should be fixed
>> at the engine level. But at the same time, it highlights that engine
>> developers are having a hard time defining proper semantics for CREATE OR
>> REPLACE in their Iceberg integrations, so a paragraph or so in the main
>> Iceberg spec may help us align expectations.
>> >>
>> >> Regards,
>> >> Vladimir.
>> >>
>> >> On Mon, Oct 21, 2024 at 8:28 AM Jean-Baptiste Onofré <j...@nanthrax.net>
>> wrote:
>> >>>
>> >>> Hi Vladimir,
>> >>>
>> >>> As Ryan said, it's not a bug: CREATE OR REPLACE can be seen as "CREATE
>> >>> AND UPDATE" from the table format's perspective. Specifically for the
>> >>> properties, it makes sense not to delete the current properties, as
>> >>> they can be used in several use cases (security, table grouping, ...).
>> >>> I'm not sure a REST Spec update is required; it's probably more on the
>> >>> engine side. In the REST Spec, you can create a table
>> >>> (
>> https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml#L553
>> )
>> >>> and update a table
>> >>> (
>> https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml#L975
>> ),
>> >>> and it's up to the query engine to implement "CREATE OR REPLACE"
>> >>> with the correct semantics.
>> >>>
>> >>> Regards
>> >>> JB
>> >>>
>> >>> On Sun, Oct 20, 2024 at 9:26 PM Vladimir Ozerov <
>> voze...@querifylabs.com> wrote:
>> >>> >
>> >>> > Hi Ryan,
>> >>> >
>> >>> > Thanks for the clarification. Yes, I think my confusion was caused
>> by the fact that many engines treat CREATE OR REPLACE as a semantic
>> equivalent of DROP + CREATE performed atomically (e.g., Flink [1]). Table
>> formats add history on top of that, which is expected to be retained, no
>> questions there. Permission propagation also makes sense. For properties,
>> things become a bit blurry because, on the one hand, there are
>> Iceberg-specific properties, which may affect table maintenance, and on
>> the other hand there are user-defined properties in the same bag. The
>> question arose in the first place because I observed a discrepancy in
>> Trino: all catalogs except REST completely override table properties on
>> REPLACE, while the REST catalog merges them, which might be confusing to
>> end users. Perhaps some clarification at the spec level might be useful,
>> because without agreement between engines there could be subtle bugs in
>> multi-engine environments, such as sudden data format changes between
>> replaces, etc.
>> >>> >
>> >>> > [1]
>> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/create/#create-or-replace-table
>> >>> >
>> >>> > Regards,
>> >>> > Vladimir.
>> >>> >
>> >>> > On Sun, Oct 20, 2024 at 9:20 PM rdb...@gmail.com <rdb...@gmail.com>
>> wrote:
>> >>> >>
>> >>> >> Hi Vladimir,
>> >>> >>
>> >>> >> This isn't a bug. The behavior of CREATE OR REPLACE is to replace
>> the data of a table but to maintain things like other refs, snapshot
>> history, permissions (if supported by the catalog), and table properties.
>> Table properties are replaced if they are set in the operation, like `b`
>> in your example. This is not the same as a drop and create, which may be
>> what you want instead.
>> >>> >>
>> >>> >> The reason for this behavior is that the CREATE OR REPLACE
>> operation is used to replace a table's data without needing to handle
>> schema changes between versions. For example, producing a daily report
>> table that replaces the previous day. However, the table still exists and
>> it is valuable to be able to time travel to older versions or to be able to
>> use branches and tags. Clearly, that means that table history and refs
>> stick around so the table is not completely new every time it is replaced.
>> >>> >>
>> >>> >> Adding on to that, properties control things like ref and snapshot
>> retention, file format, compression, and other settings. These aren't
>> settings that should need to be passed in every replace operation. And it
>> would make no sense to set the snapshot retention so that older snapshots
>> are retained, only to have it discarded the next time you replace the
>> table data. A good way to think about this is that table properties are
>> set infrequently, while table data changes regularly. And the person
>> changing the data may not be the person tuning the table settings.
>> >>> >>
>> >>> >> Hopefully that helps,
>> >>> >>
>> >>> >> Ryan
>> >>> >>
>> >>> >> On Sun, Oct 20, 2024 at 9:45 AM Vladimir Ozerov <
>> voze...@querifylabs.com> wrote:
>> >>> >>>
>> >>> >>> Hi,
>> >>> >>>
>> >>> >>> Consider a REST catalog where a user runs a "CREATE OR REPLACE
>> <table>" command. When processing the command, engines will usually
>> initiate a "createOrReplace" transaction and add metadata, such as the
>> properties of the new table.
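>> >>> >>>
>> >>> >>> For reference, one way this maps onto the Iceberg catalog API (a
>> minimal sketch; the identifier, schema, spec, and properties here are
>> placeholders):
>> >>> >>>
>>   // Catalog.newReplaceTableTransaction with orCreate = true is the
>>   // API-level equivalent of CREATE OR REPLACE.
>>   Transaction txn = catalog.newReplaceTableTransaction(
>>       TableIdentifier.of("db", "t"), schema, spec,
>>       newProperties, /* orCreate = */ true);
>>   txn.commitTransaction();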
>> >>> >>>
>> >>> >>> Users expect a table to be replaced with a new one if it exists,
>> including properties. However, I observe the following:
>> >>> >>>
>> >>> >>> 1. RESTSessionCatalog loads the previous table metadata, adds the
>> new properties (MetadataUpdate.SetProperties), and invokes the backend.
>> >>> >>> 2. The backend (e.g., Polaris) will typically invoke
>> "CatalogHandler.updateTable". There, the previous table state, including
>> its properties, is loaded.
>> >>> >>> 3. Finally, the metadata updates are applied, and the old table
>> properties are merged with the new ones. That is, if the old table has
>> properties [a=1, b=2], and the new table has properties [b=3, c=4], then
>> the final properties would be [a=1, b=3, c=4], while the user expects
>> [b=3, c=4].
>> >>> >>>
>> >>> >>> It looks like a bug because the user expects complete property
>> replacement instead of a merge. Shall we explicitly clear all previous
>> properties in RESTSessionCatalog.Builder.replaceTransaction?
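>> >>> >>>
>> >>> >>> Something along these lines (an illustrative sketch only, not the
>> actual Iceberg code; "base" stands for the previously loaded metadata and
>> "changes" for the list of pending MetadataUpdate objects):
>> >>> >>>
>>   // Remove every pre-existing key that the new definition does not
>>   // set, so REPLACE overwrites properties instead of merging them.
>>   Set<String> staleKeys = new HashSet<>(base.properties().keySet());
>>   staleKeys.removeAll(newProperties.keySet());
>>   changes.add(new MetadataUpdate.RemoveProperties(staleKeys));
>>   changes.add(new MetadataUpdate.SetProperties(newProperties));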
>> >>> >>>
>> >>> >>> Regards,
>> >>> >>> Vladimir.
>> >>> >>>
>> >>> >>>
>> >>> >>>
>> >>> >>> --
>> >>> >>> Vladimir Ozerov
>> >>> >>> Founder
>> >>> >>> querifylabs.com
>> >>> >
>> >>> >
>> >>> >
>> >>> > --
>> >>> > Vladimir Ozerov
>> >>> > Founder
>> >>> > querifylabs.com
>> >>
>> >>
>> >>
>> >> --
>> >> Vladimir Ozerov
>> >> Founder
>> >> querifylabs.com
>>
>
