Hi all,

I've just opened a PR [1] that implements event type IDs, as discussed
here, and also contextual information for events, as discussed in
another discussion thread [2].

The reason why I openen one PR for both is that both enhancements
affect exactly the same classes and 2 PRs would conflict with each
other and force reviewers to review the same places twice.

Thanks,
Alex


[1]: https://github.com/apache/polaris/pull/2998
[2]: https://lists.apache.org/thread/rl5cpcft16sn5n00mfkmx9ldn3gsqtfy

On Tue, Oct 28, 2025 at 5:51 PM Alexandre Dutra <[email protected]> wrote:
>
> Hi all,
>
> I am going to assume that we have reached a consensus on this topic
> and start working on it.
>
> There seems to remain some minor friction around the ID format but I
> believe we can sort this out by looking at the implementation.
>
> If someone disagrees, please chime in.
>
> Thanks,
> Alex
>
> On Sun, Oct 26, 2025 at 12:49 PM Alexandre Dutra <[email protected]> wrote:
> >
> > Hi all,
> >
> > I am on the fence regarding simple class names. We should not tie
> > Polaris internals to anything that is user-facing, like configuration
> > or REST API parameters. That's brittle. Someone could rename the
> > classes one day, and that would break everything. Also, Java cannot
> > enforce the uniqueness of simple class names, so someone could
> > introduce a duplicated event class inadvertently. IMO, event type IDs
> > are much more future-proof, especially because Java **can** enforce
> > their uniqueness, if we use a single enum for that.
> >
> > > Regarding the ID format, I’d lean toward using self-descriptive string 
> > > IDs (e.g., BEFORE_CREATE_CATALOG, AFTER_REFRESH_TABLE, or other formats 
> > > "before-create-catalog", "BeforeCreateCatalog") instead of numeric ones.
> >
> > Completely agree. I don't mind using numeric codes in the database –
> > that's not user-facing and storing a byte or int32 saves space. But in
> > configuration or REST APIs, it's a different story, we cannot possibly
> > require users to input numeric codes. I'd note that this is basically
> > what Polaris does already. Take PolarisPrivilege [1] as an example:
> > the SQL schema stores numeric codes [2], but the REST API accepts enum
> > constant names [3].
> >
> > Thanks,
> > Alex
> >
> > [1]: 
> > https://github.com/apache/polaris/blob/41175512655cdb5561b8952a3bf93588185bc536/polaris-core/src/main/java/org/apache/polaris/core/entity/PolarisPrivilege.java
> > [2]: 
> > https://github.com/apache/polaris/blob/e02bb7171ba7af1aeba224e4d5fd585ea6c88624/persistence/relational-jdbc/src/main/resources/postgres/schema-v3.sql#L87
> > [3]: 
> > https://github.com/apache/polaris/blob/cc7bb953cb80e2d52ceb74a06ee8e7b6e644eefa/spec/polaris-management-service.yml#L1558-L1608
> >
> > On Sat, Oct 25, 2025 at 6:13 PM Yufei Gu <[email protected]> wrote:
> > >
> > > Class name without path sounds good to me. Spark takes the same route.
> > >
> > > Yufei
> > >
> > >
> > > On Sat, Oct 25, 2025 at 1:32 AM Adnan Hemani
> > > <[email protected]> wrote:
> > >
> > > > Hi all,
> > > >
> > > > Re schema migration: I’m a +1 with Dmitri’s suggestion to just disable
> > > > event persistence and throw error logs until the schema is upgraded.
> > > >
> > > > Re string IDs: If we wish to just use the simple class name (just the 
> > > > name
> > > > to the Java class, without the path to the class) - I’m still okay with
> > > > that, as that was the original thought process and what’s present 
> > > > today. Or
> > > > if we’d like to use an integer EventTypeId, that makes sense to me too. 
> > > > I
> > > > just don’t see any value in doing anything in between (i.e. making new
> > > > string identifiers for each event type, etc.). IMO we will have just 
> > > > end up
> > > > with the same amount of brittleness that currently exists, now with an
> > > > additional mapping in between with it.
> > > >
> > > > Best,
> > > > Adnan Hemani
> > > >
> > > > > On Oct 24, 2025, at 10:34 AM, Yufei Gu <[email protected]> wrote:
> > > > >
> > > > > Alex, thanks for raising the discussion. I believe it's the right
> > > > direction
> > > > > to avoid using class names for event types.
> > > > >
> > > > > Regarding the ID format, I’d lean toward using self-descriptive string
> > > > IDs
> > > > > (e.g., BEFORE_CREATE_CATALOG, AFTER_REFRESH_TABLE, or other formats
> > > > > "before-create-catalog", "BeforeCreateCatalog") instead of numeric 
> > > > > ones.
> > > > > String IDs are more stable and expressive, making them easier to read 
> > > > > in
> > > > > configurations, logs, and dashboards. They also decouple user-facing
> > > > > configuration from internal implementation details and avoid the
> > > > > brittleness of numeric “magic numbers.”
> > > > >
> > > > > If we ever need numeric codes for storage or performance reasons, they
> > > > > could exist as internal representation, but the string ID should 
> > > > > remain
> > > > the
> > > > > canonical public identifier used in APIs, configs, and event
> > > > serialization.
> > > > >
> > > > > Yufei
> > > > >
> > > > >
> > > > > On Fri, Oct 24, 2025 at 9:04 AM Alexandre Dutra <[email protected]
> > > > <mailto:[email protected]>> wrote:
> > > > >
> > > > >>> I do not think we have technical means to force JDBC schema
> > > > >> changes on upgrade.
> > > > >>
> > > > >> Would a warning under "Upgrade notes" in the changelog be enough?
> > > > >>
> > > > >> On Fri, Oct 24, 2025 at 5:53 PM Dmitri Bourlatchkov 
> > > > >> <[email protected]>
> > > > >> wrote:
> > > > >>>
> > > > >>> Hi All,
> > > > >>>
> > > > >>> From my POV we should flag events as "beta" in 1.2.0 release notes 
> > > > >>> and
> > > > >>> drop/re-create events tables on upgrade.
> > > > >>>
> > > > >>> However, I do not think we have technical means to force JDBC schema
> > > > >>> changes on upgrade. At best we can disable event persistence (with
> > > > ERROR
> > > > >>> log messages) if the schema is old and wait for the user to run the
> > > > admin
> > > > >>> tool to apply schema changes.
> > > > >>>
> > > > >>> Cheers,
> > > > >>> Dmitri.
> > > > >>>
> > > > >>> On Fri, Oct 24, 2025 at 5:26 AM Alexandre Dutra <[email protected]>
> > > > >> wrote:
> > > > >>>
> > > > >>>> Hi all,
> > > > >>>>
> > > > >>>> Thank you for your input here. Mike and Adnan: I'm fine with 
> > > > >>>> numerical
> > > > >>>> ids in the database, let's keep using this pattern.
> > > > >>>>
> > > > >>>> I think the only question remaining is how to handle the SQL schema
> > > > >>>> migration. Currently we have an event_type column of type TEXT NOT
> > > > >>>> NULL [1]. We would need instead an event_type_code column of type
> > > > >>>> INTEGER NOT NULL.
> > > > >>>>
> > > > >>>> Like Dmitri, I confess I do not recall if the Events API is 
> > > > >>>> considered
> > > > >>>> beta or production-ready at this point. Also, the events table is
> > > > >>>> probably meant to be purged regularly. Do you think it would be OK 
> > > > >>>> to
> > > > >>>> just drop the existing column and introduce a new one, also
> > > > >>>> instructing users to drop and re-create the table when upgrading
> > > > >>>> Polaris?
> > > > >>>>
> > > > >>>> Thanks,
> > > > >>>> Alex
> > > > >>>>
> > > > >>>> [1]:
> > > > >>>>
> > > > >>
> > > > https://www.google.com/url?q=https://github.com/apache/polaris/blob/e02bb7171ba7af1aeba224e4d5fd585ea6c88624/persistence/relational-jdbc/src/main/resources/postgres/schema-v3.sql%23L129&source=gmail-imap&ust=1761932117000000&usg=AOvVaw1oJk-illh5w1s-tuGgiBYg
> > > > >>>>
> > > > >>>> On Fri, Oct 24, 2025 at 9:48 AM Adnan Hemani
> > > > >>>> <[email protected] <mailto:
> > > > [email protected]>> wrote:
> > > > >>>>>
> > > > >>>>> Hi Alex,
> > > > >>>>>
> > > > >>>>> I agree with your concerns. Here is what I am recommending:
> > > > >>>>> * +1 to Michael’s recommendation of keeping numeric IDs for 
> > > > >>>>> storage
> > > > >>>> purposes.
> > > > >>>>> * Waiting on the logical IDs until we have a use case for it. I’m 
> > > > >>>>> not
> > > > >>>> sure the configurations use case is fully defined yet (if you 
> > > > >>>> think it
> > > > >> is,
> > > > >>>> happy to discuss on a different mailing thread so as to not hijack
> > > > this
> > > > >>>> thread). But we should add a new method regardless to the 
> > > > >>>> PolarisEvent
> > > > >>>> interface: `int typeId()` or something similar - and use that 
> > > > >>>> forward
> > > > >> in
> > > > >>>> storage.
> > > > >>>>>
> > > > >>>>> Best,
> > > > >>>>> Adnan Hemani
> > > > >>>>>
> > > > >>>>>> On Oct 23, 2025, at 9:23 AM, Michael Collado <
> > > > >> [email protected] <mailto:[email protected]>>
> > > > >>>> wrote:
> > > > >>>>>>
> > > > >>>>>> Totally agree. In fact, other enum types (e.g., PolarisPrivilege,
> > > > >>>>>> PolarisEntityType) use numeric ids in storage to decouple naming 
> > > > >>>>>> in
> > > > >>>> code
> > > > >>>>>> from the values we persist in the database. I'd recommend
> > > > >> continuing
> > > > >>>> with
> > > > >>>>>> that pattern.
> > > > >>>>>>
> > > > >>>>>> Mike
> > > > >>>>>>
> > > > >>>>>> On Wed, Oct 22, 2025 at 10:38 AM Dmitri Bourlatchkov <
> > > > >> [email protected] <mailto:[email protected]>
> > > > >>>> <mailto:[email protected]>>
> > > > >>>>>> wrote:
> > > > >>>>>>
> > > > >>>>>>> Hi Alex,
> > > > >>>>>>>
> > > > >>>>>>> +1 to all three counts.
> > > > >>>>>>>
> > > > >>>>>>> As for the SQL migration script, is event persistence a fully
> > > > >>>> supported
> > > > >>>>>>> feature or a "beta"... TBH, I do not recall how it was presented
> > > > >> to
> > > > >>>> users.
> > > > >>>>>>> WDYT?
> > > > >>>>>>>
> > > > >>>>>>> Cheers,
> > > > >>>>>>> Dmitri.
> > > > >>>>>>>
> > > > >>>>>>> On Wed, Oct 22, 2025 at 9:44 AM Alexandre Dutra <
> > > > >> [email protected] <mailto:[email protected]>>
> > > > >>>> wrote:
> > > > >>>>>>>
> > > > >>>>>>>> Hi all,
> > > > >>>>>>>>
> > > > >>>>>>>> Currently, Polaris manages over 150 event types, but they lack
> > > > >> proper
> > > > >>>>>>>> identifiers, leading to the problematic use of Java class names
> > > > >> to
> > > > >>>>>>>> represent the event types.
> > > > >>>>>>>>
> > > > >>>>>>>> For instance, PolarisPersistenceEventListener relies on Java
> > > > >> class
> > > > >>>>>>>> names to store event types in the database [1]. Thus, renaming 
> > > > >>>>>>>> or
> > > > >>>>>>>> moving an event class could inadvertently break this listener,
> > > > >> or any
> > > > >>>>>>>> other listener doing something similar.
> > > > >>>>>>>>
> > > > >>>>>>>> Moreover, if we were to introduce a configuration to manage 
> > > > >>>>>>>> event
> > > > >>>>>>>> types (which I think we'll have to do eventually), we'd need to
> > > > >> use
> > > > >>>>>>>> Java class names again, out of better options. Consider a
> > > > >>>> hypothetical
> > > > >>>>>>>> configuration example that highlights this issue:
> > > > >>>>>>>>
> > > > >>>>>>>> polaris.event-listener.enabled-event-types=\
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>
> > > > >>>>
> > > > >>
> > > > org.apache.polaris.service.events.IcebergRestCatalogEvents.AfterRefreshTableEvent,\
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>
> > > > >>>>
> > > > >>
> > > > org.apache.polaris.service.events.IcebergRestCatalogEvents.AfterCommitTableEvent
> > > > >>>>>>>>
> > > > >>>>>>>> It's obvious that listing event types in such a way is not only
> > > > >>>>>>>> verbose, but also cryptic for users. It introduces an 
> > > > >>>>>>>> unnecessary
> > > > >>>>>>>> tight coupling between user-facing configuration and Polaris
> > > > >>>>>>>> internals.
> > > > >>>>>>>>
> > > > >>>>>>>> To address these concerns, I would like to promote event types 
> > > > >>>>>>>> to
> > > > >>>>>>>> first-class citizens in Polaris:
> > > > >>>>>>>>
> > > > >>>>>>>> 1. Introduce the concept of an event type with a unique, 
> > > > >>>>>>>> logical
> > > > >> ID
> > > > >>>>>>>> (e.g., BEFORE_CREATE_CATALOG). This could be enforced through a
> > > > >> Java
> > > > >>>>>>>> enum for uniqueness.
> > > > >>>>>>>> 2. Add a new method, PolarisEventType type(), to the 
> > > > >>>>>>>> PolarisEvent
> > > > >>>>>>>> interface. This is not only informative; it could serve e.g. 
> > > > >>>>>>>> as a
> > > > >>>> type
> > > > >>>>>>>> discriminator when serializing events.
> > > > >>>>>>>> 3. Replace class names with logical IDs, both in configuration
> > > > >> and
> > > > >>>>>>>> when storing events in the database.
> > > > >>>>>>>>
> > > > >>>>>>>> Note: The last item represents a breaking change and may 
> > > > >>>>>>>> require
> > > > >> a
> > > > >>>> SQL
> > > > >>>>>>>> migration script.
> > > > >>>>>>>>
> > > > >>>>>>>> I'm curious to hear your thoughts.
> > > > >>>>>>>>
> > > > >>>>>>>> Thanks,
> > > > >>>>>>>> Alex
> > > > >>>>>>>>
> > > > >>>>>>>> [1]:
> > > > >>>>>>>>
> > > > >>>>>>>
> > > > >>>>
> > > > >>
> > > > https://www.google.com/url?q=https://www.google.com/url?q%3Dhttps://github.com/apache/polaris/blob/3b4b0169032216d85d48130ad43285f86e886e3c/runtime/service/src/main/java/org/apache/polaris/service/events/listeners/PolarisPersistenceEventListener.java%2523L44%26source%3Dgmail-imap%26ust%3D1761841473000000%26usg%3DAOvVaw0v18HDrE4mH9C3INPPhtLp&source=gmail-imap&ust=1761932117000000&usg=AOvVaw1trohcEgBdAL0XvyqjTrmt
> > > >
> > > >

Reply via email to