Hi all,

I am going to assume that we have reached a consensus on this topic
and start working on it.

There seems to remain some minor friction around the ID format but I
believe we can sort this out by looking at the implementation.

If someone disagrees, please chime in.

Thanks,
Alex

On Sun, Oct 26, 2025 at 12:49 PM Alexandre Dutra <[email protected]> wrote:
>
> Hi all,
>
> I am on the fence regarding simple class names. We should not tie
> Polaris internals to anything that is user-facing, like configuration
> or REST API parameters. That's brittle. Someone could rename the
> classes one day, and that would break everything. Also, Java cannot
> enforce the uniqueness of simple class names, so someone could
> introduce a duplicated event class inadvertently. IMO, event type IDs
> are much more future-proof, especially because Java **can** enforce
> their uniqueness, if we use a single enum for that.
>
> > Regarding the ID format, I’d lean toward using self-descriptive string IDs 
> > (e.g., BEFORE_CREATE_CATALOG, AFTER_REFRESH_TABLE, or other formats 
> > "before-create-catalog", "BeforeCreateCatalog") instead of numeric ones.
>
> Completely agree. I don't mind using numeric codes in the database –
> that's not user-facing and storing a byte or int32 saves space. But in
> configuration or REST APIs, it's a different story, we cannot possibly
> require users to input numeric codes. I'd note that this is basically
> what Polaris does already. Take PolarisPrivilege [1] as an example:
> the SQL schema stores numeric codes [2], but the REST API accepts enum
> constant names [3].
>
> Thanks,
> Alex
>
> [1]: 
> https://github.com/apache/polaris/blob/41175512655cdb5561b8952a3bf93588185bc536/polaris-core/src/main/java/org/apache/polaris/core/entity/PolarisPrivilege.java
> [2]: 
> https://github.com/apache/polaris/blob/e02bb7171ba7af1aeba224e4d5fd585ea6c88624/persistence/relational-jdbc/src/main/resources/postgres/schema-v3.sql#L87
> [3]: 
> https://github.com/apache/polaris/blob/cc7bb953cb80e2d52ceb74a06ee8e7b6e644eefa/spec/polaris-management-service.yml#L1558-L1608
>
> On Sat, Oct 25, 2025 at 6:13 PM Yufei Gu <[email protected]> wrote:
> >
> > Class name without path sounds good to me. Spark takes the same route.
> >
> > Yufei
> >
> >
> > On Sat, Oct 25, 2025 at 1:32 AM Adnan Hemani
> > <[email protected]> wrote:
> >
> > > Hi all,
> > >
> > > Re schema migration: I’m a +1 with Dmitri’s suggestion to just disable
> > > event persistence and throw error logs until the schema is upgraded.
> > >
> > > Re string IDs: If we wish to just use the simple class name (just the name
> > > to the Java class, without the path to the class) - I’m still okay with
> > > that, as that was the original thought process and what’s present today. 
> > > Or
> > > if we’d like to use an integer EventTypeId, that makes sense to me too. I
> > > just don’t see any value in doing anything in between (i.e. making new
> > > string identifiers for each event type, etc.). IMO we will have just end 
> > > up
> > > with the same amount of brittleness that currently exists, now with an
> > > additional mapping in between with it.
> > >
> > > Best,
> > > Adnan Hemani
> > >
> > > > On Oct 24, 2025, at 10:34 AM, Yufei Gu <[email protected]> wrote:
> > > >
> > > > Alex, thanks for raising the discussion. I believe it's the right
> > > direction
> > > > to avoid using class names for event types.
> > > >
> > > > Regarding the ID format, I’d lean toward using self-descriptive string
> > > IDs
> > > > (e.g., BEFORE_CREATE_CATALOG, AFTER_REFRESH_TABLE, or other formats
> > > > "before-create-catalog", "BeforeCreateCatalog") instead of numeric ones.
> > > > String IDs are more stable and expressive, making them easier to read in
> > > > configurations, logs, and dashboards. They also decouple user-facing
> > > > configuration from internal implementation details and avoid the
> > > > brittleness of numeric “magic numbers.”
> > > >
> > > > If we ever need numeric codes for storage or performance reasons, they
> > > > could exist as internal representation, but the string ID should remain
> > > the
> > > > canonical public identifier used in APIs, configs, and event
> > > serialization.
> > > >
> > > > Yufei
> > > >
> > > >
> > > > On Fri, Oct 24, 2025 at 9:04 AM Alexandre Dutra <[email protected]
> > > <mailto:[email protected]>> wrote:
> > > >
> > > >>> I do not think we have technical means to force JDBC schema
> > > >> changes on upgrade.
> > > >>
> > > >> Would a warning under "Upgrade notes" in the changelog be enough?
> > > >>
> > > >> On Fri, Oct 24, 2025 at 5:53 PM Dmitri Bourlatchkov <[email protected]>
> > > >> wrote:
> > > >>>
> > > >>> Hi All,
> > > >>>
> > > >>> From my POV we should flag events as "beta" in 1.2.0 release notes and
> > > >>> drop/re-create events tables on upgrade.
> > > >>>
> > > >>> However, I do not think we have technical means to force JDBC schema
> > > >>> changes on upgrade. At best we can disable event persistence (with
> > > ERROR
> > > >>> log messages) if the schema is old and wait for the user to run the
> > > admin
> > > >>> tool to apply schema changes.
> > > >>>
> > > >>> Cheers,
> > > >>> Dmitri.
> > > >>>
> > > >>> On Fri, Oct 24, 2025 at 5:26 AM Alexandre Dutra <[email protected]>
> > > >> wrote:
> > > >>>
> > > >>>> Hi all,
> > > >>>>
> > > >>>> Thank you for your input here. Mike and Adnan: I'm fine with 
> > > >>>> numerical
> > > >>>> ids in the database, let's keep using this pattern.
> > > >>>>
> > > >>>> I think the only question remaining is how to handle the SQL schema
> > > >>>> migration. Currently we have an event_type column of type TEXT NOT
> > > >>>> NULL [1]. We would need instead an event_type_code column of type
> > > >>>> INTEGER NOT NULL.
> > > >>>>
> > > >>>> Like Dmitri, I confess I do not recall if the Events API is 
> > > >>>> considered
> > > >>>> beta or production-ready at this point. Also, the events table is
> > > >>>> probably meant to be purged regularly. Do you think it would be OK to
> > > >>>> just drop the existing column and introduce a new one, also
> > > >>>> instructing users to drop and re-create the table when upgrading
> > > >>>> Polaris?
> > > >>>>
> > > >>>> Thanks,
> > > >>>> Alex
> > > >>>>
> > > >>>> [1]:
> > > >>>>
> > > >>
> > > https://www.google.com/url?q=https://github.com/apache/polaris/blob/e02bb7171ba7af1aeba224e4d5fd585ea6c88624/persistence/relational-jdbc/src/main/resources/postgres/schema-v3.sql%23L129&source=gmail-imap&ust=1761932117000000&usg=AOvVaw1oJk-illh5w1s-tuGgiBYg
> > > >>>>
> > > >>>> On Fri, Oct 24, 2025 at 9:48 AM Adnan Hemani
> > > >>>> <[email protected] <mailto:
> > > [email protected]>> wrote:
> > > >>>>>
> > > >>>>> Hi Alex,
> > > >>>>>
> > > >>>>> I agree with your concerns. Here is what I am recommending:
> > > >>>>> * +1 to Michael’s recommendation of keeping numeric IDs for storage
> > > >>>> purposes.
> > > >>>>> * Waiting on the logical IDs until we have a use case for it. I’m 
> > > >>>>> not
> > > >>>> sure the configurations use case is fully defined yet (if you think 
> > > >>>> it
> > > >> is,
> > > >>>> happy to discuss on a different mailing thread so as to not hijack
> > > this
> > > >>>> thread). But we should add a new method regardless to the 
> > > >>>> PolarisEvent
> > > >>>> interface: `int typeId()` or something similar - and use that forward
> > > >> in
> > > >>>> storage.
> > > >>>>>
> > > >>>>> Best,
> > > >>>>> Adnan Hemani
> > > >>>>>
> > > >>>>>> On Oct 23, 2025, at 9:23 AM, Michael Collado <
> > > >> [email protected] <mailto:[email protected]>>
> > > >>>> wrote:
> > > >>>>>>
> > > >>>>>> Totally agree. In fact, other enum types (e.g., PolarisPrivilege,
> > > >>>>>> PolarisEntityType) use numeric ids in storage to decouple naming in
> > > >>>> code
> > > >>>>>> from the values we persist in the database. I'd recommend
> > > >> continuing
> > > >>>> with
> > > >>>>>> that pattern.
> > > >>>>>>
> > > >>>>>> Mike
> > > >>>>>>
> > > >>>>>> On Wed, Oct 22, 2025 at 10:38 AM Dmitri Bourlatchkov <
> > > >> [email protected] <mailto:[email protected]>
> > > >>>> <mailto:[email protected]>>
> > > >>>>>> wrote:
> > > >>>>>>
> > > >>>>>>> Hi Alex,
> > > >>>>>>>
> > > >>>>>>> +1 to all three counts.
> > > >>>>>>>
> > > >>>>>>> As for the SQL migration script, is event persistence a fully
> > > >>>> supported
> > > >>>>>>> feature or a "beta"... TBH, I do not recall how it was presented
> > > >> to
> > > >>>> users.
> > > >>>>>>> WDYT?
> > > >>>>>>>
> > > >>>>>>> Cheers,
> > > >>>>>>> Dmitri.
> > > >>>>>>>
> > > >>>>>>> On Wed, Oct 22, 2025 at 9:44 AM Alexandre Dutra <
> > > >> [email protected] <mailto:[email protected]>>
> > > >>>> wrote:
> > > >>>>>>>
> > > >>>>>>>> Hi all,
> > > >>>>>>>>
> > > >>>>>>>> Currently, Polaris manages over 150 event types, but they lack
> > > >> proper
> > > >>>>>>>> identifiers, leading to the problematic use of Java class names
> > > >> to
> > > >>>>>>>> represent the event types.
> > > >>>>>>>>
> > > >>>>>>>> For instance, PolarisPersistenceEventListener relies on Java
> > > >> class
> > > >>>>>>>> names to store event types in the database [1]. Thus, renaming or
> > > >>>>>>>> moving an event class could inadvertently break this listener,
> > > >> or any
> > > >>>>>>>> other listener doing something similar.
> > > >>>>>>>>
> > > >>>>>>>> Moreover, if we were to introduce a configuration to manage event
> > > >>>>>>>> types (which I think we'll have to do eventually), we'd need to
> > > >> use
> > > >>>>>>>> Java class names again, out of better options. Consider a
> > > >>>> hypothetical
> > > >>>>>>>> configuration example that highlights this issue:
> > > >>>>>>>>
> > > >>>>>>>> polaris.event-listener.enabled-event-types=\
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>
> > > >>>>
> > > >>
> > > org.apache.polaris.service.events.IcebergRestCatalogEvents.AfterRefreshTableEvent,\
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>
> > > >>>>
> > > >>
> > > org.apache.polaris.service.events.IcebergRestCatalogEvents.AfterCommitTableEvent
> > > >>>>>>>>
> > > >>>>>>>> It's obvious that listing event types in such a way is not only
> > > >>>>>>>> verbose, but also cryptic for users. It introduces an unnecessary
> > > >>>>>>>> tight coupling between user-facing configuration and Polaris
> > > >>>>>>>> internals.
> > > >>>>>>>>
> > > >>>>>>>> To address these concerns, I would like to promote event types to
> > > >>>>>>>> first-class citizens in Polaris:
> > > >>>>>>>>
> > > >>>>>>>> 1. Introduce the concept of an event type with a unique, logical
> > > >> ID
> > > >>>>>>>> (e.g., BEFORE_CREATE_CATALOG). This could be enforced through a
> > > >> Java
> > > >>>>>>>> enum for uniqueness.
> > > >>>>>>>> 2. Add a new method, PolarisEventType type(), to the PolarisEvent
> > > >>>>>>>> interface. This is not only informative; it could serve e.g. as a
> > > >>>> type
> > > >>>>>>>> discriminator when serializing events.
> > > >>>>>>>> 3. Replace class names with logical IDs, both in configuration
> > > >> and
> > > >>>>>>>> when storing events in the database.
> > > >>>>>>>>
> > > >>>>>>>> Note: The last item represents a breaking change and may require
> > > >> a
> > > >>>> SQL
> > > >>>>>>>> migration script.
> > > >>>>>>>>
> > > >>>>>>>> I'm curious to hear your thoughts.
> > > >>>>>>>>
> > > >>>>>>>> Thanks,
> > > >>>>>>>> Alex
> > > >>>>>>>>
> > > >>>>>>>> [1]:
> > > >>>>>>>>
> > > >>>>>>>
> > > >>>>
> > > >>
> > > https://www.google.com/url?q=https://www.google.com/url?q%3Dhttps://github.com/apache/polaris/blob/3b4b0169032216d85d48130ad43285f86e886e3c/runtime/service/src/main/java/org/apache/polaris/service/events/listeners/PolarisPersistenceEventListener.java%2523L44%26source%3Dgmail-imap%26ust%3D1761841473000000%26usg%3DAOvVaw0v18HDrE4mH9C3INPPhtLp&source=gmail-imap&ust=1761932117000000&usg=AOvVaw1trohcEgBdAL0XvyqjTrmt
> > >
> > >

Reply via email to