Hi all, I am going to assume that we have reached a consensus on this topic and start working on it.
There seems to remain some minor friction around the ID format but I believe we can sort this out by looking at the implementation. If someone disagrees, please chime in. Thanks, Alex On Sun, Oct 26, 2025 at 12:49 PM Alexandre Dutra <[email protected]> wrote: > > Hi all, > > I am on the fence regarding simple class names. We should not tie > Polaris internals to anything that is user-facing, like configuration > or REST API parameters. That's brittle. Someone could rename the > classes one day, and that would break everything. Also, Java cannot > enforce the uniqueness of simple class names, so someone could > introduce a duplicated event class inadvertently. IMO, event type IDs > are much more future-proof, especially because Java **can** enforce > their uniqueness, if we use a single enum for that. > > > Regarding the ID format, I’d lean toward using self-descriptive string IDs > > (e.g., BEFORE_CREATE_CATALOG, AFTER_REFRESH_TABLE, or other formats > > "before-create-catalog", "BeforeCreateCatalog") instead of numeric ones. > > Completely agree. I don't mind using numeric codes in the database – > that's not user-facing and storing a byte or int32 saves space. But in > configuration or REST APIs, it's a different story, we cannot possibly > require users to input numeric codes. I'd note that this is basically > what Polaris does already. Take PolarisPrivilege [1] as an example: > the SQL schema stores numeric codes [2], but the REST API accepts enum > constant names [3]. > > Thanks, > Alex > > [1]: > https://github.com/apache/polaris/blob/41175512655cdb5561b8952a3bf93588185bc536/polaris-core/src/main/java/org/apache/polaris/core/entity/PolarisPrivilege.java > [2]: > https://github.com/apache/polaris/blob/e02bb7171ba7af1aeba224e4d5fd585ea6c88624/persistence/relational-jdbc/src/main/resources/postgres/schema-v3.sql#L87 > [3]: > https://github.com/apache/polaris/blob/cc7bb953cb80e2d52ceb74a06ee8e7b6e644eefa/spec/polaris-management-service.yml#L1558-L1608 > > On Sat, Oct 25, 2025 at 6:13 PM Yufei Gu <[email protected]> wrote: > > > > Class name without path sounds good to me. Spark takes the same route. > > > > Yufei > > > > > > On Sat, Oct 25, 2025 at 1:32 AM Adnan Hemani > > <[email protected]> wrote: > > > > > Hi all, > > > > > > Re schema migration: I’m a +1 with Dmitri’s suggestion to just disable > > > event persistence and throw error logs until the schema is upgraded. > > > > > > Re string IDs: If we wish to just use the simple class name (just the name > > > to the Java class, without the path to the class) - I’m still okay with > > > that, as that was the original thought process and what’s present today. > > > Or > > > if we’d like to use an integer EventTypeId, that makes sense to me too. I > > > just don’t see any value in doing anything in between (i.e. making new > > > string identifiers for each event type, etc.). IMO we will have just end > > > up > > > with the same amount of brittleness that currently exists, now with an > > > additional mapping in between with it. > > > > > > Best, > > > Adnan Hemani > > > > > > > On Oct 24, 2025, at 10:34 AM, Yufei Gu <[email protected]> wrote: > > > > > > > > Alex, thanks for raising the discussion. I believe it's the right > > > direction > > > > to avoid using class names for event types. > > > > > > > > Regarding the ID format, I’d lean toward using self-descriptive string > > > IDs > > > > (e.g., BEFORE_CREATE_CATALOG, AFTER_REFRESH_TABLE, or other formats > > > > "before-create-catalog", "BeforeCreateCatalog") instead of numeric ones. > > > > String IDs are more stable and expressive, making them easier to read in > > > > configurations, logs, and dashboards. They also decouple user-facing > > > > configuration from internal implementation details and avoid the > > > > brittleness of numeric “magic numbers.” > > > > > > > > If we ever need numeric codes for storage or performance reasons, they > > > > could exist as internal representation, but the string ID should remain > > > the > > > > canonical public identifier used in APIs, configs, and event > > > serialization. > > > > > > > > Yufei > > > > > > > > > > > > On Fri, Oct 24, 2025 at 9:04 AM Alexandre Dutra <[email protected] > > > <mailto:[email protected]>> wrote: > > > > > > > >>> I do not think we have technical means to force JDBC schema > > > >> changes on upgrade. > > > >> > > > >> Would a warning under "Upgrade notes" in the changelog be enough? > > > >> > > > >> On Fri, Oct 24, 2025 at 5:53 PM Dmitri Bourlatchkov <[email protected]> > > > >> wrote: > > > >>> > > > >>> Hi All, > > > >>> > > > >>> From my POV we should flag events as "beta" in 1.2.0 release notes and > > > >>> drop/re-create events tables on upgrade. > > > >>> > > > >>> However, I do not think we have technical means to force JDBC schema > > > >>> changes on upgrade. At best we can disable event persistence (with > > > ERROR > > > >>> log messages) if the schema is old and wait for the user to run the > > > admin > > > >>> tool to apply schema changes. > > > >>> > > > >>> Cheers, > > > >>> Dmitri. > > > >>> > > > >>> On Fri, Oct 24, 2025 at 5:26 AM Alexandre Dutra <[email protected]> > > > >> wrote: > > > >>> > > > >>>> Hi all, > > > >>>> > > > >>>> Thank you for your input here. Mike and Adnan: I'm fine with > > > >>>> numerical > > > >>>> ids in the database, let's keep using this pattern. > > > >>>> > > > >>>> I think the only question remaining is how to handle the SQL schema > > > >>>> migration. Currently we have an event_type column of type TEXT NOT > > > >>>> NULL [1]. We would need instead an event_type_code column of type > > > >>>> INTEGER NOT NULL. > > > >>>> > > > >>>> Like Dmitri, I confess I do not recall if the Events API is > > > >>>> considered > > > >>>> beta or production-ready at this point. Also, the events table is > > > >>>> probably meant to be purged regularly. Do you think it would be OK to > > > >>>> just drop the existing column and introduce a new one, also > > > >>>> instructing users to drop and re-create the table when upgrading > > > >>>> Polaris? > > > >>>> > > > >>>> Thanks, > > > >>>> Alex > > > >>>> > > > >>>> [1]: > > > >>>> > > > >> > > > https://www.google.com/url?q=https://github.com/apache/polaris/blob/e02bb7171ba7af1aeba224e4d5fd585ea6c88624/persistence/relational-jdbc/src/main/resources/postgres/schema-v3.sql%23L129&source=gmail-imap&ust=1761932117000000&usg=AOvVaw1oJk-illh5w1s-tuGgiBYg > > > >>>> > > > >>>> On Fri, Oct 24, 2025 at 9:48 AM Adnan Hemani > > > >>>> <[email protected] <mailto: > > > [email protected]>> wrote: > > > >>>>> > > > >>>>> Hi Alex, > > > >>>>> > > > >>>>> I agree with your concerns. Here is what I am recommending: > > > >>>>> * +1 to Michael’s recommendation of keeping numeric IDs for storage > > > >>>> purposes. > > > >>>>> * Waiting on the logical IDs until we have a use case for it. I’m > > > >>>>> not > > > >>>> sure the configurations use case is fully defined yet (if you think > > > >>>> it > > > >> is, > > > >>>> happy to discuss on a different mailing thread so as to not hijack > > > this > > > >>>> thread). But we should add a new method regardless to the > > > >>>> PolarisEvent > > > >>>> interface: `int typeId()` or something similar - and use that forward > > > >> in > > > >>>> storage. > > > >>>>> > > > >>>>> Best, > > > >>>>> Adnan Hemani > > > >>>>> > > > >>>>>> On Oct 23, 2025, at 9:23 AM, Michael Collado < > > > >> [email protected] <mailto:[email protected]>> > > > >>>> wrote: > > > >>>>>> > > > >>>>>> Totally agree. In fact, other enum types (e.g., PolarisPrivilege, > > > >>>>>> PolarisEntityType) use numeric ids in storage to decouple naming in > > > >>>> code > > > >>>>>> from the values we persist in the database. I'd recommend > > > >> continuing > > > >>>> with > > > >>>>>> that pattern. > > > >>>>>> > > > >>>>>> Mike > > > >>>>>> > > > >>>>>> On Wed, Oct 22, 2025 at 10:38 AM Dmitri Bourlatchkov < > > > >> [email protected] <mailto:[email protected]> > > > >>>> <mailto:[email protected]>> > > > >>>>>> wrote: > > > >>>>>> > > > >>>>>>> Hi Alex, > > > >>>>>>> > > > >>>>>>> +1 to all three counts. > > > >>>>>>> > > > >>>>>>> As for the SQL migration script, is event persistence a fully > > > >>>> supported > > > >>>>>>> feature or a "beta"... TBH, I do not recall how it was presented > > > >> to > > > >>>> users. > > > >>>>>>> WDYT? > > > >>>>>>> > > > >>>>>>> Cheers, > > > >>>>>>> Dmitri. > > > >>>>>>> > > > >>>>>>> On Wed, Oct 22, 2025 at 9:44 AM Alexandre Dutra < > > > >> [email protected] <mailto:[email protected]>> > > > >>>> wrote: > > > >>>>>>> > > > >>>>>>>> Hi all, > > > >>>>>>>> > > > >>>>>>>> Currently, Polaris manages over 150 event types, but they lack > > > >> proper > > > >>>>>>>> identifiers, leading to the problematic use of Java class names > > > >> to > > > >>>>>>>> represent the event types. > > > >>>>>>>> > > > >>>>>>>> For instance, PolarisPersistenceEventListener relies on Java > > > >> class > > > >>>>>>>> names to store event types in the database [1]. Thus, renaming or > > > >>>>>>>> moving an event class could inadvertently break this listener, > > > >> or any > > > >>>>>>>> other listener doing something similar. > > > >>>>>>>> > > > >>>>>>>> Moreover, if we were to introduce a configuration to manage event > > > >>>>>>>> types (which I think we'll have to do eventually), we'd need to > > > >> use > > > >>>>>>>> Java class names again, out of better options. Consider a > > > >>>> hypothetical > > > >>>>>>>> configuration example that highlights this issue: > > > >>>>>>>> > > > >>>>>>>> polaris.event-listener.enabled-event-types=\ > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>> > > > >>>> > > > >> > > > org.apache.polaris.service.events.IcebergRestCatalogEvents.AfterRefreshTableEvent,\ > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>> > > > >>>> > > > >> > > > org.apache.polaris.service.events.IcebergRestCatalogEvents.AfterCommitTableEvent > > > >>>>>>>> > > > >>>>>>>> It's obvious that listing event types in such a way is not only > > > >>>>>>>> verbose, but also cryptic for users. It introduces an unnecessary > > > >>>>>>>> tight coupling between user-facing configuration and Polaris > > > >>>>>>>> internals. > > > >>>>>>>> > > > >>>>>>>> To address these concerns, I would like to promote event types to > > > >>>>>>>> first-class citizens in Polaris: > > > >>>>>>>> > > > >>>>>>>> 1. Introduce the concept of an event type with a unique, logical > > > >> ID > > > >>>>>>>> (e.g., BEFORE_CREATE_CATALOG). This could be enforced through a > > > >> Java > > > >>>>>>>> enum for uniqueness. > > > >>>>>>>> 2. Add a new method, PolarisEventType type(), to the PolarisEvent > > > >>>>>>>> interface. This is not only informative; it could serve e.g. as a > > > >>>> type > > > >>>>>>>> discriminator when serializing events. > > > >>>>>>>> 3. Replace class names with logical IDs, both in configuration > > > >> and > > > >>>>>>>> when storing events in the database. > > > >>>>>>>> > > > >>>>>>>> Note: The last item represents a breaking change and may require > > > >> a > > > >>>> SQL > > > >>>>>>>> migration script. > > > >>>>>>>> > > > >>>>>>>> I'm curious to hear your thoughts. > > > >>>>>>>> > > > >>>>>>>> Thanks, > > > >>>>>>>> Alex > > > >>>>>>>> > > > >>>>>>>> [1]: > > > >>>>>>>> > > > >>>>>>> > > > >>>> > > > >> > > > https://www.google.com/url?q=https://www.google.com/url?q%3Dhttps://github.com/apache/polaris/blob/3b4b0169032216d85d48130ad43285f86e886e3c/runtime/service/src/main/java/org/apache/polaris/service/events/listeners/PolarisPersistenceEventListener.java%2523L44%26source%3Dgmail-imap%26ust%3D1761841473000000%26usg%3DAOvVaw0v18HDrE4mH9C3INPPhtLp&source=gmail-imap&ust=1761932117000000&usg=AOvVaw1trohcEgBdAL0XvyqjTrmt > > > > > >
