Hi all,

I'm writing to express my concerns about the current state of the
PolarisEvent API and to propose a solution.

Current challenges:

1) Excessive complexity: the PolarisEvent interface currently has over
150 concrete subtypes, with a corresponding number of methods in the
PolarisEventListener interface. This forces each concrete listener to
implement all 150+ methods, even when the logic is similar or
identical, leading to significant boilerplate (see example [1] from a
recent PR).

2) Manual processes: as far as I know, the current plan for event
pruning (e.g., removing sensitive or large data) is to implement it
event by event. This has been slow so far: only 2-3 events are done,
which leaves roughly 147 more to go.

While I generally advocate for strongly typed APIs, I believe that in
this specific context, the PolarisEvent hierarchy is slowing down the
development of event-related features.

Do we need so many subtypes? Events are very short-lived objects; they
are created, immediately passed to a listener, and then
garbage-collected. Besides, most listeners will likely apply the same
logic to all events (basically: serialize and dispatch). This suggests
that the type hierarchy is of little use to its main consumers.

My proposal is to completely flatten the PolarisEvent hierarchy.
Instead of numerous concrete types, we would have a single
implementation. This implementation would expose the methods I'm
adding in [2], including type() which allows distinguishing events by
type ID.

It would also expose a new method: Map<String, Object> attributes().

An event factory would be responsible for creating events and
populating these attributes using a common set of well-defined, typed
attribute keys such as "catalog_name", "table_identifier",
"table_metadata", etc.

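To make the shape concrete, here is a rough sketch of a flattened event
and its factory. Every name in it (FlatEvent, EventType, EventAttributes,
EventFactory, afterCreateTable) is a placeholder I made up for
illustration, not something taken from the codebase or from [2]; only
type() and attributes() correspond to methods mentioned above. The keys
are plain String constants here, which is an assumption about how
"well-defined" keys might be declared.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for the type ID returned by type().
enum EventType { BEFORE_CREATE_TABLE, AFTER_CREATE_TABLE }

// Hypothetical holder for the common attribute keys.
final class EventAttributes {
  static final String CATALOG_NAME = "catalog_name";
  static final String TABLE_IDENTIFIER = "table_identifier";
  static final String TABLE_METADATA = "table_metadata";

  private EventAttributes() {}
}

// The single event implementation: a type ID plus a bag of attributes.
record FlatEvent(EventType type, Map<String, Object> attributes) {
  FlatEvent {
    Objects.requireNonNull(type, "type");
    // Defensive, immutable copy; Map.copyOf rejects null keys and values.
    attributes = Map.copyOf(attributes);
  }
}

// The factory is the only place that knows which attributes each event carries.
final class EventFactory {
  private EventFactory() {}

  static FlatEvent afterCreateTable(
      String catalogName, String tableIdentifier, Object tableMetadata) {
    Map<String, Object> attrs = new LinkedHashMap<>();
    attrs.put(EventAttributes.CATALOG_NAME, catalogName);
    attrs.put(EventAttributes.TABLE_IDENTIFIER, tableIdentifier);
    attrs.put(EventAttributes.TABLE_METADATA, tableMetadata);
    return new FlatEvent(EventType.AFTER_CREATE_TABLE, attrs);
  }
}
```

With something like this, a listener would need a single method taking
the event, and could branch on type() only where it actually cares.
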
This creates a schemaless-ish view of the event, which is ideal for
pruning and serialization. It would enable us to apply common rules
more efficiently. For example:

1) All events containing the "table_metadata" attribute could
automatically have pruning logic applied to reduce that attribute's
size.

2) All events containing a specific attribute could automatically have
sensitive data removed from that attribute's value.
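
As a rough illustration of the two rules above, pruning could be keyed on
attribute names rather than on event types. This is only a sketch under
my own assumptions: AttributePruner, the "secret_token" key, and the
truncation used as "pruning" are all hypothetical placeholders, and it
works on a plain Map<String, Object> rather than on any real event class.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical helper: applies one rule per attribute key, for any event type.
final class AttributePruner {
  private final Map<String, UnaryOperator<Object>> rules;

  AttributePruner(Map<String, UnaryOperator<Object>> rules) {
    this.rules = Map.copyOf(rules);
  }

  // Returns a pruned copy; the input map is left untouched.
  Map<String, Object> prune(Map<String, Object> attributes) {
    Map<String, Object> out = new HashMap<>(attributes);
    // Rules only fire when the key is present; a rule that returns null
    // removes the attribute entirely (computeIfPresent semantics).
    rules.forEach((key, rule) -> out.computeIfPresent(key, (k, v) -> rule.apply(v)));
    return out;
  }

  // Example rule set, with placeholder logic for both rules.
  static AttributePruner example() {
    Map<String, UnaryOperator<Object>> rules = new HashMap<>();
    // Placeholder size reduction: keep the first 16 chars of the string form.
    rules.put("table_metadata", v -> truncate(String.valueOf(v), 16));
    // "secret_token" is a made-up key standing in for any sensitive attribute.
    rules.put("secret_token", v -> "<redacted>");
    return new AttributePruner(rules);
  }

  private static String truncate(String s, int max) {
    return s.length() <= max ? s : s.substring(0, max) + "...";
  }
}
```

The point is that a new event type would pick up these rules without any
extra code, as long as it uses the shared attribute keys.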

I'm curious to hear what the community thinks of this proposal.

Thanks,
Alex

[1]: https://github.com/vchag/polaris/blob/4c0aef587e63d5e60d657561a0a53701417f324b/runtime/service/src/main/java/org/apache/polaris/service/events/listeners/AllEventsForwardingListener.java
[2]: https://github.com/apache/polaris/pull/2998
