+1. I think this is a great idea, Jack. Thank you for bringing it up! Several people have expressed interest in more observability (for example, for table design patterns, akin to how folks might monitor Hive). Events would be a big win for that, and something users could plug into a lot of their existing infra (Kafka, REST services, AWS or other cloud provider queue types).

Spark has an existing class, ExternalCatalogWithListener, which emits events we might hook into. I won't go into too much detail here. While these Spark "ExternalCatalogEvents" shouldn't be how we define our own events, which should have their own type system, this mechanism could be a beneficial source of event hooks from within Spark. It also provides table-level query data we don't otherwise get. It's worth investigating if we haven't, though we might choose to forgo its complexity.

I agree conceptually that most events should be registered at the table level, though I'd be open to having events of differing granularities, especially if this helps support cross-table patterns. But table-level data should be prioritized first.

If you have something to share or would like to make time to discuss, please count me in. This is an area I've been thinking about a bit lately, as I've had quite some interest in observability and possible event-driven patterns.

Best,
Kyle (GitHub @kbendick)

On Tue, Nov 30, 2021 at 9:50 PM Neelesh Salian <neeleshssal...@gmail.com> wrote:

> +1 to this effort.
> There is value in adding support for Events - general bookkeeping and
> helping replay actions in the event of recovery.
> At the minimum we should aim to track the following for all catalogs:
> 1. Create actions
> 2. Alter actions
> 3. Delete actions
> across all tables, properties and namespaces.
>
> On Tue, Nov 30, 2021 at 9:12 PM Jack Ye <yezhao...@gmail.com> wrote:
>
>> Hi everyone,
>>
>> I would like to start some initial discussions around Iceberg event
>> notification support, because we might have some engineering resources to
>> work on Iceberg notification integration with AWS services such as SNS,
>> SQS, CloudWatch.
>>
>> As of today, we have a Listener interface and 3 events: ScanEvent,
>> IncrementalScanEvent, CreateSnapshotEvent. There is a static registry
>> called Listeners that registers the event listeners in the JVM.
>>
>> However, when I read the related code paths, my thought is that it might
>> be better to register listeners per-table, based on the following
>> observations:
>> 1. Iceberg events are all table or sub-table level events. For any
>> catalog or global level events, the catalog service can provide
>> notifications, and Iceberg can be out of the picture.
>> 2. A user might have multiple Iceberg catalogs defined, pointing to
>> different catalog services (e.g. one to AWS Glue, one to a Hive
>> metastore). The notifications from tables of these different catalogs
>> should be directed to different listeners at least per catalog, instead of
>> the same set of listeners that are registered globally.
>> 3. Event listener configurations are usually static. It makes more sense
>> to me to define it once and then repeatedly use it, instead of
>> re-registering it every time I start an application.
>>
>> If we register the listeners at table level, we can add a hook in
>> TableOperations to get a set of listeners to emit specific events. The
>> listeners could be defined and serialized as a part of the table
>> properties, or maybe even a part of the Iceberg spec.
>>
>> This is really just my brainstorming. Maybe it's a bit overkill, maybe I
>> am missing the correct way to use the Listeners static registry. It would
>> be great if anyone could provide more context or thoughts around this
>> topic.
>>
>> Best,
>> Jack Ye
>
> --
> Regards,
> Neelesh S. Salian
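
As a rough illustration of the per-table registration idea in Jack's message, here is a minimal, self-contained sketch. All names in it (TableEventListener, SnapshotCreated, PerTableListeners) are hypothetical and are not Iceberg's actual Listener or Listeners API. It assumes listeners are keyed by a table identifier string and by event class, and it says nothing about how listeners would be serialized into table properties.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical listener type, loosely modeled on the Listener interface mentioned in the thread.
interface TableEventListener<E> {
  void notify(E event);
}

// Hypothetical event; Iceberg's real CreateSnapshotEvent is not reproduced here.
final class SnapshotCreated {
  final String tableName;
  final long snapshotId;

  SnapshotCreated(String tableName, long snapshotId) {
    this.tableName = tableName;
    this.snapshotId = snapshotId;
  }
}

// Hypothetical registry scoped per table, as opposed to one JVM-wide static registry.
final class PerTableListeners {
  // table identifier -> event class -> listeners for that event on that table
  private final Map<String, Map<Class<?>, List<TableEventListener<?>>>> registry =
      new ConcurrentHashMap<>();

  <E> void register(String table, Class<E> eventType, TableEventListener<E> listener) {
    registry
        .computeIfAbsent(table, t -> new ConcurrentHashMap<>())
        .computeIfAbsent(eventType, c -> new CopyOnWriteArrayList<>())
        .add(listener);
  }

  // Delivers the event only to listeners registered for this table and this exact event class.
  @SuppressWarnings("unchecked")
  <E> void emit(String table, E event) {
    Map<Class<?>, List<TableEventListener<?>>> byType = registry.get(table);
    if (byType == null) {
      return;
    }
    List<TableEventListener<?>> listeners = byType.get(event.getClass());
    if (listeners == null) {
      return;
    }
    for (TableEventListener<?> listener : listeners) {
      // Safe in practice: register() only stores listeners under their own event class.
      ((TableEventListener<E>) listener).notify(event);
    }
  }
}

final class PerTableListenersDemo {
  public static void main(String[] args) {
    PerTableListeners listeners = new PerTableListeners();
    listeners.register("db.orders", SnapshotCreated.class,
        e -> System.out.println(e.tableName + " snapshot " + e.snapshotId));

    listeners.emit("db.orders", new SnapshotCreated("db.orders", 42L)); // delivered
    listeners.emit("db.other", new SnapshotCreated("db.other", 7L));    // no listener, dropped
  }
}
```

Keying by table means an event emitted for one table never reaches listeners registered for another, which is the isolation Jack's second observation asks for. A per-catalog variant would only need a coarser key.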