I think this is a great idea, Jack. Thank you for bringing this up! +1

There have been several people interested in having more observability (for
example, for table design patterns, akin to how folks monitor Hive), and
events would be a big win for that, and something users could wire into a
lot of their existing infrastructure (Kafka, REST services, AWS or other
cloud providers' queue offerings).

Spark has an existing interface, ExternalCatalogWithListener, which emits
events we might hook into. I won't go into too much detail here, and while
these Spark ExternalCatalogEvents shouldn't define our own events, which
should have their own type system, they could be a useful source of event
hooks from within Spark. They also provide table-level query data we don't
currently get otherwise. It's worth investigating if we haven't already,
though we might choose to forgo its complexity.

I agree conceptually that most events should be registered at the table
level, though I'd be open to having events of differing granularities,
especially if that helps support cross-table patterns. Table-level data
should still be prioritized first.

If you have something to share or would like to make time to discuss,
please count me in. This is an area I've been thinking about lately, as
I've had a good deal of interest in observability and possible
event-driven patterns.

Best,
Kyle (GitHub @kbendick)

On Tue, Nov 30, 2021 at 9:50 PM Neelesh Salian <neeleshssal...@gmail.com>
wrote:

> +1 to this effort.
> There is value in adding support for events: general bookkeeping and
> helping replay actions in the event of a recovery.
> At a minimum, we should aim to track the following across all catalogs:
> 1. Create actions
> 2. Alter actions
> 3. Delete actions
> across all tables, properties and namespaces.
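
As a purely illustrative sketch of the minimal action set listed above (none
of these type names exist in Iceberg today; they are invented here to make
the idea concrete):

```java
// Hypothetical event model for the three catalog actions listed above,
// covering tables, properties, and namespaces. These are illustrative
// types, not actual Iceberg classes.
public class CatalogEventSketch {
  enum Action { CREATE, ALTER, DELETE }
  enum Target { TABLE, PROPERTIES, NAMESPACE }

  // One event record per catalog action: enough detail for general
  // bookkeeping and for replaying actions during recovery.
  record CatalogEvent(Action action, Target target, String name) {}

  static String describe(CatalogEvent event) {
    return event.action() + " " + event.target() + " " + event.name();
  }

  public static void main(String[] args) {
    CatalogEvent event = new CatalogEvent(Action.CREATE, Target.TABLE, "db.tbl");
    System.out.println(describe(event)); // prints CREATE TABLE db.tbl
  }
}
```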
>
> On Tue, Nov 30, 2021 at 9:12 PM Jack Ye <yezhao...@gmail.com> wrote:
>
>> Hi everyone,
>>
>> I would like to start some initial discussions around Iceberg event
>> notification support, because we might have some engineering resources to
>> work on Iceberg notification integration with AWS services such as SNS,
>> SQS, CloudWatch.
>>
>> As of today, we have a Listener interface and three events: ScanEvent,
>> IncrementalScanEvent, and CreateSnapshotEvent. There is a static registry
>> called Listeners that registers event listeners JVM-wide.
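
To illustrate the pattern being discussed, here is a simplified sketch of a
JVM-global, per-event-type listener registry in the spirit of the Listeners
class; this is stand-in code, not the actual Iceberg implementation:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Simplified, illustrative sketch of a static listener registry keyed by
// event type, in the spirit of Iceberg's Listeners; not the real class.
public class ListenersSketch {
  interface Listener<E> {
    void notify(E event);
  }

  // Stand-in for an Iceberg event such as ScanEvent.
  record ScanEvent(String tableName) {}

  // One listener list per event class, shared across the whole JVM.
  private static final Map<Class<?>, List<Listener<?>>> LISTENERS =
      new ConcurrentHashMap<>();

  static <E> void register(Listener<E> listener, Class<E> eventType) {
    LISTENERS.computeIfAbsent(eventType, type -> new CopyOnWriteArrayList<>())
        .add(listener);
  }

  @SuppressWarnings("unchecked")
  static <E> void notifyAll(E event) {
    for (Listener<?> listener : LISTENERS.getOrDefault(event.getClass(), List.of())) {
      ((Listener<E>) listener).notify(event);
    }
  }

  static List<String> demo() {
    List<String> seen = new CopyOnWriteArrayList<>();
    register((ScanEvent event) -> seen.add(event.tableName()), ScanEvent.class);
    notifyAll(new ScanEvent("db.tbl"));
    return seen;
  }

  public static void main(String[] args) {
    System.out.println(demo()); // prints [db.tbl]
  }
}
```

The key point for the discussion below: the registry is global to the JVM,
so every registered listener sees events from every table and catalog.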
>>
>> However, when I read the related code paths, my thought is that it might
>> be better to register listeners per table, based on the following
>> observations:
>> 1. Iceberg events are all table or sub-table level events. For any
>> catalog or global level events, the catalog service can provide
>> notifications, so Iceberg can stay out of the picture.
>> 2. A user might have multiple Iceberg catalogs defined, pointing to
>> different catalog services. (e.g. one to AWS Glue, one to a Hive
>> metastore). The notifications from tables of these different catalogs
>> should be directed to different listeners at least per catalog, instead of
>> the same set of listeners that are registered globally.
>> 3. Event listener configurations are usually static. It makes more sense
>> to me to define them once and reuse them, instead of re-registering them
>> every time I start an application.
>>
>> If we register the listeners at table level, we can add a hook in
>> TableOperations to get a set of listeners to emit specific events. The
>> listeners could be defined and serialized as a part of the table
>> properties, or maybe even a part of the Iceberg spec.
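
A rough, hypothetical sketch of what resolving listeners from table
properties might look like; the property key "notifications.listeners" and
all class names here are invented for illustration and are not part of the
Iceberg spec or codebase:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: listeners resolved per table from table properties
// rather than from a JVM-global registry. Everything here is invented for
// illustration; none of it comes from the Iceberg codebase.
public class PerTableListenersSketch {
  interface Listener {
    void notify(String event);
  }

  // A listener that just records events, standing in for an SNS/SQS/
  // CloudWatch sink.
  static class RecordingListener implements Listener {
    final List<String> events = new ArrayList<>();

    @Override
    public void notify(String event) {
      events.add(event);
    }
  }

  // In a real design, TableOperations might read this configuration from
  // table metadata and instantiate the configured listeners reflectively.
  static List<Listener> listenersFor(Map<String, String> tableProperties,
                                     RecordingListener recorder) {
    List<Listener> listeners = new ArrayList<>();
    if ("recording".equals(tableProperties.get("notifications.listeners"))) {
      listeners.add(recorder);
    }
    return listeners;
  }

  static List<String> demo() {
    RecordingListener recorder = new RecordingListener();
    Map<String, String> props = Map.of("notifications.listeners", "recording");
    for (Listener listener : listenersFor(props, recorder)) {
      listener.notify("scan: db.tbl");
    }
    return recorder.events;
  }

  public static void main(String[] args) {
    System.out.println(demo()); // prints [scan: db.tbl]
  }
}
```

Because the configuration travels with the table, tables from different
catalogs would naturally notify different sinks, which addresses point 2.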
>>
>> This is really just my brainstorming. Maybe it's a bit overkill, or maybe
>> I am missing the correct way to use the Listeners static registry. It
>> would be great if anyone could provide more context or thoughts around
>> this topic.
>>
>> Best,
>> Jack Ye
>>
>
> --
> Regards,
> Neelesh S. Salian
>
>