GitHub user ilikepi63 added a comment to the discussion: "hook" for incoming message
# Introduction

I'm going to start by separating this by use case, of which I see a few:

- A "trigger", similar to what you'd find in most SQL databases (for example, Postgres' [triggers](https://www.postgresql.org/docs/8.1/triggers.html)).
- A user-defined function (UDF) or user-defined aggregate function (UDAF), similar to what you'd find in query engines such as DataFusion.
- An embedded program: something that is embeddable into Iggy and invoked as part of its functionality.

# Use cases

## Trigger

The primary use case would be @thyseus' initial idea. Some questions to answer:

- Could we attach a call-out to a separate user-defined function based on arbitrary events inside Iggy?
- What permissions/capabilities would these triggers have? For example, could they run executables or make network calls (and if so, at what OSI layer)?
- Which events would we support?
- What would the calling convention for these triggers be?
- How would errors in these triggers be handled?
- Is it important for these triggers to be sandboxed?

## UDF/UDAF implementation

I've always wanted something like this in a streaming service, the primary use case being compacted topics. For compacted topics in Kafka and Pulsar, only the most recent value for each key is retained and older values are effectively deleted. If you could embed a UDAF similar to DataFusion's [implementation](https://github.com/apache/datafusion/blob/d20b6d18b901708e48f965385af2119fed01a4c7/datafusion-examples/examples/advanced_udaf.rs#L132), one could potentially have "derivative" topics whose contents are aggregated from another topic. This would also likely reduce the need for many external stream processing engines, which require quite a bit of development.

## Embedded Program

This use case covers changing core behaviour inside Iggy that is difficult to express in a configuration file format such as YAML.
For instance, a developer could write a script that adjusts the way segments are written, or set up custom logic for writing partitions to a separate durable store.

# Embedded Scripting considerations

All of these use cases could be served by some sort of event-driven embedded programming language. Here are the options I've considered:

## WebAssembly

WebAssembly is really nice specifically for sandboxing. However, sandboxing doesn't seem to be a very important feature for Iggy, as users will likely trust their own code. WebAssembly is also not a compilation target for many higher-level languages unless the interpreter/runtime is shipped along with the module. Wasm also has a reputation for its specs moving _really slowly_. As far as I know, this is what ScyllaDB used for [their UDF implementation](https://opensource.docs.scylladb.com/stable/cql/wasm.html).

## Shared Library

This is probably the most straightforward option and allows the most flexibility when it comes to the API. However, only a handful of languages can be compiled to a reasonably small shared library, so effectively we would be limited to languages that support that kind of interfacing (Rust, C, C++ and Swift come to mind). This is apparently the approach Arroyo took for [their implementation of UDFs](https://www.arroyo.dev/blog/rust-plugin-systems).

## Specific language runtime

We could also look at embedding a specific language runtime such as RustPython, rusty-v8 or a Lua engine. I don't really favour this approach, as there is usually a cost to getting the data into the VM and executing it there. It also brings maintenance overhead, since we would need to consider things like language version support.

# Events

We also need to consider which events developers might want to "hook" into. These could be:

- Compacting a topic.
- Consuming a message.
- Producing a message.
- Deleting data based on the retention period.
- Dropping data to the next storage tier.

I think [eBPF](https://ebpf.io/) might give us good insights, as it has a really nice framework for:

1. Hooking into different events in Linux.
2. Compiling code to run within the kernel.

GitHub link: https://github.com/apache/iggy/discussions/1641#discussioncomment-12555509
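To make the event list above a little more concrete, here is a minimal Rust sketch of how those events might be exposed to user code through a small hook interface, whichever embedding mechanism is chosen. It rests on a few assumptions. `HookEvent`, `Hook`, `HookRegistry` and `ProducedBytes` are hypothetical names and are not part of Iggy's API. The error policy is also only one possible answer to the error-handling question raised under "Trigger": a failing hook is reported back to the caller but does not stop the other hooks from running.

```rust
use std::collections::HashMap;

/// Hypothetical events a plugin could subscribe to, mirroring the list above.
/// These names are illustrative only and are not part of Iggy's API.
#[derive(Debug, Clone, PartialEq)]
pub enum HookEvent {
    TopicCompacted { topic: String },
    MessageConsumed { topic: String, offset: u64 },
    MessageProduced { topic: String, offset: u64, payload: Vec<u8> },
    RetentionExpired { topic: String, segments_removed: usize },
    MovedToNextTier { topic: String, segment: u64 },
}

/// Hypothetical hook interface: a hook sees every event and may fail.
pub trait Hook {
    fn name(&self) -> &str;
    fn on_event(&mut self, event: &HookEvent) -> Result<(), String>;
}

/// Holds registered hooks and fans each event out to them.
#[derive(Default)]
pub struct HookRegistry {
    hooks: Vec<Box<dyn Hook>>,
}

impl HookRegistry {
    pub fn register(&mut self, hook: Box<dyn Hook>) {
        self.hooks.push(hook);
    }

    /// Calls every hook in registration order. A failing hook does not stop
    /// the others; failures are returned as (hook name, error) pairs so the
    /// caller can decide what to do with them (log, retry, disable, ...).
    pub fn dispatch(&mut self, event: &HookEvent) -> Vec<(String, String)> {
        let mut failures = Vec::new();
        for hook in self.hooks.iter_mut() {
            if let Err(err) = hook.on_event(event) {
                failures.push((hook.name().to_string(), err));
            }
        }
        failures
    }
}

/// Example hook: tracks how many payload bytes were produced per topic.
#[derive(Default)]
pub struct ProducedBytes {
    pub per_topic: HashMap<String, usize>,
}

impl Hook for ProducedBytes {
    fn name(&self) -> &str {
        "produced-bytes"
    }

    fn on_event(&mut self, event: &HookEvent) -> Result<(), String> {
        // Only MessageProduced events matter here; all other events are ignored.
        if let HookEvent::MessageProduced { topic, payload, .. } = event {
            *self.per_topic.entry(topic.clone()).or_insert(0) += payload.len();
        }
        Ok(())
    }
}
```

With an embedded runtime or WebAssembly, each `HookEvent` would also need to be serialized into the guest before a hook could see it. That is the data-transfer cost mentioned under "Specific language runtime".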
