GitHub user ilikepi63 added a comment to the discussion: "hook" for incoming message

# Introduction

I'm going to start by separating this into use cases, of which I see a few:

- "Trigger", similar to what you'd find in most SQL databases (for example, Postgres' [triggers](https://www.postgresql.org/docs/8.1/triggers.html)).
- A user-defined function (UDF) or user-defined aggregate function (UDAF), similar to what you'd find in query engines such as DataFusion.
- An embedded program: something that is embeddable into Iggy and invoked as part of its functionality.

# Use cases

## Trigger

The primary use case would be @thyseus' initial idea. Some questions to answer:
- Could we attach a callout to a separate user-defined function based on arbitrary events inside Iggy?
- What permissions/capabilities would these triggers have? For example, could they run executables or make network calls (and if so, at which OSI layer)?
- What events would we support?
- What will the calling convention for these triggers be?
- How will errors in these triggers be handled?
- Will it be important for these triggers to be sandboxed?
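To make the calling-convention and error-handling questions a bit more concrete, here is a minimal Rust sketch. It assumes a trigger receives the raw message payload by reference and returns a `Result`, and that the broker either continues or aborts when one fails. `Trigger`, `TriggerError`, `ErrorPolicy` and `run_triggers` are hypothetical names for illustration, not part of Iggy's API.

```rust
use std::fmt;

// Hypothetical error type returned by a trigger; not part of Iggy's API.
#[derive(Debug)]
pub struct TriggerError(pub String);

impl fmt::Display for TriggerError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "trigger error: {}", self.0)
    }
}

// Hypothetical calling convention: a trigger receives the raw message
// payload by reference and reports success or failure.
pub trait Trigger {
    fn name(&self) -> &str;
    fn fire(&self, payload: &[u8]) -> Result<(), TriggerError>;
}

// Assumed policy for what the broker does when a trigger fails.
#[derive(Clone, Copy)]
pub enum ErrorPolicy {
    Continue, // record the failure and run the remaining triggers
    Abort,    // stop at the first failure
}

// Runs the triggers in order and returns a description of each failure.
pub fn run_triggers(
    triggers: &[Box<dyn Trigger>],
    payload: &[u8],
    policy: ErrorPolicy,
) -> Vec<String> {
    let mut failures = Vec::new();
    for trigger in triggers {
        if let Err(e) = trigger.fire(payload) {
            failures.push(format!("{}: {}", trigger.name(), e));
            if let ErrorPolicy::Abort = policy {
                break;
            }
        }
    }
    failures
}

// Example trigger: rejects empty payloads.
struct RejectEmpty;

impl Trigger for RejectEmpty {
    fn name(&self) -> &str {
        "reject-empty"
    }
    fn fire(&self, payload: &[u8]) -> Result<(), TriggerError> {
        if payload.is_empty() {
            Err(TriggerError("empty payload".to_string()))
        } else {
            Ok(())
        }
    }
}

// Example trigger: always fails, to exercise the error policy.
struct AlwaysFails;

impl Trigger for AlwaysFails {
    fn name(&self) -> &str {
        "always-fails"
    }
    fn fire(&self, _payload: &[u8]) -> Result<(), TriggerError> {
        Err(TriggerError("boom".to_string()))
    }
}

fn main() {
    let triggers: Vec<Box<dyn Trigger>> = vec![Box::new(RejectEmpty), Box::new(AlwaysFails)];
    println!("{:?}", run_triggers(&triggers, b"", ErrorPolicy::Continue));
    println!("{:?}", run_triggers(&triggers, b"", ErrorPolicy::Abort));
}
```

Whether a failing trigger should block the produce/consume path at all (rather than being logged and skipped) is exactly the kind of decision the questions above would need to settle.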

## UDF/UDAF implementation 

I've always wanted something like this in a streaming service, the primary use case being compacted topics. For compacted topics in Kafka and Pulsar, only the most recent value for each key is retained and the rest are effectively deleted. If you could embed a UDAF similar to DataFusion's [implementation](https://github.com/apache/datafusion/blob/d20b6d18b901708e48f965385af2119fed01a4c7/datafusion-examples/examples/advanced_udaf.rs#L132), one could potentially have "derivative" topics. This would also likely reduce the need for external stream processing engines, which require quite a bit of development.
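As a rough illustration, here is a minimal Rust sketch of compaction expressed as an aggregate. The `Accumulator` trait below is hypothetical and only loosely modelled on the update/merge/evaluate shape of query-engine accumulators; it is not DataFusion's actual trait. `Message` is a simplified stand-in for a stored record.

```rust
use std::collections::HashMap;

// Minimal stand-in for a stored message.
#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub offset: u64,
    pub key: String,
    pub value: String,
}

// Hypothetical UDAF-style interface: fold messages in with `update`,
// combine partial states (e.g. from different partitions) with `merge`,
// and produce the output with `evaluate`.
pub trait Accumulator {
    fn update(&mut self, msg: &Message);
    fn merge(&mut self, other: Self);
    fn evaluate(&self) -> Vec<Message>;
}

// Keeps only the newest message (highest offset) per key, which is the
// retention rule of a compacted topic.
#[derive(Default)]
pub struct LatestByKey {
    latest: HashMap<String, Message>,
}

impl Accumulator for LatestByKey {
    fn update(&mut self, msg: &Message) {
        let is_newer = self
            .latest
            .get(&msg.key)
            .map_or(true, |existing| msg.offset > existing.offset);
        if is_newer {
            self.latest.insert(msg.key.clone(), msg.clone());
        }
    }

    fn merge(&mut self, other: Self) {
        for msg in other.latest.values() {
            self.update(msg);
        }
    }

    // Emits the surviving messages in offset order, ready to be written
    // to a "derivative" topic.
    fn evaluate(&self) -> Vec<Message> {
        let mut out: Vec<Message> = self.latest.values().cloned().collect();
        out.sort_by_key(|m| m.offset);
        out
    }
}

fn main() {
    let log = vec![
        Message { offset: 0, key: "a".into(), value: "1".into() },
        Message { offset: 1, key: "b".into(), value: "2".into() },
        Message { offset: 2, key: "a".into(), value: "3".into() },
    ];
    let mut acc = LatestByKey::default();
    for msg in &log {
        acc.update(msg);
    }
    for msg in acc.evaluate() {
        println!("{} => {} (offset {})", msg.key, msg.value, msg.offset);
    }
}
```

Swapping `LatestByKey` for another accumulator (a running sum, a count per key, and so on) is what would turn the same hook into other kinds of derivative topics.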

## Embedded Program

This use case is about changing core behaviour inside Iggy that is difficult to express in a configuration file format such as YAML. For instance, a developer could implement a script that adjusts the way segments get written, or set up custom logic for writing partitions to a separate durable store.
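A minimal Rust sketch of the second example, assuming Iggy exposed a pluggable write path: a user-supplied sink writes every partition to the primary store and mirrors selected partitions to a secondary one. `PartitionSink`, `MemorySink` and `MirrorSink` are hypothetical names; the in-memory sinks stand in for real segment files or an external durable store.

```rust
use std::collections::HashMap;
use std::io;

// Hypothetical extension point: where a partition's bytes end up.
pub trait PartitionSink {
    fn write(&mut self, partition: u32, payload: &[u8]) -> io::Result<()>;
}

// In-memory stand-in for local segment storage or an external store.
#[derive(Default)]
pub struct MemorySink {
    pub data: HashMap<u32, Vec<u8>>,
}

impl PartitionSink for MemorySink {
    fn write(&mut self, partition: u32, payload: &[u8]) -> io::Result<()> {
        self.data.entry(partition).or_default().extend_from_slice(payload);
        Ok(())
    }
}

// User-supplied behaviour: always write to the primary store, and also
// mirror the listed partitions to a secondary store.
pub struct MirrorSink<P, S> {
    pub primary: P,
    pub secondary: S,
    pub mirrored: Vec<u32>,
}

impl<P: PartitionSink, S: PartitionSink> PartitionSink for MirrorSink<P, S> {
    fn write(&mut self, partition: u32, payload: &[u8]) -> io::Result<()> {
        self.primary.write(partition, payload)?;
        if self.mirrored.contains(&partition) {
            self.secondary.write(partition, payload)?;
        }
        Ok(())
    }
}

fn main() {
    let mut sink = MirrorSink {
        primary: MemorySink::default(),
        secondary: MemorySink::default(),
        mirrored: vec![1],
    };
    sink.write(1, b"abc").expect("write to partition 1");
    sink.write(2, b"xyz").expect("write to partition 2");
    println!("primary partitions: {}", sink.primary.data.len());
    println!("mirrored partitions: {}", sink.secondary.data.len());
}
```

Composing behaviour through a small trait like this is one way to keep the core write path unchanged while letting users wrap it.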

# Embedded Scripting considerations

All of these use cases could be solved with some sort of embedded, event-driven programming language. Here are some of the ideas I have:

## WebAssembly

WebAssembly is really nice specifically for sandboxing. However, sandboxing doesn't seem to be a particularly important feature for Iggy, as users will likely trust their own code. WebAssembly is also not a target for many higher-level languages without shipping the interpreter/runtime along with the code. Wasm also has a reputation for moving _really slowly_ on its specs. As far as I know, this is what ScyllaDB used for [their UDF implementation](https://opensource.docs.scylladb.com/stable/cql/wasm.html).

## Shared Library

This is probably the most straightforward option and allows the most flexibility when it comes to the API. However, only a handful of languages can be compiled to a smallish shared library, so effectively we are limited to languages that allow that type of interfacing (Rust, C, C++ and Swift come to mind). This is apparently the route Arroyo took with [their implementation of UDFs](https://www.arroyo.dev/blog/rust-plugin-systems).
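Rust has no stable ABI of its own, so shared-library plugins usually cross the boundary through `extern "C"` functions and `#[repr(C)]` types. Here is a minimal sketch of the plugin side only, assuming the host loads the library (built as a `cdylib`) and looks the symbol up by name, for example with `dlopen`/`dlsym`. `RawMessage`, `iggy_plugin_on_message` and the return codes are hypothetical, not an existing Iggy interface. The `#[unsafe(no_mangle)]` attribute form requires Rust 1.82 or newer.

```rust
use std::os::raw::c_int;
use std::slice;

// Hypothetical C-compatible view of a message passed across the plugin
// boundary. #[repr(C)] fixes the field layout so any C-ABI language agrees on it.
#[repr(C)]
pub struct RawMessage {
    pub payload: *const u8,
    pub len: usize,
    pub offset: u64,
}

/// Hypothetical hook exported by the plugin. Returns 0 to accept the
/// message, 1 to reject it, and -1 on invalid input.
///
/// # Safety
/// `msg` must be null or point to a valid `RawMessage` whose `payload`
/// is valid for `len` bytes for the duration of the call.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn iggy_plugin_on_message(msg: *const RawMessage) -> c_int {
    if msg.is_null() {
        return -1;
    }
    let msg = unsafe { &*msg };
    if msg.payload.is_null() && msg.len > 0 {
        return -1;
    }
    let payload: &[u8] = if msg.len == 0 {
        &[]
    } else {
        unsafe { slice::from_raw_parts(msg.payload, msg.len) }
    };
    // Toy rule standing in for real plugin logic.
    if payload.starts_with(b"DROP") {
        1
    } else {
        0
    }
}

fn main() {
    // Simulates the host calling into the plugin.
    let payload = b"DROP this one";
    let msg = RawMessage { payload: payload.as_ptr(), len: payload.len(), offset: 42 };
    let verdict = unsafe { iggy_plugin_on_message(&msg) };
    println!("verdict for offset {}: {}", msg.offset, verdict);
}
```

Passing a pointer and length instead of a Rust slice or `Vec` is what keeps this interface usable from C, C++ or Swift plugins as well.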

## Specific language runtime

We could look at embedding a specific language runtime such as RustPython, rusty-v8 or a Lua engine. I don't really favour this approach, as there is usually a cost to getting the data into the VM and executing it. It also adds overhead, since we need to consider things like language version support.

# Events

We also need to consider which events developers might want to "hook" into. These could be:

- Compacting a topic.
- Consuming a message.
- Producing a message.
- Deleting data based on the retention period.
- Moving data down to the next storage tier.
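To make the list of events a bit more concrete, here is a minimal Rust sketch of a registry that maps each hook point to the handlers registered for it. `HookPoint`, `HookRegistry` and the `bool`-returning handler signature are hypothetical choices for illustration only.

```rust
use std::collections::HashMap;

// Hypothetical hook points mirroring the list of events.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum HookPoint {
    TopicCompacted,
    MessageConsumed,
    MessageProduced,
    RetentionExpired,
    TierDemoted,
}

// A handler inspects the payload and returns whether it "accepted" the event.
type Handler = Box<dyn Fn(&[u8]) -> bool>;

#[derive(Default)]
pub struct HookRegistry {
    handlers: HashMap<HookPoint, Vec<Handler>>,
}

impl HookRegistry {
    pub fn register(&mut self, point: HookPoint, handler: Handler) {
        self.handlers.entry(point).or_default().push(handler);
    }

    // Invokes every handler registered for `point` and returns how many
    // of them returned true. Points with no handlers are a cheap no-op.
    pub fn emit(&self, point: HookPoint, payload: &[u8]) -> usize {
        let mut accepted = 0;
        if let Some(handlers) = self.handlers.get(&point) {
            for handler in handlers {
                if handler(payload) {
                    accepted += 1;
                }
            }
        }
        accepted
    }
}

fn main() {
    let mut registry = HookRegistry::default();
    registry.register(HookPoint::MessageProduced, Box::new(|p: &[u8]| !p.is_empty()));
    registry.register(HookPoint::MessageProduced, Box::new(|p: &[u8]| p.len() > 3));
    println!("{}", registry.emit(HookPoint::MessageProduced, b"hello"));
    println!("{}", registry.emit(HookPoint::TierDemoted, b"hello"));
}
```

Keeping the no-handler path cheap matters for hot events like producing and consuming messages, where most deployments would register nothing.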

I think [eBPF](https://ebpf.io/) might give us good insights, as it has a really nice framework for

1. hooking into different events in Linux, and
2. compiling a library for use within the kernel.

GitHub link: https://github.com/apache/iggy/discussions/1641#discussioncomment-12555509
