Hi Grainier,

Very cool! A Redis sink would be awesome.
Since I haven't worked a lot with Redis in the past, I don't have a strong 
opinion, just some thoughts:
I guess the answer depends on how users will use the events stored in 
Redis: whether they will need to access single fields or the whole event. I'd 
guess that most users will access whole events, which would lead to 
option 1.
Maybe we could start with option 1 and later add a setting to the pipeline 
element configuration that lets users switch between options 1 and 2?
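
A rough sketch of what such a switch might look like, written in Python only
for brevity. It does not use the StreamPipes SDK or a Redis client; the names
StorageMode and to_redis_commands are hypothetical, and the function merely
builds the commands that a sink could send for options 1 and 2:

```python
import json
from enum import Enum


class StorageMode(Enum):
    """Hypothetical configuration value for the sink."""
    JSON_STRING = "json-string"  # option 1: whole event in one string key
    HASH = "hash"                # option 2: one hash field per event property


def to_redis_commands(event_id, event, mode=StorageMode.JSON_STRING):
    """Return the Redis commands (as tuples) that would persist one event."""
    key = f"event:{event_id}"
    if mode is StorageMode.JSON_STRING:
        # Compact separators keep the stored string small.
        payload = json.dumps(event, separators=(",", ":"))
        return [("SET", key, payload)]
    if mode is StorageMode.HASH:
        args = []
        for field, value in event.items():
            # Hash values are stored as strings, so numbers lose their type.
            args += [field, str(value)]
        return [("HMSET", key, *args)]
    raise ValueError(f"unsupported storage mode: {mode}")
```

Defaulting to JSON_STRING mirrors the idea of starting with option 1 and
adding the hash layout later without changing the key naming.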

I'll be happy to help you with the SDK in case you have any questions - I know 
that our documentation has some potential for improvement, so feel free to ask 😉

Dominik


-----Original Message-----
From: Grainier Perera <[email protected]> 
Sent: Sunday, May 10, 2020 6:20 PM
To: [email protected]
Subject: DataSink for Redis

Hi all,

I'm planning to implement a data sink that forwards and stores events in 
Redis[1][2]. But I'd like to get some feedback and opinions from you before 
proceeding.

My question is: since Redis is merely a key-value store, and we have a 
structured event to be persisted, what should the key and the value be?
The possible approaches are as follows[3]:

1. Store the entire object as a JSON-encoded string in a single key.

* SET event:{id} '{"sensorId":"001", "temp":28}'*


   - Pro: faster when accessing all the fields of the event at once.
   - Pro: works with nested objects (but I don't think we have any nested
   objects).
   - Pro: can set the TTL.
   - Con: slower when accessing a single or subset of fields of the event.
   - Con: JSON parsing is required to retrieve fields. However, it's quite
   fast.
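
As an illustration of option 1, the following minimal sketch builds the SET
command for one event and reads a single field back. It talks to no Redis
server; encode_event and read_field are hypothetical helper names, and the
optional EX argument is the standard way to attach a TTL to SET:

```python
import json


def encode_event(event_id, event, ttl_seconds=None):
    """Build the SET command that stores the whole event as one JSON string."""
    command = ["SET", f"event:{event_id}", json.dumps(event)]
    if ttl_seconds is not None:
        # SET key value EX seconds sets the value and the TTL in one step.
        command += ["EX", str(ttl_seconds)]
    return command


def read_field(stored_value, field):
    """Reading one field still requires parsing the whole document."""
    return json.loads(stored_value)[field]
```

The second helper shows the trade-off listed above: even a single-field read
has to fetch and parse the complete string.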


2. Store each Object's properties in a Redis hash.

* HMSET event:{id} sensorId "001"*

* HMSET event:{id} temp "28"*


   - Pro: can set the TTL.
   - Pro: no need to parse JSON strings.
   - Pro: faster when accessing a single or subset of fields of the event.
   - Con: slower when accessing all the fields of the event.
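
A minimal sketch of option 2, again without a Redis connection. The helper
names hash_commands and restore_types and the schema mapping are hypothetical.
It shows two details worth keeping in mind: the TTL needs a separate EXPIRE
on the hash key, and hash values come back as strings, so a reader needs some
schema to restore the original types:

```python
def hash_commands(event_id, event, ttl_seconds=None):
    """Build the commands that store one event as a Redis hash."""
    key = f"event:{event_id}"
    flat = []
    for field, value in event.items():
        flat.extend([field, str(value)])  # hash values are plain strings
    commands = [["HMSET", key, *flat]]
    if ttl_seconds is not None:
        # The TTL applies to the whole hash key, not to individual fields.
        commands.append(["EXPIRE", key, str(ttl_seconds)])
    return commands


def restore_types(raw_hash, schema):
    """Convert the string values of a hash back using a field -> type map."""
    return {field: schema[field](value) for field, value in raw_hash.items()}
```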


3. Store each Object as a JSON string in a Redis hash.

* HMSET events {id1} '{"sensorId":"001", "temp":28}'*

* HMSET events {id2} '{"sensorId":"002", "temp":32}'*


   - Pro: fewer keys to work with.
   - Con: can't set the TTL.
   - Con: JSON parsing is required to retrieve fields.
   - Con: slower when accessing a single or subset of fields of the event.


4. Store each property of each Object in a dedicated key.

* SET event:{id}:sensorId "001"*

* SET event:{id}:temp 28*


   - Pro: can set the TTL per field (but it's not necessary for our
   scenario).
   - Pro: no need to parse JSON strings.
   - Pro: faster when accessing a single or subset of fields of the event.
   - Con: slower when accessing all the fields of the event.
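
To make the cost of option 4 concrete, here is a small sketch with
hypothetical helpers (per_field_commands, mget_command, rebuild_event) and no
Redis connection. One event turns into one key per property, and reading the
whole event back requires knowing the field names up front and combining the
MGET reply with them:

```python
def per_field_commands(event_id, event):
    """One SET per property, each under its own key."""
    return [
        ["SET", f"event:{event_id}:{field}", str(value)]
        for field, value in event.items()
    ]


def mget_command(event_id, fields):
    """Reading the whole event needs every key name in advance."""
    return ["MGET", *[f"event:{event_id}:{field}" for field in fields]]


def rebuild_event(fields, values):
    """MGET returns values in request order, so pair them with the fields."""
    return dict(zip(fields, values))
```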


5. Use RedisJSON[4][5] module and store each event as a JSON.

* JSON.SET event . '{"sensorId":"001", "temp":28}'*


   - Pro: faster manipulation of JSON documents.
   - Pro: faster when accessing single/multiple fields of the event.
   - Pro: can set the TTL.
   - Con: requires RedisJSON module.


IMO, options 1 and 2 would be the best choices, given that they both allow a 
TTL for purging. Which do you think is best? Your feedback is highly appreciated.

[1] https://redis.io/
[2] https://issues.apache.org/jira/browse/STREAMPIPES-121
[3] https://stackoverflow.com/questions/16375188/redis-strings-vs-redis-hashes-to-represent-json-efficiency
[4] https://redislabs.com/redis-enterprise/redis-json/
[5] https://oss.redislabs.com/redisjson/

Regards,
Grainier.
