Hi Grainier,

very cool! A Redis sink would be awesome. Since I haven't worked much with Redis in the past, I don't have a strong opinion, just some thoughts: I guess the answer depends on how users will use the events stored in Redis, i.e., whether they will need to access single fields or the whole event. I'd guess that most users will access whole events, which would point to option 1. Maybe we could start with option 1 and later add a setting in the pipeline element configuration where users can switch between both options?
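A minimal sketch of what such a switch could look like, assuming "both options" means options 1 and 2 from the original message below. The function name `to_redis_commands`, the `mode` values, and the `ttl_seconds` parameter are hypothetical and not part of the StreamPipes SDK. The function only builds the Redis commands as argument lists, so it runs without a Redis server or client library.

```python
import json
from typing import Any, Dict, List, Optional


def to_redis_commands(
    event_id: str,
    event: Dict[str, Any],
    mode: str = "string",
    ttl_seconds: Optional[int] = None,
) -> List[List[str]]:
    """Build the Redis commands that would persist one flat event.

    mode="string": option 1, the whole event as one JSON string (SET).
    mode="hash":   option 2, one hash field per event property (HMSET).
    """
    key = f"event:{event_id}"

    if mode == "string":
        # Compact JSON encoding of the whole event under a single key.
        payload = json.dumps(event, separators=(",", ":"))
        commands = [["SET", key, payload]]
    elif mode == "hash":
        # Flatten the event into field/value pairs; Redis stores them as strings.
        args = ["HMSET", key]
        for field, value in event.items():
            args.extend([field, str(value)])
        commands = [args]
    else:
        raise ValueError(f"unknown mode: {mode!r}")

    # Both layouts use one key per event, so a TTL can be set with EXPIRE.
    if ttl_seconds is not None:
        commands.append(["EXPIRE", key, str(ttl_seconds)])

    return commands


if __name__ == "__main__":
    sample = {"sensorId": "001", "temp": 28}
    for cmd in to_redis_commands("42", sample, mode="string", ttl_seconds=60):
        print(" ".join(cmd))
    for cmd in to_redis_commands("42", sample, mode="hash"):
        print(" ".join(cmd))
```

Because both layouts keep one key per event, the TTL handling is the same in either mode, which is what would make a configuration switch cheap to add later.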
I'll be happy to help you with the SDK in case you have any questions - I know that our documentation has some potential for improvement, so feel free to ask 😉

Dominik

-----Original Message-----
From: Grainier Perera <[email protected]>
Sent: Sunday, May 10, 2020 6:20 PM
To: [email protected]
Subject: DataSink for Redis

Hi all,

I'm planning to implement a data sink that forwards and stores events in Redis [1][2]. But I'd like to get some feedback and opinions from you before proceeding.

The question I have is: since Redis is merely a key-value store, and we have a structured event to be persisted, what would the key and value be? The possible approaches are as follows [3]:

1. Store the entire object as a JSON-encoded string in a single key.

     SET event:{id} '{"sensorId":"001", "temp":28}'

   - Pro: faster when accessing all the fields of the event at once.
   - Pro: works with nested objects (but I don't think we have any nested objects).
   - Pro: can set the TTL.
   - Con: slower when accessing a single field or a subset of fields of the event.
   - Con: JSON parsing is required to retrieve fields. However, it's quite fast.

2. Store each object's properties in a Redis hash.

     HMSET event:{id} sensorId "001"
     HMSET event:{id} temp "28"

   - Pro: can set the TTL.
   - Pro: no need to parse JSON strings.
   - Pro: faster when accessing a single field or a subset of fields of the event.
   - Con: slower when accessing all the fields of the event.

3. Store each object as a JSON string in a Redis hash.

     HMSET events {id1} '{"sensorId":"001", "temp":28}'
     HMSET events {id2} '{"sensorId":"002", "temp":32}'

   - Pro: fewer keys to work with.
   - Con: can't set the TTL.
   - Con: JSON parsing is required to retrieve fields.
   - Con: slower when accessing a single field or a subset of fields of the event.

4. Store each property of each object in a dedicated key.

     SET event:{id}:sensorId "001"
     SET event:{id}:temp 28

   - Pro: can set the TTL per field (but it's not necessary for our scenario).
   - Pro: no need to parse JSON strings.
   - Pro: faster when accessing a single field or a subset of fields of the event.
   - Con: slower when accessing all the fields of the event.

5. Use the RedisJSON [4][5] module and store each event as a JSON document.

     JSON.SET event . '{"sensorId":"001", "temp":28}'

   - Pro: faster manipulation of JSON documents.
   - Pro: faster when accessing single/multiple fields of the event.
   - Pro: can set the TTL.
   - Con: requires the RedisJSON module.

IMO, 1 & 2 would be the best choices, given that they both allow a TTL for purging. What do you think is best? Your feedback is highly appreciated.

[1] https://redis.io/
[2] https://issues.apache.org/jira/browse/STREAMPIPES-121
[3] https://stackoverflow.com/questions/16375188/redis-strings-vs-redis-hashes-to-represent-json-efficiency
[4] https://redislabs.com/redis-enterprise/redis-json/
[5] https://oss.redislabs.com/redisjson/

Regards,
Grainier.
