Hi Grainier,

your PR looks very good. Do you have a docker-compose file for Redis? I would like to add it to our CLI [1] in the service directory.
This makes it easy for StreamPipes users to set up an instance and use your new sink. A user just has to add ‘redis’ to the system file, and the container is then started with the rest of the system. We already provide docker-compose files for other DBs.

Philipp

[1] https://github.com/apache/incubator-streampipes-installer/tree/dev/cli

> On 12. May 2020, at 18:09, Grainier Perera <[email protected]> wrote:
>
> Hi Philipp,
>
> I agree with your opinion on the key-field. So I've modified it with an
> option to either use auto-increment or use an existing event field as the
> key field [1]. Now it will have a radio button to select True/False on
> auto-increment. If it's True, the key-field will be ignored and a
> sequential numeric key will be used. Otherwise, it'll use the selected
> field as the key field.
>
> When it comes to use cases, a user can:
>
> 1. Store the last event per asset (asset id as the key-field,
> auto-increment disabled, index -1).
> 2. Collect all the events per asset for diagnostics, replaying,
> etc. (auto-increment enabled, different index per asset). An index is like a
> separate DB with a distinct keyspace, independent from the others [2].
> 3. Collect recent events with data purging (similar to 1 and 2, but
> with an expiration time).
>
> So this new approach would allow all the above scenarios. What do
> you think?
>
> [1] https://github.com/apache/incubator-streampipes-extensions/pull/13
> [2] https://www.mikeperham.com/2015/09/24/storing-data-with-redis/
>
> Regards,
> Grainier.
>
> On Tue, 12 May 2020 at 12:36, Philipp Zehnder <[email protected]> wrote:
>
>> Hi Grainier,
>>
>> the sink looks very cool and I merged your PR.
>>
>> I have a question regarding the key field.
>>
>> Currently users can either select ‘-’ or a ‘runtimeName’ as a
>> requiredTextParameter.
>> When ‘-’ is selected a unique counter is used for the key, right?
>> The problem is that when a user selects a ‘runtimeName’, we cannot provide any
>> input validation.
>> If the primaryKey is not within the event, the user will see an error when
>> the pipeline is started and has to go back and edit the pipeline.
>>
>> Alternatively, we could use a mapping property for the key field; then the
>> user would see a drop-down menu of all event properties and could select
>> one.
>> This way we can ensure that the key is within the event, but then we do
>> not have the chance to select ‘-’.
>>
>> What do you think is a common use case for the Redis sink?
>> Could a use case for Redis be to store the last event per asset (e.g.
>> sensor or machine)?
>> If so, we could use the mapping property solution and further extend
>> it with a dimension property requirement.
>> Then users can select a property representing an identifier (e.g. a machine
>> id; for each machine an entry would be created in Redis).
>>
>>
>> What do you think?
>>
>> Philipp
>>
>>
>>
>>> On 11. May 2020, at 17:51, Grainier Perera <[email protected]> wrote:
>>>
>>> Hi all,
>>>
>>> I've sent PR [1] with the initial implementation. Please review and merge.
>>>
>>> [1] https://github.com/apache/incubator-streampipes-extensions/pull/12
>>>
>>> Thanks,
>>> Grainier.
>>>
>>> On Mon, 11 May 2020 at 01:20, Dominik Riemer <[email protected]> wrote:
>>>
>>>> Hi Grainier,
>>>>
>>>> very cool! A Redis sink would be awesome.
>>>> Since I haven't worked a lot with Redis in the past, I don't have a strong
>>>> opinion, just some thoughts:
>>>> I guess the answer depends on how users will use events
>>>> stored in Redis, whether they will need to access single fields or the
>>>> whole event. I'd guess that most users will access whole events,
>>>> which would lead to option 1.
>>>> Maybe we could start with 1 and later on add an option in the pipeline
>>>> element configuration where users can switch between both options?
>>>>
>>>> I'll be happy to help you with the SDK in case you have any questions - I
>>>> know that our documentation has some potential for improvement, so feel
>>>> free to ask 😉
>>>>
>>>> Dominik
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Grainier Perera <[email protected]>
>>>> Sent: Sunday, May 10, 2020 6:20 PM
>>>> To: [email protected]
>>>> Subject: DataSink for Redis
>>>>
>>>> Hi all,
>>>>
>>>> I'm planning to implement a data sink that forwards and stores events into
>>>> Redis [1][2]. But I'd like to get some feedback and opinions from you before
>>>> proceeding.
>>>>
>>>> The question that I have is: since Redis is merely a key-value store and
>>>> we have a structured event to be persisted, what would the key and value be?
>>>> The following are the possible approaches [3]:
>>>>
>>>> 1. Store the entire object as a JSON-encoded string in a single key.
>>>>
>>>>     SET event:{id} '{"sensorId":"001", "temp":28}'
>>>>
>>>> - Pro: faster when accessing all the fields of the event at once.
>>>> - Pro: works with nested objects (but I don't think we have any nested
>>>> objects).
>>>> - Pro: can set the TTL.
>>>> - Con: slower when accessing a single field or a subset of fields of the
>>>> event.
>>>> - Con: JSON parsing is required to retrieve fields. However, it's quite
>>>> fast.
>>>>
>>>>
>>>> 2. Store each object's properties in a Redis hash.
>>>>
>>>>     HMSET event:{id} sensorId "001"
>>>>     HMSET event:{id} temp "28"
>>>>
>>>> - Pro: can set the TTL.
>>>> - Pro: no need to parse JSON strings.
>>>> - Pro: faster when accessing a single field or a subset of fields of the
>>>> event.
>>>> - Con: slower when accessing all the fields of the event.
>>>>
>>>>
>>>> 3. Store each object as a JSON string in a Redis hash.
>>>>
>>>>     HMSET events {id1} '{"sensorId":"001", "temp":28}'
>>>>     HMSET events {id2} '{"sensorId":"002", "temp":32}'
>>>>
>>>> - Pro: fewer keys to work with.
>>>> - Con: can't set the TTL per event (only on the whole hash).
>>>> - Con: JSON parsing is required to retrieve fields.
>>>> - Con: slower when accessing a single field or a subset of fields of the
>>>> event.
>>>>
>>>>
>>>> 4. Store each property of each object in a dedicated key.
>>>>
>>>>     SET event:{id}:sensorId "001"
>>>>     SET event:{id}:temp 28
>>>>
>>>> - Pro: can set the TTL per field (but it's not necessary for our
>>>> scenario).
>>>> - Pro: no need to parse JSON strings.
>>>> - Pro: faster when accessing a single field or a subset of fields of the
>>>> event.
>>>> - Con: slower when accessing all the fields of the event.
>>>>
>>>>
>>>> 5. Use the RedisJSON [4][5] module and store each event as JSON.
>>>>
>>>>     JSON.SET event . '{"sensorId":"001", "temp":28}'
>>>>
>>>> - Pro: faster manipulation of JSON documents.
>>>> - Pro: faster when accessing single/multiple fields of the event.
>>>> - Pro: can set the TTL.
>>>> - Con: requires the RedisJSON module.
>>>>
>>>>
>>>> IMO, 1 & 2 would be the best choices given that they both allow a TTL for
>>>> purging. Which do you think is best? Your feedback is highly appreciated.
>>>>
>>>> [1] https://redis.io/
>>>> [2] https://issues.apache.org/jira/browse/STREAMPIPES-121
>>>> [3] https://stackoverflow.com/questions/16375188/redis-strings-vs-redis-hashes-to-represent-json-efficiency
>>>> [4] https://redislabs.com/redis-enterprise/redis-json/
>>>> [5] https://oss.redislabs.com/redisjson/
>>>>
>>>> Regards,
>>>> Grainier.
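
For readers of the archive, here is a rough sketch of how the key selection discussed in this thread (auto-increment vs. an existing event field) could combine with approaches 1 and 2 from the original proposal. It is not the code from the PR. It only builds the Redis commands as lists of strings and does not talk to a server. The names KeySelector, json_string_commands and hash_commands are hypothetical, and starting the counter at 1 and using a separate EXPIRE for the TTL are assumptions.

```python
import itertools
import json


class KeySelector:
    """Picks the key suffix for an event (hypothetical helper, not from the PR)."""

    def __init__(self, auto_increment, key_field=None):
        self.auto_increment = auto_increment
        self.key_field = key_field
        # Assumption: the sequential numeric key starts at 1.
        self._counter = itertools.count(1)

    def key_for(self, event):
        if self.auto_increment:
            # The key field is ignored when auto-increment is enabled.
            return str(next(self._counter))
        if self.key_field not in event:
            raise KeyError(f"key field {self.key_field!r} is not in the event")
        return str(event[self.key_field])


def _with_ttl(commands, key, ttl_seconds):
    # Assumption: purging is done with a separate EXPIRE on the key.
    if ttl_seconds is not None:
        commands.append(["EXPIRE", key, str(ttl_seconds)])
    return commands


def json_string_commands(event_id, event, ttl_seconds=None):
    """Approach 1: the whole event as one JSON-encoded string."""
    key = f"event:{event_id}"
    value = json.dumps(event, separators=(",", ":"))
    return _with_ttl([["SET", key, value]], key, ttl_seconds)


def hash_commands(event_id, event, ttl_seconds=None):
    """Approach 2: one hash per event, one hash field per event property."""
    key = f"event:{event_id}"
    command = ["HMSET", key]  # a single HMSET can carry all field/value pairs
    for field, value in event.items():
        command += [field, str(value)]
    return _with_ttl([command], key, ttl_seconds)


if __name__ == "__main__":
    event = {"sensorId": "001", "temp": 28}
    last_per_asset = KeySelector(auto_increment=False, key_field="sensorId")
    for command in hash_commands(last_per_asset.key_for(event), event, ttl_seconds=60):
        print(" ".join(command))
```

With auto-increment disabled and sensorId as the key field, every event from the same sensor is written to the same key, which matches the "last event per asset" use case. With auto-increment enabled, each event gets its own key.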
