Re: DataSink for Redis

Philipp Zehnder Tue, 12 May 2020 13:31:24 -0700

Hi Grainier,

your PR looks very good.
Do you have a docker-compose file for Redis?
I would like to add it to our CLI [1] in the service directory.


This makes it easy for StreamPipes users to set up an instance and use your new
sink.
A user just has to add ‘redis’ to the system file and the container is then
started with the rest of the system.
We already provide docker-compose files for other DBs.

Philipp

[1] https://github.com/apache/incubator-streampipes-installer/tree/dev/cli 

> On 12. May 2020, at 18:09, Grainier Perera <[email protected]> wrote:
> 
> Hi Philipp,
> 
> I agree with your opinion on the key-field. So I've modified it with an
> option to either use auto-increment or use an existing event field as the
> key field [1]. Now it will have a radio button to select True/False on
> auto-increment. And if it's True, key-field will be ignored and a
> sequential numeric key will be used. Otherwise, it'll use the selected
> field as the key field.
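
The key selection described above could be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the function name make_key_resolver, the counter starting at 1, and the KeyError on a missing field are all assumptions.

```python
from itertools import count
from typing import Any, Callable, Dict


def make_key_resolver(auto_increment: bool, key_field: str = "") -> Callable[[Dict[str, Any]], str]:
    """Return a function that derives the key for an event (hypothetical helper)."""
    if auto_increment:
        counter = count(1)  # assumed start value; yields 1, 2, 3, ...
        # The configured key field is ignored in this mode.
        return lambda event: str(next(counter))

    def from_field(event: Dict[str, Any]) -> str:
        # Use the value of the selected event field as the key.
        if key_field not in event:
            raise KeyError(f"key field '{key_field}' not present in event")
        return str(event[key_field])

    return from_field


event = {"sensorId": "001", "temp": 28}
auto = make_key_resolver(auto_increment=True, key_field="sensorId")
by_field = make_key_resolver(auto_increment=False, key_field="sensorId")
print(auto(event), auto(event))  # 1 2
print(by_field(event))           # 001
```
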
> 
> When it comes to use cases, a user can:
> 
>   1. Store the last event per asset (asset id as the key-field,
>   auto-increment disabled, index -1).
>   2. Collect all the events for each asset for diagnostics, replaying,
>   etc. (auto-increment enabled, different index per asset). An index is like a
>   separate DB with a distinct keyspace, independent from the others [2].
>   3. Collect recent events with data purging (similar to 1 and 2, but
>   with an expiration time).
> 
> So this new approach would allow all the above scenarios. What do
> you think?
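
To make the three use cases above concrete, here is a small sketch that uses an in-memory stand-in instead of a real Redis server. The ToyStore class, the index numbers 0 to 2, and the injected "now" timestamps are all made up for illustration; only the ideas (overwrite per asset id, separate keyspace per index, expiration time) come from the mail above.

```python
import time
from typing import Any, Dict, List, Optional, Tuple


class ToyStore:
    """In-memory stand-in for Redis: numbered keyspaces and optional expiry."""

    def __init__(self) -> None:
        # index -> key -> (value, absolute expiry time or None)
        self._dbs: Dict[int, Dict[str, Tuple[Any, Optional[float]]]] = {}

    def set(self, index: int, key: str, value: Any,
            ttl: Optional[float] = None, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        expires = None if ttl is None else now + ttl
        self._dbs.setdefault(index, {})[key] = (value, expires)

    def get(self, index: int, key: str, now: Optional[float] = None) -> Any:
        now = time.time() if now is None else now
        entry = self._dbs.get(index, {}).get(key)
        if entry is None:
            return None
        value, expires = entry
        if expires is not None and now >= expires:
            del self._dbs[index][key]  # purge lazily on read
            return None
        return value

    def keys(self, index: int) -> List[str]:
        # Lists stored keys of one keyspace (does not check expiry).
        return sorted(self._dbs.get(index, {}))


store = ToyStore()

# Use case 1: last event per asset. The asset id is the key, so writes overwrite.
store.set(0, "machine-1", {"temp": 28}, now=0.0)
store.set(0, "machine-1", {"temp": 30}, now=1.0)

# Use case 2: all events of one asset. Sequential keys in that asset's own index.
store.set(1, "1", {"temp": 28}, now=0.0)
store.set(1, "2", {"temp": 30}, now=1.0)

# Use case 3: recent events only. Same idea, plus an expiration time in seconds.
store.set(2, "1", {"temp": 28}, ttl=60.0, now=0.0)
```

Reading "machine-1" from index 0 returns only the latest event, index 1 keeps both events, and the entry in index 2 is gone once 60 seconds have passed.
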
> 
> [1] https://github.com/apache/incubator-streampipes-extensions/pull/13
> [2] https://www.mikeperham.com/2015/09/24/storing-data-with-redis/
> 
> Regards,
> Grainier.
> 
> On Tue, 12 May 2020 at 12:36, Philipp Zehnder <[email protected]> wrote:
> 
>> Hi Grainier,
>> 
>> the sink looks very cool and I merged your PR.
>> 
>> I have a question regarding the key field.
>> 
>> Currently users can either select ‘-‘ or a ‘runtimeName’ as a
>> requiredTextParameter.
>> When ‘-‘ is selected a unique counter is used for the key, right?
>> The problem is that when a user selects a ‘runtimeName’ we cannot provide any
>> input validation.
>> If the primaryKey is not within the event, the user will see an error when
>> the pipeline is started and has to go back and edit the pipeline.
>> 
>> Alternatively we could use a mapping property for the key field, then the
>> user would see a drop down menu of all event properties and could select
>> one.
>> This way we can ensure that the key is within the event, but then we do
>> not have the chance to select ‘-‘.
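
The difference between the two options above can be illustrated with a tiny sketch. The schema list and both helper functions are hypothetical and are not part of the StreamPipes SDK; they only show why a free-text key can fail at runtime while a selection from the event schema cannot.

```python
from typing import Dict, List

# Hypothetical event schema: the runtime names of the incoming event's properties.
SCHEMA: List[str] = ["sensorId", "temp", "timestamp"]


def free_text_key_is_valid(key_field: str, event: Dict[str, object]) -> bool:
    """Free-text parameter: can only be checked against an actual event at runtime."""
    return key_field in event


def mapping_choices(schema: List[str]) -> List[str]:
    """Mapping property: the user picks from the schema, so the key always exists."""
    return list(schema)


event = {"sensorId": "001", "temp": 28, "timestamp": 1589300000}
print(free_text_key_is_valid("sensorID", event))  # False: the typo only surfaces at runtime
print("sensorID" in mapping_choices(SCHEMA))      # False: the typo cannot be selected at all
```
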
>> 
>> What do you think is a common use case for the Redis sink?
>> Could a use case for Redis be to store the last event per asset (e.g.
>> sensor or machine)?
>> For that, we could use the mapping property solution and further extend
>> it with a dimension property requirement.
>> Then users can select a property representing an identifier (e.g. machine
>> id; for each machine an entry would be created in Redis).
>> 
>> 
>> What do you think?
>> 
>> Philipp
>> 
>> 
>> 
>>> On 11. May 2020, at 17:51, Grainier Perera <[email protected]> wrote:
>>> 
>>> Hi all,
>>> 
>>> I've sent PR [1] with the initial implementation. Please review and merge.
>>> 
>>> [1] https://github.com/apache/incubator-streampipes-extensions/pull/12
>>> 
>>> Thanks,
>>> Grainier.
>>> 
>>> On Mon, 11 May 2020 at 01:20, Dominik Riemer <[email protected]> wrote:
>>> 
>>>> Hi Grainier,
>>>> 
>>>> very cool! A Redis sink would be awesome.
>>>> Since I haven't worked a lot with Redis in the past, I don't have a strong
>>>> opinion, just some thoughts:
>>>> I guess the answer depends on how users will use events stored in Redis,
>>>> whether they will need to access single fields or the whole event. I'd
>>>> guess that most users will access whole events, which would lead to
>>>> option 1.
>>>> Maybe we could start with 1 and later on add an option in the pipeline
>>>> element configuration where users can switch between both options?
>>>> 
>>>> I'll be happy to help you with the SDK in case you have any questions - I
>>>> know that our documentation has some potential for improvement, so feel
>>>> free to ask 😉
>>>> 
>>>> Dominik
>>>> 
>>>> 
>>>> -----Original Message-----
>>>> From: Grainier Perera <[email protected]>
>>>> Sent: Sunday, May 10, 2020 6:20 PM
>>>> To: [email protected]
>>>> Subject: DataSink for Redis
>>>> 
>>>> Hi all,
>>>> 
>>>> I'm planning to implement a data sink that forwards and stores events in
>>>> Redis [1][2]. But I'd like to get some feedback and opinions from you
>>>> before proceeding.
>>>> 
>>>> The question I have is: since Redis is merely a key-value store, and we
>>>> have a structured event to be persisted, what would the key and value be?
>>>> The following are the possible approaches [3]:
>>>> 
>>>> 1. Store the entire object as a JSON-encoded string in a single key.
>>>> 
>>>> * SET event:{id} '{"sensorId":"001", "temp":28}'*
>>>> 
>>>> 
>>>>  - Pro: faster when accessing all the fields of the event at once.
>>>>  - Pro: works with nested objects (but I don't think we have any nested
>>>>  objects).
>>>>  - Pro: can set the TTL.
>>>>  - Con: slower when accessing a single field or a subset of fields of the
>>>>  event.
>>>>  - Con: JSON parsing is required to retrieve fields. However, it's quite
>>>>  fast.
>>>> 
>>>> 
>>>> 2. Store each Object's properties in a Redis hash.
>>>> 
>>>> * HMSET event:{id} sensorId "001"*
>>>> 
>>>> * HMSET event:{id} temp "28"*
>>>> 
>>>> 
>>>>  - Pro: can set the TTL.
>>>>  - Pro: no need to parse JSON strings.
>>>>  - Pro: faster when accessing a single field or a subset of fields of the
>>>>  event.
>>>>  - Con: slower when accessing all the fields of the event.
>>>> 
>>>> 
>>>> 3. Store each Object as a JSON string in a Redis hash.
>>>> 
>>>> * HMSET events {id1} '{"sensorId":"001", "temp":28}'*
>>>> 
>>>> * HMSET events {id2} '{"sensorId":"002", "temp":32}'*
>>>> 
>>>> 
>>>>  - Pro: fewer keys to work with.
>>>>  - Con: can't set the TTL.
>>>>  - Con: JSON parsing is required to retrieve fields.
>>>>  - Con: slower when accessing a single field or a subset of fields of the
>>>>  event.
>>>> 
>>>> 
>>>> 4. Store each property of each Object in a dedicated key.
>>>> 
>>>> * SET event:{id}:sensorId "001"*
>>>> 
>>>> * SET event:{id}:temp 28*
>>>> 
>>>> 
>>>>  - Pro: can set the TTL per field (but it's not necessary for our
>>>>  scenario).
>>>>  - Pro: no need to parse JSON strings.
>>>>  - Pro: faster when accessing a single field or a subset of fields of the
>>>>  event.
>>>>  - Con: slower when accessing all the fields of the event.
>>>> 
>>>> 
>>>> 5. Use RedisJSON[4][5] module and store each event as a JSON.
>>>> 
>>>> * JSON.SET event . '{"sensorId":"001", "temp":28}'*
>>>> 
>>>> 
>>>>  - Pro: faster manipulation of JSON documents.
>>>>  - Pro: faster when accessing single/multiple fields of the event.
>>>>  - Pro: can set the TTL.
>>>>  - Con: requires RedisJSON module.
>>>> 
>>>> 
>>>> IMO, 1 & 2 would be the best choices given that they both allow a TTL for
>>>> purging. What do you think is best? Your feedback is highly appreciated.
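
For comparison, options 1 and 2 could map an event to command arguments roughly as sketched below. Nothing here talks to a Redis server; the functions only build the argument lists, and their names and the event id "42" are invented for this example. Option 2 as written assumes flat events without nested objects.

```python
import json
from typing import Any, Dict, List


def as_json_string_command(event_id: str, event: Dict[str, Any]) -> List[str]:
    """Option 1: the whole event as one JSON string under a single key."""
    return ["SET", f"event:{event_id}", json.dumps(event, separators=(",", ":"))]


def as_hash_command(event_id: str, event: Dict[str, Any]) -> List[str]:
    """Option 2: one hash per event, one hash field per event property."""
    args = ["HMSET", f"event:{event_id}"]
    for field, value in event.items():
        args += [field, str(value)]  # hash values are stored as strings
    return args


event = {"sensorId": "001", "temp": 28}
print(as_json_string_command("42", event))
# ['SET', 'event:42', '{"sensorId":"001","temp":28}']
print(as_hash_command("42", event))
# ['HMSET', 'event:42', 'sensorId', '001', 'temp', '28']
```

With option 1 a reader gets the event back with one GET plus JSON parsing; with option 2 the number 28 comes back as the string "28", so type information has to be restored by the reader.
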
>>>> 
>>>> [1] https://redis.io/
>>>> [2] https://issues.apache.org/jira/browse/STREAMPIPES-121
>>>> [3] https://stackoverflow.com/questions/16375188/redis-strings-vs-redis-hashes-to-represent-json-efficiency
>>>> [4] https://redislabs.com/redis-enterprise/redis-json/
>>>> [5] https://oss.redislabs.com/redisjson/
>>>> 
>>>> Regards,
>>>> Grainier.
>>>> 
>>>> 
>> 
>> 
>>
