Hi Grainier,

thanks again for the work. I reviewed the Redis manifests for the optional
service and they look fine.

I've merged them to dev.

Patrick

> On 14.05.2020 at 15:50, Grainier Perera <[email protected]> wrote:
> 
> Hi Patrick,
> 
> I've sent a PR[1] with the changes for Kubernetes deployment. Can you
> please review and merge?
> 
> [1] https://github.com/apache/incubator-streampipes-installer/pull/7
> 
> Thanks,
> Grainier.
> 
> On Wed, 13 May 2020 at 13:26, Patrick Wiener <[email protected]> wrote:
> 
>> Hi,
>> 
>> for Kubernetes deployment, you can have a look at how we did it for IoTDB
>> [1]
>> 
>> You would need to declare three files:
>> 
>> - deployment
>> - service
>> - persistent volume claim
>> 
>> Please make sure to add the following template, as it will be parsed by
>> Helm, so that optional external services are only started as part of the
>> StreamPipes full version.
>> 
>> {{- if eq .Values.deployment "full" }}
>> ...
>> {{- end }}
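As a sketch, a complete service manifest wrapped in that guard could look like the following (the file path, labels, and port are illustrative, not the actual chart values):

```yaml
# e.g. templates/optional-external-services/redis/redis-service.yaml (hypothetical path)
{{- if eq .Values.deployment "full" }}
apiVersion: v1
kind: Service
metadata:
  name: redis
spec:
  selector:
    app: redis
  ports:
    - port: 6379
      targetPort: 6379
{{- end }}
```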
>> 
>> [1] https://github.com/apache/incubator-streampipes-installer/tree/dev/helm-chart/templates/optional-external-services/iotdb
>> 
>> Happy to help,
>> Patrick
>> 
>> 
>>> On 13.05.2020 at 06:33, Philipp Zehnder <[email protected]> wrote:
>>> 
>>> Hi Grainier,
>>> 
>>> thank you! I directly merged the pull request with the docker-compose
>> file.
>>> 
>>> @Patrick, what else do we have to add when we want to use Redis in
>> Kubernetes?
>>> Do we have to add a template in [1] as well, or is it sufficient to
>> have the docker-compose file?
>>> 
>>> Philipp
>>> 
>>> [1] https://github.com/apache/incubator-streampipes-installer/tree/dev/helm-chart/templates/optional-external-services
>>> 
>>> 
>>> On 2020/05/13 03:01:37, Grainier Perera <[email protected]>
>> wrote:
>>>> Hi Philipp,
>>>> 
>>>> I've created an issue [1] and added a docker-compose file for Redis in
>>>> PR[2]. Please review and merge.
>>>> 
>>>> [1] https://issues.apache.org/jira/browse/STREAMPIPES-124
>>>> [2] https://github.com/apache/incubator-streampipes-installer/pull/6
>>>> 
>>>> Thanks,
>>>> Grainier.
>>>> 
>>>> On Wed, 13 May 2020 at 02:01, Philipp Zehnder <[email protected]>
>> wrote:
>>>> 
>>>>> Hi Grainier,
>>>>> 
>>>>> your PR looks very good.
>>>>> Do you have a docker-compose file for Redis?
>>>>> I would like to add it to our CLI [1] in the service directory.
>>>>> 
>>>>> This makes it easy for StreamPipes users to set up an instance and use
>>>>> your new sink.
>>>>> A user just has to add ‘redis’ to the system file and the container is
>>>>> then started with the rest of the system.
>>>>> We already provided docker-compose files for other DBs.
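A minimal docker-compose file for such a Redis service might look like this (image tag, port mapping, and volume name are assumptions, not the contents of the merged file):

```yaml
version: "3"
services:
  redis:
    image: redis:5.0
    ports:
      - "6379:6379"
    volumes:
      - redis-data:/data

volumes:
  redis-data:
```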
>>>>> 
>>>>> Philipp
>>>>> 
>>>>> [1] https://github.com/apache/incubator-streampipes-installer/tree/dev/cli
>>>>> 
>>>>>> On 12. May 2020, at 18:09, Grainier Perera <[email protected]> wrote:
>>>>>> 
>>>>>> Hi Philipp,
>>>>>> 
>>>>>> I agree with your opinion on the key-field. So I've modified it with
>> an
>>>>>> option to either use auto-increment or use an existing event field as
>> the
>>>>>> key field [1]. Now it will have a radio button to select True/False on
>>>>>> auto-increment. And if it's True, key-field will be ignored and a
>>>>>> sequential numeric key will be used. Otherwise, it'll use the selected
>>>>>> field as the key field.
>>>>>> 
>>>>>> When it comes to use cases, a user can:
>>>>>> 
>>>>>> 1. Store the last event per asset (asset id as the key-field,
>>>>>> auto-increment disabled, index -1).
>>>>>> 2. Collect all the events per asset for diagnostics, replaying,
>>>>>> etc. (auto-increment enabled, different index per asset; an index is
>>>>>> like a separate DB with a distinct keyspace, independent from the
>>>>>> others [2]).
>>>>>> 3. Collect recent events with data purging (similar to 1 and 2, but
>>>>>> with an expiration time).
>>>>>> 
>>>>>> So, this new approach would allow all of the above scenarios. What do
>>>>>> you think?
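The key-selection behaviour described above can be sketched in a few lines of Python (a hypothetical stand-in for the actual Java sink; `make_key_resolver` is an invented name, not StreamPipes API):

```python
from itertools import count

def make_key_resolver(auto_increment, key_field):
    """Return a function mapping an event dict to its Redis key."""
    counter = count(1)  # sequential numeric keys when auto-increment is on
    def resolve(event):
        if auto_increment:
            return str(next(counter))  # key-field is ignored
        return str(event[key_field])   # selected event field becomes the key
    return resolve

# auto-increment enabled: sequential keys, key-field ignored
auto = make_key_resolver(True, "sensorId")
assert auto({"sensorId": "001"}) == "1"
assert auto({"sensorId": "002"}) == "2"

# auto-increment disabled: the asset id (e.g. sensorId) becomes the key,
# so the sink keeps the last event per asset (use case 1)
by_field = make_key_resolver(False, "sensorId")
assert by_field({"sensorId": "001", "temp": 28}) == "001"
```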
>>>>>> 
>>>>>> [1]
>> https://github.com/apache/incubator-streampipes-extensions/pull/13
>>>>>> [2] https://www.mikeperham.com/2015/09/24/storing-data-with-redis/
>>>>>> 
>>>>>> Regards,
>>>>>> Grainier.
>>>>>> 
>>>>>> On Tue, 12 May 2020 at 12:36, Philipp Zehnder <[email protected]>
>>>>> wrote:
>>>>>> 
>>>>>>> Hi Grainier,
>>>>>>> 
>>>>>>> the sink looks very cool and I merged your PR.
>>>>>>> 
>>>>>>> I have a question regarding the key field.
>>>>>>> 
>>>>>>> Currently users can either select ‘-‘ or a ‘runtimeName’ as a
>>>>>>> requiredTextParameter.
>>>>>>> When ‘-‘ is selected a unique counter is used for the key, right?
>>>>>>> The problem is that when a user selects a ‘runtimeName’ we cannot
>>>>>>> provide any input validation.
>>>>>>> If the primaryKey is not within the event the user will see an error
>>>>> when
>>>>>>> the pipeline is started and has to go back and edit the pipeline.
>>>>>>> 
>>>>>>> Alternatively we could use a mapping property for the key field, then
>>>>> the
>>>>>>> user would see a drop down menu of all event properties and could
>> select
>>>>>>> one.
>>>>>>> This way we can ensure that the key is within the event, but then we
>> do
>>>>>>> not have the chance to select ‘-‘.
>>>>>>> 
>>>>>>> What do you think is a common use case for the Redis sink?
>>>>>>> Could a use case for Redis be to store the last event per asset?
>> (e.g.
>>>>>>> sensor or machine)
>>>>>>> Therefore, we could use the mapping property solution and further
>> extend
>>>>>>> it with a dimension property requirement.
>>>>>>> Then users can select a property representing an identifier (e.g.
>>>>>>> machine id); for each machine, an entry would be created in Redis.
>>>>>>> 
>>>>>>> 
>>>>>>> What do you think?
>>>>>>> 
>>>>>>> Philipp
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> On 11. May 2020, at 17:51, Grainier Perera <[email protected]> wrote:
>>>>>>>> 
>>>>>>>> Hi all,
>>>>>>>> 
>>>>>>>> I've sent PR [1] with the initial implementation. Please review and
>>>>>>> merge.
>>>>>>>> 
>>>>>>>> [1]
>> https://github.com/apache/incubator-streampipes-extensions/pull/12
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> Grainier.
>>>>>>>> 
>>>>>>>> On Mon, 11 May 2020 at 01:20, Dominik Riemer <[email protected]>
>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Hi Grainier,
>>>>>>>>> 
>>>>>>>>> very cool! A Redis sink would be awesome.
>>>>>>>>> Since I haven't worked a lot with Redis in the past, I don't have a
>>>>>>> strong
>>>>>>>>> opinion, just some thoughts:
>>>>>>>>> I guess the answer depends on how users will use
>> events
>>>>>>>>> stored in Redis, whether they will need to access single fields or
>> the
>>>>>>>>> whole event. I'd probably guess that most users will access whole
>>>>>>> events,
>>>>>>>>> which would lead to option 1.
>>>>>>>>> Maybe we could start with 1 and later on add an option in the
>> pipeline
>>>>>>>>> element configuration where users can switch between both options?
>>>>>>>>> 
>>>>>>>>> I'll be happy to help you with the SDK in case you have any
>> questions
>>>>> -
>>>>>>> I
>>>>>>>>> know that our documentation has some potential for improvement, so
>>>>> feel
>>>>>>>>> free to ask 😉
>>>>>>>>> 
>>>>>>>>> Dominik
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Grainier Perera <[email protected]>
>>>>>>>>> Sent: Sunday, May 10, 2020 6:20 PM
>>>>>>>>> To: [email protected]
>>>>>>>>> Subject: DataSink for Redis
>>>>>>>>> 
>>>>>>>>> Hi all,
>>>>>>>>> 
>>>>>>>>> I'm planning to implement a data sink that forwards and stores
>> events
>>>>>>> into
>>>>>>>>> Redis[1][2]. But I'd like to get some feedback and opinion from you
>>>>>>> before
>>>>>>>>> proceeding.
>>>>>>>>> 
>>>>>>>>> The question that I have is: since Redis is merely a key-value
>> store,
>>>>>>> and
>>>>>>>>> we have a structured event to be persisted, what would the
>> key-value
>>>>> be?
>>>>>>>>> Following are the possible approaches [3]:
>>>>>>>>> 
>>>>>>>>> 1. Store the entire object as a JSON-encoded string in a single
>> key.
>>>>>>>>> 
>>>>>>>>>    SET event:{id} '{"sensorId":"001", "temp":28}'
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> - Pro: faster when accessing all the fields of the event at once.
>>>>>>>>> - Pro: works with nested objects (but I don't think we have any
>>>>> nested
>>>>>>>>> objects).
>>>>>>>>> - Pro: can set the TTL.
>>>>>>>>> - Con: slower when accessing a single field or a subset of fields
>>>>>>>>> of the event.
>>>>>>>>> - Con: JSON parsing is required to retrieve fields. However, it's
>>>>>>> quite
>>>>>>>>> fast.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 2. Store each Object's properties in a Redis hash.
>>>>>>>>> 
>>>>>>>>>    HMSET event:{id} sensorId "001"
>>>>>>>>> 
>>>>>>>>>    HMSET event:{id} temp "28"
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> - Pro: can set the TTL.
>>>>>>>>> - Pro: no need to parse JSON strings.
>>>>>>>>> - Pro: faster when accessing a single field or a subset of fields
>>>>>>>>> of the event.
>>>>>>>>> - Con: slower when accessing all the fields of the event.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 3. Store each Object as a JSON string in a Redis hash.
>>>>>>>>> 
>>>>>>>>>    HMSET events {id1} '{"sensorId":"001", "temp":28}'
>>>>>>>>> 
>>>>>>>>>    HMSET events {id2} '{"sensorId":"002", "temp":32}'
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> - Pro: fewer keys to work with.
>>>>>>>>> - Con: can't set the TTL.
>>>>>>>>> - Con: JSON parsing is required to retrieve fields.
>>>>>>>>> - Con: slower when accessing a single field or a subset of fields
>>>>>>>>> of the event.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 4. Store each property of each Object in a dedicated key.
>>>>>>>>> 
>>>>>>>>>    SET event:{id}:sensorId "001"
>>>>>>>>> 
>>>>>>>>>    SET event:{id}:temp 28
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> - Pro: can set the TTL per field (but it's not necessary for our
>>>>>>>>> scenario).
>>>>>>>>> - Pro: no need to parse JSON strings.
>>>>>>>>> - Pro: faster when accessing a single field or a subset of fields
>>>>>>>>> of the event.
>>>>>>>>> - Con: slower when accessing all the fields of the event.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 5. Use RedisJSON[4][5] module and store each event as a JSON.
>>>>>>>>> 
>>>>>>>>>    JSON.SET event . '{"sensorId":"001", "temp":28}'
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> - Pro: faster manipulation of JSON documents.
>>>>>>>>> - Pro: faster when accessing single/multiple fields of the event.
>>>>>>>>> - Pro: can set the TTL.
>>>>>>>>> - Con: requires RedisJSON module.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> IMO, 1 & 2 would be the best choices given that they both allow a
>>>>>>>>> TTL for purging. What do you think is best? Your feedback is highly
>>>>>>>>> appreciated.
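To make the trade-off between options 1 and 2 concrete, here is a small Python sketch that uses plain dicts in place of a Redis connection (all names are illustrative; the corresponding Redis commands are shown in comments):

```python
import json

# In-memory stand-ins for a Redis connection.
strings = {}  # option 1: one key per event, JSON-encoded value
hashes = {}   # option 2: one hash per event, one field per property

event = {"sensorId": "001", "temp": 28}
key = "event:1"

# Option 1: SET event:{id} '{"sensorId":"001", "temp":28}'
strings[key] = json.dumps(event)

# Option 2: HMSET event:{id} sensorId "001" temp "28"
hashes[key] = {k: str(v) for k, v in event.items()}

# Whole-event read: option 1 is a single GET plus one JSON parse
assert json.loads(strings[key]) == event

# Single-field read: option 2 (HGET event:{id} temp) needs no parsing,
# but Redis hash values come back as strings
assert hashes[key]["temp"] == "28"
```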
>>>>>>>>> 
>>>>>>>>> [1] https://redis.io/
>>>>>>>>> [2] https://issues.apache.org/jira/browse/STREAMPIPES-121
>>>>>>>>> [3] https://stackoverflow.com/questions/16375188/redis-strings-vs-redis-hashes-to-represent-json-efficiency
>>>>>>>>> [4] https://redislabs.com/redis-enterprise/redis-json/
>>>>>>>>> [5] https://oss.redislabs.com/redisjson/
>>>>>>>>> 
>>>>>>>>> Regards,
>>>>>>>>> Grainier.
